Date of Submission: Nov. 3, 2016, 4:03 p.m.
SecuVoice is a corpus of single-channel utterances in Spanish containing sequences of isolated digits from zero to nine. The utterances were recorded with two different devices: a mid-range smartphone and a high-end one. For both devices, the utterances were stored as uncompressed monophonic WAV files with a sampling frequency of 8000 Hz and 16 bits per sample.
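As an illustration of the audio format described above, the following minimal sketch checks whether a WAV file is mono, sampled at 8000 Hz, with 16 bits per sample and no compression. It uses only Python's standard `wave` module. The function name `matches_secuvoice_format` and the synthetic file `demo.wav` are hypothetical; they are not part of the corpus distribution, whose actual file names and layout are not given here.

```python
import os
import tempfile
import wave


def matches_secuvoice_format(path):
    """Return True if the file is uncompressed mono PCM, 8000 Hz, 16-bit."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels() == 1          # monophonic
            and wav.getframerate() == 8000   # 8000 Hz sampling frequency
            and wav.getsampwidth() == 2      # 2 bytes = 16 bits per sample
            and wav.getcomptype() == "NONE"  # uncompressed PCM
        )


# Demonstration on a synthetic file (one second of silence), since the
# real corpus file names are not known here.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(8000)
    wav.writeframes(b"\x00\x00" * 8000)

print(matches_secuvoice_format(demo_path))  # True
```

A check of this kind can be run over a downloaded copy of the corpus before feature extraction, so that files with an unexpected format are caught early.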
This database is especially suitable for research on biometrics and secure applications that integrate both automatic speech recognition (ASR) and speaker recognition/verification.
SecuVoice contains a total of 7,098 utterances (169 speakers x 42 utterances/speaker) with 34,476 digits (204 digits/speaker). Utterances are arranged into two datasets: (i) the ENROLL dataset contains the 1,014 enrollment utterances (169 speakers x 6 enrollment utterances/speaker) with 10,140 digits; (ii) the VERIF dataset contains the 6,084 verification utterances (169 speakers x 36 verification utterances/speaker) with 24,336 digits. Each digit from zero to nine occurs 3,380 times in the corpus, except digits three and five, which are over-represented in the VERIF dataset (2,704 occurrences each against 2,366 for each of the other digits) and therefore occur 3,718 times each.
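The counts above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated in the description; the per-digit ENROLL count (1,014) is not stated explicitly and is derived here as 10,140 / 10, which is consistent with the stated totals of 3,380 and 3,718. The variable names are illustrative.

```python
# Figures taken from the corpus description.
SPEAKERS = 169
ENROLL_UTT_PER_SPEAKER = 6
VERIF_UTT_PER_SPEAKER = 36
ENROLL_DIGITS = 10_140
VERIF_DIGITS = 24_336

enroll_utts = SPEAKERS * ENROLL_UTT_PER_SPEAKER   # 1,014
verif_utts = SPEAKERS * VERIF_UTT_PER_SPEAKER     # 6,084
total_utts = enroll_utts + verif_utts             # 7,098
total_digits = ENROLL_DIGITS + VERIF_DIGITS       # 34,476
digits_per_speaker = total_digits // SPEAKERS     # 204

# Derived assumption: digits are balanced in ENROLL (10,140 / 10 each).
enroll_per_digit = ENROLL_DIGITS // 10            # 1,014

# VERIF occurrences per digit: three and five are over-represented.
verif_per_digit = {d: 2_366 for d in range(10)}
verif_per_digit[3] = verif_per_digit[5] = 2_704

# Corpus-wide occurrences per digit.
total_per_digit = {d: enroll_per_digit + n for d, n in verif_per_digit.items()}

print(total_utts, total_digits, digits_per_speaker)
print(total_per_digit[0], total_per_digit[3])
```

The per-digit VERIF counts sum to 24,336 and the corpus-wide counts sum to 34,476, so the stated figures are mutually consistent.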
Along with the WAV files containing the speech utterances, XML annotation files containing detailed information about the speakers and the recorded sequences of digits are provided.