The Portuguese SpeechDat(II) FDB-4000 comprises 4027 Portuguese speakers (1861 males, 2166 females) recorded over the Portuguese fixed telephone network. This database is partitioned into 11 CDs. The speech databases made within the SpeechDat(II) project were validated by SPEX, the Netherlands, to assess their compliance with the SpeechDat format and content specifications.
Speech samples are stored as sequences of 8-bit A-law samples at a sampling rate of 8 kHz. Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.
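As a rough illustration of the storage format described above, the sketch below expands 8-bit A-law bytes to 16-bit linear PCM following the ITU-T G.711 A-law expansion rule. It assumes the signal files are headerless raw sample streams, which this description does not state. The function names are hypothetical and not part of any SpeechDat tooling.

```python
"""Decode 8-bit A-law samples (ITU-T G.711) to 16-bit linear PCM."""

SAMPLE_RATE_HZ = 8000  # sampling rate stated in the database description


def alaw_to_linear(code: int) -> int:
    """Expand one A-law byte to a signed linear sample."""
    code ^= 0x55                    # undo the even-bit inversion applied by the encoder
    mantissa = (code & 0x0F) << 4   # 4-bit mantissa, shifted into position
    segment = (code & 0x70) >> 4    # 3-bit segment (exponent)
    if segment == 0:
        magnitude = mantissa + 8
    else:
        magnitude = (mantissa + 0x108) << (segment - 1)
    # After the XOR, a set top bit marks a positive sample.
    return magnitude if code & 0x80 else -magnitude


def decode_alaw(data: bytes) -> list[int]:
    """Decode a raw A-law byte string (assumed headerless) to linear samples."""
    return [alaw_to_linear(b) for b in data]


def duration_seconds(data: bytes) -> float:
    """Duration of a raw A-law recording: one byte per sample at 8 kHz."""
    return len(data) / SAMPLE_RATE_HZ
```

Under the headerless assumption, `decode_alaw` yields one integer per stored byte, so a 16,000-byte signal file would correspond to two seconds of speech.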
Each speaker uttered the following items:
- 1 isolated single digit
- 1 sequence of 10 isolated digits
- 4 numbers : 1 sheet number (5+ digits), 1 telephone number (9-11 digits), 1 credit card number (14-16 digits), 1 PIN code (6 digits)
- 1 currency money amount
- 1 natural number
- 3 dates : 1 spontaneous (date or year of birth), 1 prompted date, 1 relative or general date expression
- 2 time phrases : 1 time of day (spontaneous), 1 time phrase (word style)
- 3 spelled words : 1 spontaneous (own forename), 1 city name, 1 real word for coverage
- 5 directory assistance utterances : 1 spontaneous (own forename), 1 city of birth / growing up (spontaneous), 1 frequent city name, 1 frequent company name, 1 common forename and surname
- 2 yes/no questions : 1 predominantly "yes" question, 1 predominantly "no" question
- 3 application words
- 1 keyword phrase using an embedded application word
- 4 phonetically rich words
- 9 phonetically rich sentences
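The item list above can be tallied to give the nominal number of utterances per speaker. The sketch below is only a bookkeeping aid. The dictionary keys are shorthand labels chosen here, not official SpeechDat item codes, and the corpus-wide figure assumes every speaker completed every item, which this description does not guarantee.

```python
# Item counts per speaker, copied from the list above.
ITEMS_PER_SPEAKER = {
    "isolated single digit": 1,
    "sequence of 10 isolated digits": 1,
    "numbers": 4,
    "currency money amount": 1,
    "natural number": 1,
    "dates": 3,
    "time phrases": 2,
    "spelled words": 3,
    "directory assistance utterances": 5,
    "yes/no questions": 2,
    "application words": 3,
    "keyword phrase": 1,
    "phonetically rich words": 4,
    "phonetically rich sentences": 9,
}

NUM_SPEAKERS = 4027  # speaker count stated in the description

# Nominal utterances recorded by one speaker.
items_per_speaker = sum(ITEMS_PER_SPEAKER.values())

# Upper bound on the number of signal files, assuming no missing items.
nominal_total = items_per_speaker * NUM_SPEAKERS

print(f"{items_per_speaker} items per speaker, "
      f"up to {nominal_total} utterances in total")
```

Since each prompted utterance is stored in its own file, the nominal total also bounds the number of signal files under the same assumption.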
The following age distribution has been obtained: 241 speakers are below 16 years old, 1404 speakers are between 16 and 30, 1532 speakers are between 31 and 45, 711 speakers are between 46 and 60, and 139 speakers are over 60.
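The reported counts can be cross-checked against the stated speaker total. The following sketch sums the age bands and the gender counts given in this description and computes each band's share of the speakers. The band labels and the helper name `shares` are shorthand introduced here.

```python
TOTAL_SPEAKERS = 4027

# Counts copied from the description.
GENDER_COUNTS = {"male": 1861, "female": 2166}
AGE_COUNTS = {
    "under 16": 241,
    "16-30": 1404,
    "31-45": 1532,
    "46-60": 711,
    "over 60": 139,
}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each category's percentage of the category total."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Consistency checks: both breakdowns should cover every speaker.
age_total_matches = sum(AGE_COUNTS.values()) == TOTAL_SPEAKERS
gender_total_matches = sum(GENDER_COUNTS.values()) == TOTAL_SPEAKERS

age_shares = shares(AGE_COUNTS)
for label, pct in age_shares.items():
    print(f"{label:>8}: {AGE_COUNTS[label]:4d} speakers ({pct:.1f}%)")
```

Both the age bands and the gender counts sum to 4027, so the two breakdowns are consistent with the stated number of speakers.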
A pronunciation lexicon with a phonemic transcription in SAMPA is also included.