The Danish SpeechDat(M) database is the speech database collected within the SpeechDat(M) project. It consists of polyphone-like data recorded from 1,523 speakers.
The speech files are stored as sequences of 8-bit, 8 kHz A-law samples. Each prompted utterance is stored in a separate file, and the associated label files are stored in SAM file format.
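As an illustration of the sample format described above, the following minimal sketch expands 8-bit A-law codes to 16-bit linear values using the standard G.711 A-law expansion. It assumes the speech files are raw, headerless byte streams with one A-law code per sample; the function names (alaw_to_linear, decode_alaw, duration_seconds) are hypothetical and not part of the database distribution.

```python
def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 A-law code to a signed 16-bit linear value."""
    code ^= 0x55                      # undo the even-bit inversion used by A-law
    mantissa = (code & 0x0F) << 4     # 4-bit quantisation step within the segment
    segment = (code & 0x70) >> 4      # 3-bit segment (chord) number
    if segment == 0:
        value = mantissa + 8
    else:
        value = (mantissa + 0x108) << (segment - 1)
    # After the XOR, a set top bit marks a positive sample.
    return value if code & 0x80 else -value


def decode_alaw(data: bytes) -> list[int]:
    """Decode a raw A-law byte stream (assumed headerless) to linear samples."""
    return [alaw_to_linear(b) for b in data]


def duration_seconds(data: bytes, sample_rate: int = 8000) -> float:
    """Duration of a raw 8-bit stream: one byte per sample at 8 kHz."""
    return len(data) / sample_rate
```

With one byte per sample at 8 kHz, each second of speech occupies 8,000 bytes, so the duration of an utterance can be estimated from the file size alone under the headerless assumption.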
An attached ASCII file lists information about each speaker: speaker code, sex, age, region, and prompt number.
The lexicon is presented in a TAB-delimited ASCII file containing an alphabetically ordered list of the distinct lexical items occurring in the database. Each entry contains a frequency count and the corresponding pronunciation information.
WORD              FREQUENCY  PHONEMIC TRANSCRIPTIONS
åbnede            104        O b n @ D | O b n @ D @
adresseangivelse  97         a d R a s @ a n g i: u l s @
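The entry layout shown above can be read with a few lines of code. This is a sketch only: it assumes three TAB-separated fields per line (word, frequency, transcriptions), that alternative pronunciations are separated by "|", and that phoneme symbols are separated by spaces, as the sample entries suggest. The function name parse_lexicon_line is hypothetical.

```python
def parse_lexicon_line(line: str) -> tuple[str, int, list[list[str]]]:
    """Split one lexicon entry into word, frequency and pronunciation variants."""
    # Assumed layout: WORD <TAB> FREQUENCY <TAB> TRANSCRIPTIONS
    word, freq, transcriptions = line.rstrip("\n").split("\t", 2)
    # Assumed: variants are separated by "|", phoneme symbols by whitespace.
    variants = [variant.split() for variant in transcriptions.split("|")]
    return word, int(freq), variants


entry = parse_lexicon_line("åbnede\t104\tO b n @ D | O b n @ D @")
print(entry[0], entry[1], len(entry[2]))
```

Keeping each variant as a list of symbols rather than a single string makes it straightforward to count phonemes or compare variants later.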
The complete Danish SpeechDat database consists of 5 CD-ROMs. The first three CD-ROMs contain the application-oriented subset. The last two CD-ROMs contain the phonetically rich sentences.
The included items are:
· 5 application word phrases (semi spontaneous)
· 12 connected digit strings with 8 digits
· 24 natural numbers (3-4 digits)
· 27 application words
· 3 dates, D3 spontaneous (birthday)
· 3 spelled words
· 2 money amounts, M1 small, M2 large
· City name (spontaneous)
· 3 yes/no questions (spontaneous)
· 22-25 sentences
· T1 time phrase, T2 time of day (spontaneous)
The 1,523 speakers in the SpeechDat database come from 11 linguistic regions of Denmark and fall into five age groups (under 16, 16-30, 31-45, 46-60, over 60). 78% of them are between 16 and 60 years old.
A pronunciation lexicon with a phonemic transcription in SAMPA is also included.