Resource: SpeechDat Galician Database for the Fixed Telephone Network

Reference SpeechDat Galician Database for the Fixed Telephone Network
Date of Submission Jan. 24, 2014, 4:31 p.m.
Status accepted
ISLRN 897-779-541-361-2
Resource Type Primary Text
Media Type Audio
Source
Language Galician
Description

The SpeechDat Galician Database for the Fixed Telephone Network contains recordings of 653 speakers of Galician (217 males, 436 females) made over the fixed telephone network. The database is distributed on 3 CDs and complies with the common specifications created in the SpeechDat project.

Speech samples are stored as sequences of 8-bit A-law values at a sampling rate of 8 kHz. Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.
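
As an illustration of the storage format described above, the following minimal sketch expands 8-bit A-law bytes to 16-bit linear PCM using the standard G.711 expansion. It assumes the signal files are headerless raw A-law, one byte per sample; the function names and the file name in the usage note are hypothetical and are not taken from the database documentation.

```python
SAMPLE_RATE_HZ = 8000  # 8 kHz, as stated in the description


def alaw_to_pcm16(byte: int) -> int:
    """Expand one G.711 A-law byte to a signed 16-bit linear sample."""
    a = byte ^ 0x55               # undo the even-bit inversion applied by the encoder
    mantissa = (a & 0x0F) << 4    # 4-bit quantisation step
    segment = (a & 0x70) >> 4     # 3-bit segment (chord) number
    if segment == 0:
        value = mantissa + 8
    elif segment == 1:
        value = mantissa + 0x108
    else:
        value = (mantissa + 0x108) << (segment - 1)
    # In A-law a set sign bit denotes a positive sample.
    return value if a & 0x80 else -value


def decode_alaw(raw: bytes) -> list[int]:
    """Decode a headerless A-law byte string (assumed layout) to PCM samples."""
    return [alaw_to_pcm16(b) for b in raw]


def duration_seconds(raw: bytes) -> float:
    """One byte per sample at 8 kHz means 8000 bytes per second of speech."""
    return len(raw) / SAMPLE_RATE_HZ
```

A signal file could then be read with something like `decode_alaw(open("utterance.raw", "rb").read())`, where `utterance.raw` is a placeholder name. At 8 bits per sample and 8 kHz, the data rate is 64 kbit/s.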

Each speaker uttered the following 44 items:

– 3 common application words
– 1 sequence of isolated digits
– 4 digit strings: prompt sheet number, telephone number, credit card number, PIN code
– 1 spontaneous phone number
– 1 spontaneous PIN code (8 digits)
– 3 dates: spontaneous date (birth date), prompted date (word style), relative and general date expression
– 1 application word phrase
– 1 isolated digit
– 3 spelled words: spontaneous (own forename), directory city name, real/artificial word
– 1 money amount
– 2 natural numbers
– 5 directory assistance: forename (spontaneous), city of origin (spontaneous), country name (most frequent city), most frequent company/agency name, forename & surname (out of 500), surname (out of 76), “forename surname” (spontaneous)
– 2 spontaneous yes/no questions
– 10 phonetically rich sentences
– 2 time phrases: time of day (spontaneous), time phrase
– 4 phonetically rich words

The age distribution is as follows: 12 speakers are under 16, 375 are between 16 and 30, 164 are between 31 and 45, 88 are between 46 and 60, and 9 are over 60. The age of 5 speakers was not specified.
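
The speaker figures quoted in this record can be cross-checked with a few lines of arithmetic. The sketch below only restates the numbers given above; the variable names are illustrative and do not correspond to any field of the database.

```python
TOTAL_SPEAKERS = 653

# Counts as stated in the description.
by_gender = {"male": 217, "female": 436}
by_age = {
    "under 16": 12,
    "16-30": 375,
    "31-45": 164,
    "46-60": 88,
    "over 60": 9,
    "not specified": 5,
}

# Both breakdowns should account for every speaker.
gender_total = sum(by_gender.values())
age_total = sum(by_age.values())

# Share of each age bracket, in percent of all speakers.
age_share = {
    bracket: round(100 * count / TOTAL_SPEAKERS, 1)
    for bracket, count in by_age.items()
}

if __name__ == "__main__":
    print("gender total:", gender_total)
    print("age total:", age_total)
    for bracket, share in age_share.items():
        print(f"{bracket}: {share}%")
```

Both totals come to 653, so the gender and age breakdowns are consistent with the stated number of speakers.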

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.

Version 1.0
Distributor ELRA