Resource: TC-STAR Spanish Baseline Male Speech Database

Reference TC-STAR Spanish Baseline Male Speech Database
Date of Submission Jan. 24, 2014, 4:31 p.m.
Status accepted
ISLRN 736-021-086-598-0
Resource Type Primary Text
Media Type Audio
Language Spanish, Castilian

The TC-STAR Spanish Baseline Male Speech Database was created within the scope of the TC-STAR project (IST-FP6-506738) funded by the European Commission.

It contains recordings of one male Spanish speaker, captured simultaneously through a close-talk microphone, a mid-distance microphone and a laryngograph in a noise-reduced room. The material consists of recordings and annotations of read text, amounting to approximately 10 hours of speech for baseline applications (Text-to-Speech systems). The database is distributed on 9 DVDs and complies with the common specifications created in the TC-STAR project.

The annotation of the database includes manual orthographic transcriptions, automatic segmentation into phonemes and automatically generated pitch marks. A certain percentage of the phonetic segments and pitch marks has been manually checked. A pronunciation lexicon in SAMPA, with POS, lemma and phonetic transcription of all the words prompted and spoken, is also provided.

Speech samples are stored as sequences of 24-bit signed integers sampled at 96 kHz, with the least significant byte first (“lohi” or Intel format). Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.
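
As an illustration of the sample format described above, the following Python sketch decodes 24-bit signed little-endian samples from a byte buffer. It assumes that a signal file is headerless raw sample data. The record does not state this, nor how the three recorded channels are laid out across files, so check the accompanying SAM label file before relying on it. The names decode_pcm24le and duration_seconds are hypothetical helpers and are not part of the database distribution.

```python
from typing import List

BYTES_PER_SAMPLE = 3        # 24-bit samples, as stated in the record
SAMPLE_RATE_HZ = 96_000     # 96 kHz, as stated in the record


def decode_pcm24le(raw: bytes) -> List[int]:
    """Decode 24-bit signed integers stored least significant byte first.

    Assumption: `raw` holds only sample data (no file header).
    """
    if len(raw) % BYTES_PER_SAMPLE != 0:
        raise ValueError("buffer length is not a multiple of 3 bytes")
    return [
        int.from_bytes(raw[i:i + BYTES_PER_SAMPLE], byteorder="little", signed=True)
        for i in range(0, len(raw), BYTES_PER_SAMPLE)
    ]


def duration_seconds(num_samples: int, channels: int = 1) -> float:
    """Duration of a signal; the single-channel default is an assumption."""
    return num_samples / (channels * SAMPLE_RATE_HZ)
```

For example, the three bytes 0xFF 0xFF 0x7F decode to 8388607, the largest positive 24-bit value, and 0x00 0x00 0x80 decode to -8388608, the most negative one. A buffer of 96,000 decoded samples corresponds to one second of single-channel audio.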

The TC-STAR Spanish Baseline Female Speech Database is also available via ELRA under reference ELRA-S0309.

Version 1.0
Distributor ELRA