Resource: TC-STAR Spanish Baseline Male Speech Database
|Reference||TC-STAR Spanish Baseline Male Speech Database|
|Date of Submission||Jan. 24, 2014, 4:31 p.m.|
|Resource Type||Primary Text|
The TC-STAR Spanish Baseline Male Speech Database was created within the scope of the TC-STAR project (IST-FP6-506738) funded by the European Commission.
It contains recordings of one male Spanish speaker, captured simultaneously with a close-talk microphone, a mid-distance microphone and a laryngograph in a noise-reduced room. It comprises the recordings and annotations of read text material, approximately 10 hours of speech, intended for baseline applications (Text-to-Speech systems). The database is distributed on 9 DVDs and complies with the common specifications created in the TC-STAR project.
The annotation of the database includes manual orthographic transcriptions, automatic segmentation into phonemes and automatically generated pitch marks. A certain percentage of the phonetic segments and pitch marks has been manually checked. A pronunciation lexicon in SAMPA, with POS, lemma and phonetic transcription of all the words prompted and spoken, is also provided.
Speech samples are stored as sequences of 24-bit (signed) integers sampled at 96 kHz, with the least significant byte first (“lohi” or Intel format). Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.
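As an illustration of the sample format described above, the following minimal Python sketch decodes 24-bit signed little-endian samples into integers. It assumes that a signal file is a headerless stream of samples and that each file holds a single channel; neither point is stated in this description, so both should be checked against the accompanying SAM label file. The names `decode_pcm24_le` and `duration_seconds` are hypothetical helpers and are not part of the database distribution.

```python
SAMPLE_RATE_HZ = 96_000   # sampling rate given in the description
BYTES_PER_SAMPLE = 3      # 24-bit samples


def decode_pcm24_le(raw: bytes) -> list[int]:
    """Decode 24-bit signed little-endian ("lohi") samples into Python ints.

    Assumption: `raw` is a headerless sample stream (not confirmed by the
    description).
    """
    if len(raw) % BYTES_PER_SAMPLE != 0:
        raise ValueError("byte count is not a multiple of 3")
    return [
        int.from_bytes(raw[i:i + BYTES_PER_SAMPLE], byteorder="little", signed=True)
        for i in range(0, len(raw), BYTES_PER_SAMPLE)
    ]


def duration_seconds(raw: bytes, channels: int = 1) -> float:
    """Duration implied by the byte count (assumption: one channel per file)."""
    return len(raw) / (BYTES_PER_SAMPLE * channels * SAMPLE_RATE_HZ)
```

A signal file could then be read with `open(path, "rb").read()` and passed to `decode_pcm24_le`. For the larger files, an array library would likely be faster than this pure-Python loop.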
The TC-STAR Spanish Baseline Female Speech Database is also available via ELRA under reference ELRA-S0309.