ISLRN

Mandarin Chinese Desktop Speech Recognition Corpus - Spontaneous Speech (50 people)

Full Official Name: Mandarin Chinese Desktop Speech Recognition Corpus - Spontaneous Speech (50 people)

Submission date: Jan. 24, 2014, 4:30 p.m.

This corpus comprises elicited spontaneous speech from 50 speakers (21 male and 29 female) of different dialects, ages and educational levels, who spoke on 36 different topics in a working environment and were recorded through a head-mounted noise-cancelling microphone. The database comprises 600 speech files. Speech samples are stored as 16-bit, 44.1 kHz WAV files, for a total of 8 hours of speech. The total size of the data is 2.37 GB. Text files are stored in Unicode format. All data have been proofread manually. The transcriptions include non-speech markers (background noise, background speech, speaker sounds) as well as markers for mispronunciations, channel distortions, omitted words and duplicates. The corpus is intended for use in testing and in telephone natural speech recognition systems.
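The stated audio format and duration can be checked against the stated data size with a short calculation. The sketch below is only a rough estimate: the record does not give the channel count, so mono is an assumption, WAV headers and transcription files are ignored, and the 8 hours is taken as exact. The function name `pcm_size_bytes` is hypothetical.

```python
def pcm_size_bytes(hours, sample_rate_hz, bits_per_sample, channels=1):
    """Estimate the size of uncompressed PCM audio in bytes.

    channels=1 (mono) is an assumption; the record does not state it.
    """
    seconds = hours * 3600
    bytes_per_sample = bits_per_sample // 8
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


# Values taken from the description: 8 hours, 44.1 kHz, 16-bit.
size = pcm_size_bytes(8, 44_100, 16)
print(f"{size} bytes = {size / 1024**3:.2f} GiB")
```

Under these assumptions the estimate comes out at roughly 2.37 when expressed in units of 1024^3 bytes, which is consistent with the size given in the description.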

Creator(s)

Distributor(s)

ELRA

Right Holder(s)

Status: Accepted

ISLRN:

148-164-879-681-0

Version

1.0

Source

http://catalog.elra.info/product_info.php?products_id=904

Resource Type

Primary Text

Media Type

Audio

Language(s)

Chinese

Access Medium