Resource: Mandarin Chinese Desktop Speech Recognition Corpus - Spontaneous Speech (50 people)

Reference Mandarin Chinese Desktop Speech Recognition Corpus - Spontaneous Speech (50 people)
Date of Submission Jan. 24, 2014, 4:30 p.m.
Status accepted
ISLRN 148-164-879-681-0
Resource Type Primary Text
Media Type Audio
Source
Language Chinese
Description

This corpus comprises elicited spontaneous speech from 50 speakers (21 male, 29 female) of varying dialects, ages and educational levels, who spoke on 36 different topics in a working environment and were recorded through a head-mounted noise-cancelling microphone. The database comprises 600 speech files. Speech samples are stored as 16-bit, 44.1 kHz WAV files, for a total of 8 hours of speech. The total size of the data is 2.37 GB.
Text files are stored in Unicode format. All data have been proofread manually.
The transcriptions include non-speech markers (background noise, background speech, speaker sounds) as well as markers for mispronunciations, channel distortions, omitted words and duplicated words.
The corpus is intended for testing natural speech recognition systems, including telephone systems.
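
As a rough consistency check on the storage figures in the description, the sketch below estimates the size of 8 hours of uncompressed 16-bit, 44.1 kHz audio. It assumes single-channel (mono) recordings, which the record does not state, and it ignores WAV headers and transcription files. The function name pcm_size_bytes is hypothetical and chosen only for illustration.

```python
def pcm_size_bytes(hours, sample_rate_hz=44_100, bits_per_sample=16, channels=1):
    """Estimate the size of uncompressed PCM audio, ignoring WAV headers.

    channels=1 (mono) is an assumption; the catalog record does not say.
    """
    seconds = hours * 3600
    bytes_per_sample = bits_per_sample // 8
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


size = pcm_size_bytes(8)          # 8 hours, as stated in the description
size_gib = size / 1024**3         # binary gigabytes (GiB)
print(f"{size} bytes = {size_gib:.2f} GiB")
```

Under the mono assumption the estimate comes to about 2.37 binary gigabytes, which matches the stated total and suggests that the figure is given in gigabytes rather than gigabits. Since the 8-hour duration is presumably rounded, the agreement should be read as approximate.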

Version 1.0
Distributor ELRA