Persian Kids’ Speech Corpus

Full Official Name: Persian Kids’ Speech Corpus
Submission date: June 20, 2023, 3:04 p.m.

The Persian Kids’ Speech Corpus consists of speech recorded from 286 children (141 girls and 145 boys) aged 6 to 9 years, using an Andreas Mic Anti-Noise microphone and a Premium Speechmike headphone. The speech was recorded with Cool Edit Pro 2.1 at 16 kHz, single-channel, 16-bit resolution, and saved as WAV files. Because the data was recorded in a school environment, some audio files contain real environmental noise. The recordings were checked and labeled manually, and segmentation and labeling were performed in Praat. The resulting corpus contains 162,395 samples with a total duration of 33 hours and 44 minutes, distributed as follows:
- 29,057 Words (478 minutes)
- 17,429 SubWords (260 minutes)
- 43,838 Syllables (485 minutes)
- 70,078 Phonemes (765 minutes)
- 1,993 Extra Vocabulary (36 minutes)
The corpus covers all 29 Persian phonemes, 118 syllables, 56 sub-words, and 711 words, and is particularly suited to speech recognition and linguistic studies.
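The description states both a per-category breakdown and a recording format. Below is a minimal Python sketch that cross-checks the breakdown against the stated totals (162,395 samples, 33 hours 44 minutes) and tests whether a given WAV file matches the stated format (16 kHz, mono, 16-bit). It assumes the files are uncompressed PCM WAV, which the standard-library `wave` module requires. The names `DISTRIBUTION`, `corpus_totals`, and `matches_corpus_format` are hypothetical and are not part of any tooling shipped with the corpus. The corpus directory layout is also not described, so the sketch takes a single file path.

```python
import wave

# Hypothetical table: per-category sample counts and durations (minutes),
# copied from the corpus description.
DISTRIBUTION = {
    "Words": (29_057, 478),
    "SubWords": (17_429, 260),
    "Syllables": (43_838, 485),
    "Phonemes": (70_078, 765),
    "Extra Vocabulary": (1_993, 36),
}

# Recording format stated in the description.
EXPECTED_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1             # single-channel
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # 16-bit samples


def corpus_totals(distribution):
    """Return (total samples, whole hours, remaining minutes)."""
    samples = sum(count for count, _ in distribution.values())
    minutes = sum(mins for _, mins in distribution.values())
    return samples, minutes // 60, minutes % 60


def matches_corpus_format(path_or_file):
    """True if the PCM WAV file is 16 kHz, mono, and 16-bit.

    Assumption: files are uncompressed PCM; `wave` cannot read other codecs.
    Accepts a filesystem path or a binary file-like object.
    """
    with wave.open(path_or_file, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
        )


if __name__ == "__main__":
    print(corpus_totals(DISTRIBUTION))  # (162395, 33, 44)
```

The five categories sum to 2,024 minutes, which is exactly 33 hours and 44 minutes, so the breakdown is consistent with the stated total. Checking the format through the WAV header, rather than trusting file extensions, helps flag any files that were resampled or re-saved after recording.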

Right Holder(s)