A large-scale Arabic speech corpus for automatic speech synthesis

Full Official Name: AlKhalil Speech Corpus
Submission date: Sept. 14, 2020, 11:25 a.m.

AlKhalil Speech Corpus is an Arabic single-speaker speech database recorded by a professional male speaker. It was designed mainly for unit-selection speech synthesis, but other possible applications include end-to-end speech synthesis and speech recognition. The source texts are paragraphs and articles that were carefully selected to cover different domains, including science, literature, academic books, and technology, among others. The corpus includes the following files:
1- 15 .wav files, single-channel, 24 kHz, 16-bit.
2- 15 .TextGrid files containing phoneme-, word-, and lemma-level annotations aligned with their corresponding speech utterances. These files can be opened with Praat.
3- Orthographic-transcript.txt, which contains a fully diacritized and hand-checked orthographic transcription covering more than 80,000 Arabic words.
4- buckwalter_transcript.txt, which is a representation of the orthographic transcript file (3) in Buckwalter format.
5- Pronunciation_transcript.txt, which is a phonetic representation of the audio files describing how the words were uttered by the speaker. This file is particularly useful for unit-selection-based synthesis.
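
The audio format stated in the description (one channel, 24 kHz, 16-bit) can be checked after download. The following is a minimal sketch using only Python's standard `wave` module. The function names `check_wav_format` and `duration_seconds` are hypothetical helpers introduced for illustration and are not part of the corpus distribution, and the sketch assumes the files are plain PCM WAV.

```python
import wave

# Format stated in the corpus description: one channel, 24 kHz, 16-bit.
EXPECTED_CHANNELS = 1
EXPECTED_RATE_HZ = 24000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample


def check_wav_format(path):
    """Return a list of mismatches between a WAV header and the stated format.

    An empty list means the header matches. Hypothetical helper, assuming
    an uncompressed PCM WAV file.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append("channels: %d" % wav.getnchannels())
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append("sample rate: %d Hz" % wav.getframerate())
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append("sample width: %d bytes" % wav.getsampwidth())
    return problems


def duration_seconds(path):
    """Return the duration of a WAV file in seconds, computed from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Calling `check_wav_format` on each of the 15 .wav files and summing `duration_seconds` gives a quick integrity check and the total recorded duration, without any third-party audio library.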

Creator(s)
Distributor(s)
Right Holder(s)