Resource: BREF-80

Reference BREF-80
Date of Submission Jan. 24, 2014, 4:22 p.m.
Status accepted
ISLRN 310-036-258-354-7
Resource Type Primary Text
Media Type Audio
Language French

The BREF corpus was designed to provide enough read speech data for the development and evaluation of continuous speech recognition systems (both speaker-dependent and speaker-independent), and to provide a large corpus of continuous speech for the acquisition of acoustic-phonetic knowledge of spoken French. All the recorded texts were selected from extracts of the French newspaper Le Monde so as to provide a large vocabulary (over 20,000 words) and a wide range of phonetic environments. The entire BREF corpus contains over 100 hours of speech material from 120 speakers.
The BREF-80 sub-corpus consists of 2 ISO9660 CDROMs, BREF80-1 and BREF80-2, containing speaker-independent training data from 80 speakers. Together these 2 CDs contain 5330 sentences, an average of 67 sentences per speaker. While this data represents only a small portion of the entire BREF corpus, the sentences have been selected to cover most of the BREF training prompts, in order to conserve a wide range of phonetic contexts with a minimum amount of speech data. Thus, the BREF80 sub-corpus produced on these CDs was especially selected to train speaker-independent, vocabulary-independent speech recognizers.

Version 1.0
Distributor ELRA