BREF-120 - A large corpus of French read speech

Full Official Name: BREF-120 - A large corpus of French read speech
Submission date: Jan. 24, 2014, 4:22 p.m.

BREF-120 is a large read-speech corpus containing over 100 hours of speech material from 120 speakers (55 male and 65 female). It resulted from the efforts of LIMSI-CNRS researchers under sponsorship from the GDR-PRC CHM, the ACCT (OFIL), the EEC (ESPRIT Polyglot project), and the Aupelf-Uref. The corpus was designed to provide read speech data for the development and evaluation of continuous speech recognition systems (both speaker-dependent and speaker-independent), and to provide a large corpus of continuous speech for the acquisition of acoustic-phonetic knowledge of spoken French.

The text materials were selected verbatim from extracts of the French newspaper "Le Monde". Each of 80 speakers read approximately 10,000 words (about 650 sentences) of text, and another 40 speakers each read about half that amount.

Simultaneous recordings were made in a sound-proof room using a Shure SM10 microphone and a Crown PCC160 microphone, and were monitored to verify their contents. The speech signal was sampled at 16 kHz and digitised with 16 bits per sample. The BREF-120 corpus comprises 28 CDs: numbers 1-13 contain the Shure-recorded data and numbers 14-28 contain the Crown-recorded data.

BREF-80 (ELRA-S0006) is a subset of BREF-120 that consists of about 50-60 sentences per speaker, recorded only with the Shure microphone. In BREF-80, the sentences were chosen to cover as many prompts as possible.
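
The figures quoted in the description can be turned into rough size estimates. The following is a minimal sketch, not part of the official documentation: it assumes mono, uncompressed linear PCM audio, treats "over 100 hours" as exactly 100 hours per channel (a lower bound), and takes "about half that amount" as exactly half. The function names are illustrative only.

```python
# Back-of-the-envelope estimates from the figures in the description.
# Assumptions (not stated in the catalogue entry): mono audio,
# uncompressed linear PCM, and "about half" taken as exactly half.

SAMPLE_RATE_HZ = 16_000   # sampling rate given in the description
BITS_PER_SAMPLE = 16      # quantisation given in the description


def pcm_bytes(hours: float, channels: int = 1) -> int:
    """Size in bytes of uncompressed PCM audio of the given duration."""
    seconds = hours * 3600
    bytes_per_sample = BITS_PER_SAMPLE // 8
    return int(seconds * SAMPLE_RATE_HZ * bytes_per_sample * channels)


def total_words(full_speakers: int = 80,
                half_speakers: int = 40,
                words_per_full_speaker: int = 10_000) -> int:
    """Approximate number of words read across all speakers."""
    half_amount = words_per_full_speaker // 2
    return full_speakers * words_per_full_speaker + half_speakers * half_amount


if __name__ == "__main__":
    print(f"100 h of one channel: {pcm_bytes(100) / 1e9:.2f} GB")
    print(f"Approximate words read: {total_words():,}")
```

Under these assumptions, 100 hours of a single microphone channel corresponds to about 11.5 GB of raw audio, and the speakers together read roughly one million words. The actual figures depend on the true recording duration and on the file format used on the CDs.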
