ESTER 2 Corpus

Submission date: Jan. 24, 2014, 4:29 p.m.

The ESTER 2 evaluation campaign (Evaluation of Broadcast News enriched transcription systems) is based, on the one hand, on the full corpus from the first ESTER campaign (see ELRA-E0021 and ELRA-S0241) and, on the other hand, on a training corpus of about one hundred hours specific to ESTER 2, together with quick transcriptions of African radio broadcasts. A 6-hour subset of the corpus is identified as the development corpus. These new data constitute the ESTER 2 Corpus.

The ESTER 2 Corpus consists of:
- a manually transcribed radio broadcast news corpus amounting to about 100 hours,
- quick transcriptions of African radio broadcasts amounting to about 6 hours.

Named entity annotation is provided for the development data (about 6 hours). The recordings contain news broadcasts, reports linked to current news, and more conversational broadcasts.

