Resource: ESTER 2 Corpus

Reference ESTER 2 Corpus
Date of Submission Jan. 24, 2014, 4:29 p.m.
Status accepted
ISLRN 123-207-221-143-8
Resource Type Primary Text
Media Type Audio
Language French

The ESTER 2 evaluation campaign (Evaluation of Broadcast News enriched transcription systems) is based, on the one hand, on the full corpus from the first ESTER campaign (see ELRA-E0021 and ELRA-S0241) and, on the other hand, on new data specific to ESTER 2: a training corpus of about one hundred hours and quick transcriptions of African radio broadcasts. A 6-hour subset of the corpus is identified as the development corpus. These new data constitute the ESTER 2 Corpus.

The ESTER 2 Corpus consists of:
- a manually transcribed radio broadcast news corpus amounting to about 100 hours,
- quick transcriptions of African radio broadcasts amounting to about 6 hours.

An annotation of named entities is provided within the development data (about 6 hours).

The recorded radio programmes contain news broadcasts, reports linked to current news, and more conversation-oriented broadcasts.

Version 1.0
Distributor ELRA