Resource: ESTER Corpus

Reference ESTER Corpus
Date of Submission Jan. 24, 2014, 4:29 p.m.
Status accepted
ISLRN 055-636-352-982-9
Resource Type Primary Text
Media Type Audio
Source
Language French
Description

The ESTER Corpus is a subset of the ESTER Evaluation Package (catalogue ref. ELRA-E0021), which was produced within the French national project ESTER (Evaluation of Broadcast News enriched transcription systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The ESTER project carried out an evaluation campaign for enriched transcription systems for broadcast news, using French data.

This corpus includes the material that was used for the ESTER evaluation campaign, excluding the textual data (available separately in this catalogue under references ELRA-W0015 and ELRA-W0023):

1) About 100 hours of orthographically transcribed broadcast news, including annotations of named entities.
2) The evaluation tools, which allow each task of the evaluation campaign to be scored.
3) Two guides and manuals, which were produced for the campaign and are provided in the package distributed by ELDA:
o Guide for the annotation of named entities
o Specifications and evaluation protocol

An extra corpus of 1,700 hours of non-transcribed radio broadcast news recordings can also be provided upon request, on hard disk, as an add-on to this package at a cost of 100 euros (plus shipping fee).

A description of the project is available at the following address:
http://www.technolangue.net/article.php3?id_article=60 (in French)

Version 1.0
Distributor ELRA