Resource: Quaero Broadcast News Extended Named Entity corpus
|Reference||Quaero Broadcast News Extended Named Entity corpus|
|Date of Submission||Jan. 24, 2014, 4:31 p.m.|
|Resource Type||Primary Text|
The Quaero Broadcast News Extended Named Entity corpus consists of manual annotations of (i) the ESTER 2 corpus (see ELRA-S0338) and (ii) the Quaero Speech Recognition Evaluation corpus (manual transcriptions, plus automatic transcriptions produced by 3 different ASR systems). The first part serves as the training corpus and the second as the test corpus.
The corpus is fully manually annotated according to the Quaero extended and structured named entity definition, which distinguishes entity "types" from "components". The training part consists solely of broadcast news data and contains 188 shows, 1,291,225 words, 113,885 types and 146,405 components. The test part consists of both broadcast news and broadcast conversation data and contains 18 shows, 108,010 words, 5,523 types and 8,902 components.
The Quaero Broadcast News Extended Named Entity corpus consists of: