Resource: ECPC Corpus (European Comparable and Parallel Corpora of Parliamentary Speeches Archive) – set 1
|Reference||ECPC Corpus (European Comparable and Parallel Corpora of Parliamentary Speeches Archive) – set 1|
|Date of Submission||Dec. 21, 2018, 2:43 p.m.|
|Resource Type||Primary Text|
The European Comparable and Parallel Corpora of Parliamentary Speeches Archive (ECPC), compiled at the Universitat Jaume I (Spain), is a collection of corpora with XML metatextual tagging that contains speeches from three European chambers: the European Parliament, the British House of Commons, and the Spanish Congreso de los Diputados. It is a bilingual, bidirectional written corpus in English and Spanish, described by Zanettin (2012). This first set (ECPC_EP-05) consists of (1) a "clean" XML version of the European Parliament's 2005 daily sessions; (2) a POS-tagged version of the 2005 daily sessions; and (3) a sentence-aligned version of the 2005 daily sessions. In its raw format, ECPC_EP-05 contains 3,668,476 tokens/words (excluding tagging) in English, distributed over 60 UTF-8 files, and 3,993,867 tokens/words (excluding tagging) in Spanish, distributed over 60 UTF-8 files.
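The token figures above exclude tagging. As a rough illustration of what such a count involves, the following minimal sketch tallies whitespace-separated tokens in the text content of XML files while ignoring element names and attributes. It rests on assumptions: the function names `count_tokens` and `count_corpus_tokens`, the `*.xml` file pattern, and whitespace tokenization are all hypothetical choices, and the record does not state how the published counts were obtained, so the sketch should not be expected to reproduce them exactly.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def count_tokens(xml_source):
    """Count whitespace-separated tokens in the text content, ignoring markup.

    Accepts the XML document as str or bytes.
    """
    root = ET.fromstring(xml_source)
    # itertext() yields only character data, so tag names and
    # attribute values do not contribute to the count.
    return sum(len(chunk.split()) for chunk in root.itertext())


def count_corpus_tokens(directory, pattern="*.xml"):
    """Sum token counts over every matching file in a directory.

    The "*.xml" pattern is an assumption about how the files are named.
    """
    total = 0
    for path in sorted(Path(directory).glob(pattern)):
        # Reading bytes lets the XML parser honour the declared encoding.
        total += count_tokens(path.read_bytes())
    return total
```

Called on a directory holding the English files of the set, `count_corpus_tokens` would return a single integer; differences from the reported totals would most likely come from the tokenization rule, since punctuation attached to a word counts as part of that token here.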