ECPC Corpus (European Comparable and Parallel Corpora of Parliamentary Speeches Archive) – set 1

Full Official Name: ECPC Corpus (European Comparable and Parallel Corpora of Parliamentary Speeches Archive) – set 1
Submission date: Dec. 21, 2018, 2:43 p.m.

The European Comparable and Parallel Corpora of Parliamentary Speeches Archive (ECPC), compiled at the Universitat Jaume I (Spain), is a collection of corpora with XML metatextual tagging that contain speeches from three European chambers: the European Parliament, the British House of Commons, and the Spanish Congreso de los Diputados. It is a bilingual, bidirectional written corpus in English and Spanish, described by Zanettin (2012). This first set (ECPC_EP-05) consists of (1) a "clean" XML version of the European Parliament's 2005 daily sessions; (2) a POS-tagged version of the 2005 daily sessions; and (3) a sentence-aligned version of the 2005 daily sessions. In its raw format, ECPC_EP-05 contains 3,668,476 tokens (words, excluding tagging) in English, distributed over 60 UTF-8 files, and 3,993,867 tokens (words, excluding tagging) in Spanish, distributed over 60 UTF-8 files.
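
As an illustration of what a token count "excluding tagging" means for an XML file, the following minimal sketch counts whitespace-separated tokens in the text content only. The element and attribute names in the sample (`session`, `speech`, `speaker`, `date`) are hypothetical and do not reflect the actual ECPC tag set, and whitespace splitting is an assumed tokenization that will not necessarily reproduce the figures reported above.

```python
import xml.etree.ElementTree as ET


def count_tokens(xml_text: str) -> int:
    """Count whitespace-separated tokens in the text content of an XML document.

    Element names and attribute values are not counted, because itertext()
    yields character data only.
    """
    root = ET.fromstring(xml_text)
    # Joining with a space is an assumption: it treats every tag boundary
    # as a token boundary, which suits speech-level markup.
    text = " ".join(root.itertext())
    return len(text.split())


# Hypothetical sample; the real ECPC files use their own tag set.
sample = (
    '<session date="2005-01-10">'
    '<speech speaker="A">Thank you, Mr President.</speech>'
    '<speech speaker="B">Gracias, señor Presidente.</speech>'
    "</session>"
)

print(count_tokens(sample))  # prints 7
```

For a file on disk, the same function could be applied to text read with `open(path, encoding="utf-8").read()`, since the files are distributed in UTF-8.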

Creator(s)
Distributor(s)
Right Holder(s)