ISLRN

TC-STAR English-Spanish Training Corpora for Machine Translation: Aligned Final Text Editions of EPPS

Full Official Name: TC-STAR English-Spanish Training Corpora for Machine Translation: Aligned Final Text Editions of EPPS

Submission date: Jan. 24, 2014, 4:31 p.m.

TC-STAR is a European integrated project focusing on all core technologies for Speech-to-Speech Translation (SST): Automatic Speech Recognition (ASR), Spoken Language Translation (SLT), and Text-to-Speech Synthesis (TTS). This corpus consists of sentence-segmented and sentence-aligned bilingual English-Spanish texts, totaling 34 million running words in English and 38 million in Spanish. The texts were obtained from the Final Text Editions provided by the European Parliament (http://www.europarl.europa.eu) and cover three periods: April 1996 to September 2004, December 2004 to May 2005, and December 2005 to May 2006. The data is accompanied by tools for further preprocessing.

Creator(s)

Distributor(s)

ELRA

Right Holder(s)

Status: Accepted

ISLRN

219-619-756-916-1

Version

1.0

Source

http://catalog.elra.info/product_info.php?products_id=1033

Resource Type

Lexicon

Media Type

Audio

Text

Language(s)

English

Spanish

Access Medium