Resource: Catalan-Spanish Parallel Corpus

Reference Catalan-Spanish Parallel Corpus
Date of Submission Jan. 24, 2014, 4:22 p.m.
Status accepted
ISLRN 124-613-721-890-1
Resource Type Primary Text
Media Type Text
Source
Language Catalan, Valencian, Spanish, Castilian
Description

This corpus contains more than 100 million words drawn from 10 years of bilingual articles published in “El Periódico de Catalunya”. The two language versions are very close, as the Catalan text is a translation of the Spanish one, produced in part by machine translation and then post-edited.

The data are aligned at sentence level and stored in text files, on a one-sentence-per-line basis. The data are provided in plain text, with no encoding whatsoever.
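As an illustration of how a sentence-aligned, one-sentence-per-line layout can be consumed, the following is a minimal Python sketch. It rests on assumptions this record does not confirm: that each language is stored in its own file, that line N of one file corresponds to line N of the other, and that the files are UTF-8 (the character encoding is not stated here, so it is left as a parameter). The function name and the file names in the usage note are hypothetical.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_aligned_pairs(
    ca_path: str, es_path: str, encoding: str = "utf-8"
) -> Iterator[Tuple[str, str]]:
    """Yield (catalan, spanish) sentence pairs from two line-aligned files.

    Assumption: one file per language, one sentence per line, and
    line N of one file aligned with line N of the other.
    """
    with open(ca_path, encoding=encoding) as ca_file, \
         open(es_path, encoding=encoding) as es_file:
        # zip_longest pads the shorter file with None, so a length
        # mismatch is detected instead of being silently truncated.
        pairs = zip_longest(ca_file, es_file)
        for line_no, (ca, es) in enumerate(pairs, start=1):
            if ca is None or es is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield ca.rstrip("\n"), es.rstrip("\n")
```

With hypothetical file names, `for ca, es in read_aligned_pairs("corpus.ca", "corpus.es"): ...` would iterate over the pairs lazily, which matters for a corpus of this size because the files are never loaded into memory in full.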

Version 1.0
Distributor ELRA