Resource: English-Vietnamese Parallel Corpus

Reference English-Vietnamese Parallel Corpus
Date of Submission Jan. 17, 2018, 12:39 p.m.
Status accepted
ISLRN 838-483-738-912-8
Resource Type Primary Text
Media Type Text
Language English, Vietnamese
Format/MIME Type Plain text
Access Medium Downloadable

This is a corpus of 500,000 English-Vietnamese sentence pairs, built for developing statistical machine translation (SMT) systems. The parallel corpus contains English documents translated into Vietnamese by professional translators. The source texts include books, dictionaries, newspapers, and online news collected between 2000 and 2007.
All Vietnamese sentences have been word-segmented and morphologically analyzed. The texts are provided in TEI format.

Version 1.0
Distributor ELRA