Resource: TRAD Pashto-English Parallel corpus of transcribed Broadcast News Speech - Test data

Reference TRAD Pashto-English Parallel corpus of transcribed Broadcast News Speech - Test data
Date of Submission April 6, 2016, 4:51 p.m.
Status accepted
ISLRN 006-102-605-738-4
Resource Type Primary Text
Media Type Text
Source
Language English, Pushto
Size 541 Kb
Description

This parallel corpus contains 10,000 Pashto words translated into English. The source texts come from three broadcast news transcriptions in the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381). They are VOA Ashna TV programs recorded on 15/01/2011, 18/01/2011 and 19/01/2011.

The content has also been translated into French (see ELRA-W0094 TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Test set).

Pashto is an Indo-Iranian language spoken by the Pashtun people, mainly in Pakistan and Afghanistan.

This corpus was produced by ELDA within the PEA TRAD project supported by the French Ministry of Defence (DGA). It was used as a test set for an internal MT evaluation campaign.

Version 1.0
Distributor ELRA