ISLRN

TRAD Pashto Monolingual text Corpus

Full Official Name: TRAD Pashto Monolingual text Corpus

Submission date: April 6, 2016, 4:51 p.m.

This is a monolingual text corpus in Pashto. It contains about 112,000,000 tokens collected from 46 different blogs and websites. The sources, which were identified and then either negotiated or freely available, were crawled in 2012, cleaned, and formatted as XML. Pashto is an Indo-Iranian language spoken by the Pashtun people, mainly in Pakistan and Afghanistan. The corpus was produced by ELDA within the PEA TRAD project, supported by the French Ministry of Defence (DGA).

Creator(s)

Distributor(s)

ELRA

Right Holder(s)

Status: Accepted

ISLRN:

394-903-293-388-0

Version

1.0

Source

http://catalog.elra.info/product_info.php?products_id=1266

Resource Type

Primary Text

Media Type

Text

Language(s)

Pushto

Access Medium