Resource: TRAD Pashto Monolingual text Corpus
|Reference||TRAD Pashto Monolingual text Corpus|
|Date of Submission||April 6, 2016, 4:51 p.m.|
|Resource Type||Primary Text|
This is a monolingual text corpus in Pashto. The corpus contains about 112,000,000 tokens collected from 46 different blogs and websites.
The sources, either identified and negotiated or freely available, were crawled in 2012, then cleaned and formatted as XML.
This corpus was produced by ELDA within the PEA TRAD project supported by the French Ministry of Defence (DGA).