Resource: NUM 5M Mongolian written corpus
|Reference||NUM 5M Mongolian written corpus|
|Date of Submission||July 12, 2017, 11:06 a.m.|
|Resource Type||Primary Text|
|Format/MIME Type||Plain text|
This is a corpus of Mongolian text drawn mostly from online and printed daily newspapers, literature, and laws.
The collected raw text was reduced from 5 million to 4.8 million words after cleaning. The cleaned corpus comprises:
Part of this corpus, about 2,800 sentences with 100,000 words, has been POS-tagged manually and stored in XML TEI format.
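As a rough illustration of how the POS-tagged portion might be read, the sketch below parses TEI XML with Python's standard library. The record does not describe the actual markup, so everything about the layout here is an assumption: sentences in `<s>` elements, tokens in `<w>` elements, the tag stored in a `pos` attribute, and the standard TEI namespace. The sample sentence and the tags `N` and `V` are hypothetical placeholders, not data from the corpus.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; the corpus files may or may not declare it (assumption).
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample: element names, the "pos" attribute, and the tag values
# are illustrative guesses, not the corpus's documented schema.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <s><w pos="N">ном</w><w pos="V">уншина</w><pc>.</pc></s>
  </body></text>
</TEI>"""


def read_tagged_sentences(xml_text, pos_attr="pos"):
    """Return a list of sentences, each a list of (word, tag) pairs."""
    root = ET.fromstring(xml_text)
    ns = {"tei": TEI_NS}
    sentences = []
    for s in root.iterfind(".//tei:s", ns):
        # Collect every <w> token inside the sentence, skipping punctuation (<pc>).
        tokens = [(w.text or "", w.get(pos_attr)) for w in s.iterfind(".//tei:w", ns)]
        sentences.append(tokens)
    return sentences


if __name__ == "__main__":
    for sentence in read_tagged_sentences(SAMPLE):
        print(sentence)
```

If the real files keep the tag in a different attribute (for example `type` or `ana`), passing that name as `pos_attr` should be enough; a different element layout would need the search paths adjusted.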