Resource: PAROLE Irish Distributable Corpus
|Reference||PAROLE Irish Distributable Corpus|
|Date of Submission||Jan. 24, 2014, 4:30 p.m.|
|Resource Type||Primary Text|
The PAROLE Irish Distributable Corpus consists of over 8 million words (a subset of the 15+ million word Irish Reference corpus).
The text is marked up in accordance with the PAROLE encoding standard, which incorporates the Corpus Encoding Standard (CES) and the Text Encoding Initiative (TEI) Guidelines. All the files are in SGML format, with a detailed header and the body of the text tagged to paragraph level. The header includes information such as title, author(s), number of words, ownership and publication details, as well as a standard coding for Medium, Topic and Genre categories.
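As a rough illustration of working with files tagged to paragraph level, the following Python sketch collects the paragraph text from an SGML document. It rests on assumptions not stated in this record: the element names `cesDoc`, `cesHeader`, `text`, `body` and `p` follow general CES conventions and may not match the corpus files exactly, the sample document is invented, and `extract_paragraphs` is a hypothetical helper. The standard library's `html.parser` is used here only as a tolerant tag tokenizer, not as a validating SGML parser.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the text of each <p> element found inside <body>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_body = False
        self.in_p = False
        self.paragraphs = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so compare in lowercase.
        if tag == "body":
            self.in_body = True
        elif tag == "p" and self.in_body:
            self.in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            # Join the pieces and normalise internal whitespace.
            self.paragraphs.append(" ".join("".join(self._buf).split()))
            self.in_p = False
        elif tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.in_p:
            self._buf.append(data)


def extract_paragraphs(sgml_text):
    """Return a list of paragraph strings from the document body."""
    parser = ParagraphCollector()
    parser.feed(sgml_text)
    parser.close()
    return parser.paragraphs


# Invented sample; real corpus files will have a much fuller header.
SAMPLE = """<cesDoc>
<cesHeader>header fields omitted</cesHeader>
<text><body>
<p>Dia duit, a chara.</p>
<p>Seo an dara
alt.</p>
</body></text>
</cesDoc>"""

if __name__ == "__main__":
    paragraphs = extract_paragraphs(SAMPLE)
    word_count = sum(len(p.split()) for p in paragraphs)
    print(len(paragraphs), "paragraphs,", word_count, "words")
```

This sketch expects every paragraph to have an explicit closing tag. SGML allows end tags to be omitted, so files that rely on tag minimisation would need a proper SGML parser or a normalisation step first.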
A subset of the Distributable Corpus is morpho-syntactically tagged.
This distribution includes approximately 3,000 manually checked words.