Resource: "Le Monde Diplomatique" Arabic tagged corpus
|Reference||"Le Monde Diplomatique" Arabic tagged corpus|
|Date of Submission||Jan. 24, 2014, 4:30 p.m.|
|Resource Type||Primary Text|
This corpus contains 102,960 vowelised, lemmatised and tagged words (58 texts from Le Monde Diplomatique Arabic; see also ELRA-W0036-04).
Three files are associated with each text:
Each word of a text is associated with several pieces of information, such as the word size, the rank of the word in the text, and the number of the paragraph in which the word was found. Each word corresponds to a node in the XML file. Each node contains the following positional features of the word in the text:
Information about the annotation of each word is added as "sub-nodes":
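The element and attribute names used in the corpus files are not given in this description. Purely as an illustration of the layout described above (one node per word, positional features on the node, annotation in sub-nodes), the following Python sketch reads a small made-up sample. Every name in it (`text`, `word`, `rank`, `paragraph`, `size`, `vowelised`, `lemma`, `tag`), the choice of attributes for the positional features, and the placeholder values are assumptions, not the corpus's actual schema or content.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: all element names, attribute names and values are
# placeholders chosen for illustration; the real corpus schema may differ.
SAMPLE = """<text>
  <word rank="1" paragraph="1" size="4">
    <vowelised>FORM_1</vowelised>
    <lemma>LEMMA_1</lemma>
    <tag>TAG_1</tag>
  </word>
  <word rank="2" paragraph="1" size="6">
    <vowelised>FORM_2</vowelised>
    <lemma>LEMMA_2</lemma>
    <tag>TAG_2</tag>
  </word>
</text>"""


def read_words(xml_string):
    """Return one dict per word node: positional features plus annotations."""
    root = ET.fromstring(xml_string)
    words = []
    for node in root.iter("word"):
        # Positional features, assumed here to be stored as attributes.
        entry = {
            "rank": int(node.get("rank")),
            "paragraph": int(node.get("paragraph")),
            "size": int(node.get("size")),
        }
        # Annotation sub-nodes: copy each child's tag name and text.
        for child in node:
            entry[child.tag] = (child.text or "").strip()
        words.append(entry)
    return words


if __name__ == "__main__":
    for word in read_words(SAMPLE):
        print(word)
```

To use this with the actual files, the names above would need to be replaced with those found in the corpus documentation.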