ISLRN

Persian 1984 corpus (Multext-East framework)

Full Official Name: Persian 1984 corpus (Multext-East framework)

Submission date: Jan. 24, 2014, 4:30 p.m.

This corpus contains the Persian (Farsi) translation of a part of the novel “1984” (G. Orwell), annotated in the Multext-East framework (Multilingual Text Tools and Corpora for Eastern and Central European Languages). The aim of the Multext-East project was to develop standardized language resources. The package comprises: (i) the specifications for morphosyntactic encoding of the Persian language, based on the EAGLES/MULTEXT model and specific resources of MULTEXT-East, and (ii) the annotated Persian version of Orwell’s 1984 corpus. The corpus contains extensive headers and markup for document structure, sentences, and various sub-sentence annotations in XML format following the TEI guidelines. Annotation includes part-of-speech (POS) tags and lemmas. The corpus contains approximately 100,000 words (6,604 sentences, 13,247 lemmas) and can be aligned with other corpora in the MULTEXT-East framework.
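As an illustration of how sentence-level, TEI-style XML with POS and lemma annotation might be read, here is a minimal Python sketch. It rests on assumptions that this record does not confirm: that sentences are `s` elements, that tokens are `w` elements carrying `lemma` and `ana` attributes, and that the file uses the TEI P5 namespace. The sample fragment and its tag values (`#TAG1`, `lemma_a`, and so on) are invented placeholders, not data from the corpus.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: the corpus uses the TEI P5 namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented placeholder fragment; element and attribute names are assumptions.
SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <p>
      <s xml:id="s1">
        <w lemma="lemma_a" ana="#TAG1">form_a</w>
        <w lemma="lemma_b" ana="#TAG2">form_b</w>
        <c>.</c>
      </s>
      <s xml:id="s2">
        <w lemma="lemma_a" ana="#TAG1">form_a2</w>
      </s>
    </p>
  </body>
</text>"""


def read_tokens(xml_text):
    """Return one list per sentence of (form, lemma, tag) tuples."""
    root = ET.fromstring(xml_text)
    sentences = []
    for s in root.iter(f"{{{TEI_NS}}}s"):
        tokens = [
            ((w.text or "").strip(), w.get("lemma"), w.get("ana"))
            for w in s.iter(f"{{{TEI_NS}}}w")
        ]
        sentences.append(tokens)
    return sentences


sentences = read_tokens(SAMPLE)

# Count how often each lemma occurs across all sentences.
lemma_counts = Counter(lemma for sent in sentences for _, lemma, _ in sent)
```

With the actual corpus files, the element names, attributes, and namespace should be checked against the corpus header and the accompanying morphosyntactic specifications before reusing this reader.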

Creator(s)

Distributor(s)

ELRA

Right Holder(s)

Status: Accepted

ISLRN:

851-240-629-673-1

Version

1.0

Source

http://catalog.elra.info/product_info.php?products_id=1124

Resource Type

Primary Text

Media Type

Text

Language(s)

Persian

Access Medium