ISLRN

NE3L named entities Arabic corpus

Full Official Name: NE3L named entities Arabic corpus

Submission date: Oct. 7, 2014, 6:03 p.m.

OLAC identifier: oai:catalogue.elra.info:ELRA-W0078

Written Corpora

The NE3L project (Named Entities 3 Languages) consisted of annotating corpora in several languages with named entities. The data, in text format, were extracted from newspapers and cover various topics. Three languages were annotated: Arabic, Chinese and Russian. The project took five named entity categories into account: Person, Place, Organisation, Time and Amount. Each language was annotated with only a subset of these categories: Arabic and Russian were marked up with Time and Amount tags, whereas Chinese was marked up with Person, Place and Organisation tags.

The Arabic corpus contains 103,363 words from articles extracted from the "Le Monde Diplomatique" newspaper and published in 2004. Two named entity categories were taken into account: Time and Amount.

Creator(s)

Distributor(s)

Right Holder(s)

Status : Accepted

ISLRN :

398-979-151-557-0

Version

1.0

Source

http://catalog.elra.info/product_info.php?products_id=1226

Resource Type

Primary Text

Media Type

Text

Language(s)

Arabic

Access Medium