ISLRN

NEM.fr

Full Official Name: Named-Entities Multigenre French Corpus

Submission date: Feb. 20, 2026, 11:21 a.m.

The corpus gathers 51,480 tokens distributed across 12 genres (from poetry to legal decisions, biomedical texts, spoken language, and more). This diversity makes the corpus suitable not only for evaluating Named Entity Recognition (NER) tools in heterogeneous contexts, but also for exploring textual variation, genre effects, or register differences. Each sample contains approximately 1,000 tokens, except for some speeches and emails, where the natural segmentation is preserved.

Creator(s)

Alice Millour

Marina Seghier

Distributor(s)

Right Holder(s)

Status: Accepted

ISLRN:

857-654-609-197-8

Version

1.0

Source

https://github.com/ayusekyo111/NEM.fr

Resource Type

Annotated Corpus

Media Type

Text

Language(s)

French

Access Medium

Web Download