Resource: PAROLE Italian Corpus

Reference PAROLE Italian Corpus
Date of Submission Jan. 24, 2014, 4:30 p.m.
Status accepted
ISLRN 608-362-291-385-1
Resource Type Primary Text
Media Type Text
Source
Language Italian
Format/MIME Type Plain text
Description

The PAROLE Italian Corpus comprises 3,135,651 words collected from four different domains:
• newspapers: 2,179,800 words from La Stampa, La Repubblica, Il Corriere della Sera, L’Unione Sarda, Il Sole 24 Ore, published between 1992 and 1996,
• periodicals: 143,810 words from Casaviva, 100cose, Epoca, Espansione, Grazia, Panorama, Starbene, Storia Illustrata, Zerouno, published between 1985 and 1988,
• books: 564,964 words, published between 1970 and 1989,
• miscellaneous: 247,077 words from CNR documents, patents, maritime documents, theater, dated between 1987 and 1997.
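
As a quick consistency check, the four per-domain figures above can be summed and compared with the stated total of 3,135,651 words. The following is only a minimal sketch: the names SUBCORPUS_WORDS, STATED_TOTAL and shares are illustrative and are not part of the corpus distribution or of any ELRA tooling.

```python
# Word counts per domain, copied from the description above.
# All names here are hypothetical and chosen only for this illustration.
SUBCORPUS_WORDS = {
    "newspapers": 2_179_800,
    "periodicals": 143_810,
    "books": 564_964,
    "miscellaneous": 247_077,
}
STATED_TOTAL = 3_135_651  # total given in the description


def shares(counts):
    """Return each domain's fraction of the summed word count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


computed_total = sum(SUBCORPUS_WORDS.values())
# The per-domain figures should add up to the stated corpus size.
assert computed_total == STATED_TOTAL

for name, share in shares(SUBCORPUS_WORDS).items():
    print(f"{name:<14} {share:6.1%}")
```

The assertion passes, so the breakdown and the total agree; the loop prints each domain's share of the corpus.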

About 250,000 words were morphosyntactically annotated and lemmatized.

Version 1.0
Distributor ELRA