Resource: Italian lexicon with morphological information and clitic verbs

Reference Italian lexicon with morphological information and clitic verbs
Date of Submission Jan. 24, 2014, 4:29 p.m.
Status accepted
ISLRN 565-957-248-233-5
Resource Type Lexicon
Media Type Text
Source
Language Italian
Size 79 gb
Description

This Italian lexicon is the same as the one described in ELRA-L0069, but with the addition of clitic verbs, which increases the number of inflected forms to 1,800,000 (still corresponding to 112,000 simple-word lemmas). Half the lexicon is made up of clitic verbs. It contains:
- 66,340 nouns, with type, gender, number and inflected forms (including irregular forms),
- 12,030 verbs, with mood, tense, person, gender, number, indication of clitic verbs and inflected forms (including irregular forms),
- 28,080 adjectives, with degree, gender, number and inflected forms (including irregular forms),
- 4,890 adverbs, with degree,
- 660 pronouns, articles, prepositions/postpositions and conjunctions.
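
As a quick consistency check, the five category counts listed above can be added up and compared with the 112,000 lemmas quoted in the description. The short Python sketch below does only that arithmetic; the dictionary keys are shorthand labels chosen here for illustration, not tags used in the resource.

```python
# Lemma counts per category, copied from the list above.
# The key names are illustrative labels, not the resource's own tags.
lemma_counts = {
    "nouns": 66_340,
    "verbs": 12_030,
    "adjectives": 28_080,
    "adverbs": 4_890,
    "pronouns, articles, prepositions/postpositions, conjunctions": 660,
}

total_lemmas = sum(lemma_counts.values())
print(total_lemmas)  # 112000
```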

Each line in the resource file gives an inflected form together with its part of speech, its lemma and its morphological information. The inflected forms were generated using two databases: one containing the lemmas with their root(s) and paradigm number(s), the other containing the paradigm numbers with their terminations (endings) and morphological information.
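
The generation step described above (roots combined with the terminations of their paradigm) can be sketched as follows. This is a minimal illustration only: the two dictionaries stand in for the two databases, and their layout, the sample lemma, the paradigm number and the morphological tags are all assumptions made for the example, since the actual database schema and tag set are not given in this record.

```python
# Hypothetical miniature stand-ins for the two databases.
# Layout, sample entry and tags are assumptions, not the resource's real data.
LEMMAS = {
    # lemma: (part of speech, [(root, paradigm number), ...])
    "gatto": ("N", [("gatt", 1)]),
}
PARADIGMS = {
    # paradigm number: [(termination, morphological information), ...]
    1: [("o", "masc sing"), ("i", "masc plur")],
}


def generate_forms(lemmas, paradigms):
    """Yield (lemma, pos, inflected form, morph) for each root/termination pair."""
    for lemma, (pos, roots) in lemmas.items():
        for root, paradigm_no in roots:
            for termination, morph in paradigms[paradigm_no]:
                yield lemma, pos, root + termination, morph


forms = list(generate_forms(LEMMAS, PARADIGMS))
for form in forms:
    print("|".join(form))
```

A lemma with several roots (for example an irregular verb) would simply carry more than one (root, paradigm number) pair in the first table, which is why the sketch iterates over a list of roots rather than a single one.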

Each row in the resource file consists of four fields following the structure below:
Lemma|part of speech|inflected form|morphological information
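
A reader of the file could split each row on the pipe character into the four fields named above. The Python sketch below is one possible way to do this; the `Entry` type, the `parse_line` function and the sample row are hypothetical, and the sample's tag values are invented, since the real abbreviations are defined in the key file distributed with the resource. It also assumes that no field itself contains a pipe character.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One row of the lexicon, in the field order given above."""
    lemma: str
    pos: str
    form: str
    morph: str


def parse_line(line: str) -> Entry:
    """Split one pipe-separated row into its four fields."""
    fields = line.rstrip("\r\n").split("|")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return Entry(*fields)


# Invented sample row; the tags "N" and "mp" are placeholders, not real codes.
entry = parse_line("gatto|N|gatti|mp\n")
print(entry.lemma, entry.form)
```

Raising an error on rows that do not have exactly four fields is a design choice for the sketch; a production reader might prefer to log and skip such rows instead.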

The part of speech and the morphological information are encoded using our internal standard (an abbreviation key file is also provided).

Version 1.0
Distributor ELRA