Resource: GLiCom Spanish Wordform list - Regular word-forms + verb-clitic combinations
|Reference||GLiCom Spanish Wordform list - Regular word-forms + verb-clitic combinations|
|Date of Submission||Nov. 2, 2015, 6:24 p.m.|
GLiCom Spanish Wordform List v.1 is a computational lexicon of inflected Spanish wordforms. Each wordform is listed with the following information: (i) lemma, (ii) morphosyntactic tag, and (iii) word type. The lexicon can be used in any Spanish text analysis application, in particular those that need a lemmatizer, POS tagger, or named entity recogniser.
The list of wordforms contains 1,152,242 entries, including (i) regular words (1,144,086), (ii) toponyms and anthroponyms (8,032), (iii) abbreviations and acronyms (775), and (iv) computational terms (124). Each entry consists of the form, the lemma, the morphosyntactic tag, and the word type.
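As an illustration of how the four fields could back a simple lemmatizer lookup, here is a minimal Python sketch. The on-disk format is not described in this record, so the tab-separated layout, the column order (form, lemma, tag, word type), the tag strings, and the word type labels in `SAMPLE` are all assumptions made for the example, and `load_wordforms` is a hypothetical helper rather than part of the resource.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows. The tab-separated layout, the tag strings and
# the word type labels are assumptions, not taken from the resource itself.
SAMPLE = (
    "casas\tcasa\tNCFP000\tregular\n"
    "casas\tcasar\tVMIP2S0\tregular\n"
    "Madrid\tMadrid\tNP00000\ttoponym\n"
)


def load_wordforms(stream):
    """Map each form to the list of (lemma, tag, word_type) analyses found for it."""
    lexicon = defaultdict(list)
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            continue  # skip blank or malformed lines
        form, lemma, tag, word_type = row
        lexicon[form].append((lemma, tag, word_type))
    return lexicon


lexicon = load_wordforms(io.StringIO(SAMPLE))

# An ambiguous form keeps every analysis, so a tagger can choose among them later.
lemmas = sorted({lemma for lemma, _, _ in lexicon["casas"]})
print(lemmas)
```

Storing a list of analyses per form, rather than a single lemma, keeps ambiguous forms intact for a downstream POS tagger. For the real file, replace `io.StringIO(SAMPLE)` with an open file handle once the actual delimiter and column order have been checked.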
The list of verb-clitic combinations contains 4,283,637 entries, exhaustively covering all formal combinations (including infinitive, gerund and imperative). Note that some clitic combinations are formally possible but semantically implausible. Each entry consists of the form, the lemma of the verb, and the combined morphosyntactic tags of the verb and the pronoun(s).
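A verb-clitic entry could be split into its verb and pronoun parts along the following lines. This is only a sketch: the tab field separator, the `+` joining the tags, the convention that the verb tag comes first, and the sample line itself are assumptions for illustration. `CliticEntry` and `parse_clitic_line` are hypothetical names, not part of the resource.

```python
from typing import NamedTuple, Tuple


class CliticEntry(NamedTuple):
    form: str
    verb_lemma: str
    verb_tag: str
    pronoun_tags: Tuple[str, ...]


def parse_clitic_line(line, field_sep="\t", tag_sep="+"):
    """Split one entry into form, verb lemma, verb tag and pronoun tags.

    Assumes three fields per line and that the verb tag precedes the
    pronoun tags in the combined tag field (both are assumptions).
    """
    form, verb_lemma, tags = line.rstrip("\n").split(field_sep)
    verb_tag, *pronoun_tags = tags.split(tag_sep)
    return CliticEntry(form, verb_lemma, verb_tag, tuple(pronoun_tags))


# Hypothetical sample line: imperative of "dar" with two enclitic pronouns.
entry = parse_clitic_line("dámelo\tdar\tVMM02S0+PP1CS000+PP3MSA00\n")
print(entry.verb_lemma, len(entry.pronoun_tags))
```

The separators are parameters so the function can be adapted once the actual file layout is known.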