|Reference||STO SprogTeknologisk Ordbase (Danish Lexicon for NLP/HLT Applications)|
|Date of Submission||Jan. 24, 2014, 4:31 p.m.|
The STO Lexicon is the most comprehensive computational lexicon of Danish, comprising approx. 81,530 entry words. It is well integrated with European activities in lexicon development, building on experience obtained from the PAROLE and SIMPLE projects. The model and descriptive method of the STO lexicon are kept compatible with the architecture and descriptive language of PAROLE/SIMPLE, and STO implements a number of refinements, adaptations and language-specific extensions to the basic model.
Lexical coverage and encoded information are distributed by category as follows:
Lexical Category Lemmas
Part of this vocabulary (12,060 lemmas) is selected from 6 domains, as follows:
Domain Nouns Verbs Adjectives Total
Linguistic coverage / Main information types:
The resource was validated internally. The lexicon is well suited for monolingual NLP/HLT applications, as a lexicon component in taggers, parsers, grammar and spell checkers, summarisation tools, web crawlers and computer-aided language learning, as well as for multilingual applications. It can also be linked to other PAROLE/SIMPLE-compatible resources.
The lexicon comes with thorough documentation in English and is distributed on CD-ROM.