Resource: KORLEX – Serbian Lexicon
|Reference||KORLEX – Serbian Lexicon|
|Date of Submission||Jan. 24, 2014, 4:30 p.m.|
This lexical resource was developed as part of the English-Serbian bilingual lexicon built for the following project: http://www.rjecnik.com.
The lexicon data is compiled with the objective of covering the majority of text in everyday circulation, such as news (e.g., newswire articles), business, technological documentation, legal documentation, and politics. Words that are used primarily in literary and religious contexts, and that are not part of everyday usage, are generally not included in the lexicon.
The KORLEX-Serbian Lexicon provides a list of 108,491 Serbian lemmas, i.e., words in canonical form, each annotated with a part-of-speech (POS) tag and lexical features. Among these 108,491 entries, there are 52,027 nouns, 9,153 adverbs, 15,522 verbs and 31,052 adjectives. The remaining entries are pronouns, determiners, prepositions/postpositions, conjunctions and numerals.
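The size of the "remaining entries" group is not stated directly, but it follows from the figures above. A minimal Python sketch of that arithmetic, using only the counts quoted in this description (the variable names are illustrative and not part of the resource):

```python
# Counts quoted in the resource description.
total_lemmas = 108_491
open_class_counts = {
    "noun": 52_027,
    "adverb": 9_153,
    "verb": 15_522,
    "adjective": 31_052,
}

# Entries in the four listed open classes.
open_class_total = sum(open_class_counts.values())

# Whatever is left must belong to the closed classes: pronouns, determiners,
# prepositions/postpositions, conjunctions and numerals.
remaining = total_lemmas - open_class_total

print(open_class_total)  # 107754
print(remaining)         # 737
```

Assuming the published counts are exact, the closed-class categories together account for 737 entries, well under 1% of the lexicon.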
The resource is a flat text file in which each line contains information about one lemma. The format of a line can be captured with the following Perl regular expressions:
# Characters appearing in a word (ISO-8859-2)
The resource is encoded in ISO-8859-2 and sorted according to the standard Serbian lexicographic order.
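Because the file is ISO-8859-2 rather than UTF-8, it has to be decoded explicitly when read. The following Python sketch shows one way to do that. It relies only on the two properties stated above (one lemma per line, ISO-8859-2 encoding); the function name `read_lexicon_lines` is hypothetical, and the sketch deliberately returns each line unparsed, since the field layout is not reproduced in this description.

```python
from typing import Iterator


def read_lexicon_lines(path: str) -> Iterator[str]:
    """Yield each non-empty line of the lexicon file as a decoded string.

    Assumption: `path` points to a local copy of the lexicon file; its
    actual file name is not given in the description.
    """
    # ISO-8859-2 (Latin-2) covers the Serbian Latin letters č, ć, š, ž, đ.
    with open(path, encoding="iso-8859-2") as handle:
        for raw in handle:
            line = raw.rstrip("\r\n")
            if line:  # skip blank lines, if any
                yield line
```

A call such as `for line in read_lexicon_lines("korlex.txt"): ...` (with a placeholder file name) would then iterate over one lemma record at a time. Splitting a record into lemma, POS tag and features is left out on purpose, because it depends on the line format defined by the resource's regular expressions.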