Resource: EnToSSLNE - a Lexicon of Parallel Named Entities from English to South Slavic Languages
|Reference||EnToSSLNE - a Lexicon of Parallel Named Entities from English to South Slavic Languages|
|Date of Submission||April 24, 2019, 5:18 p.m.|
|Language||Bosnian, Bulgarian, Croatian, English, Macedonian, Serbian, Slovenian|
This lexicon contains multiword entries that are not strictly named entities but contain a word that is. For example, German shepherd is an entry in this lexicon: it is not a named entity in the strict sense, since many dogs of this breed exist, but the adjective German makes it a named entity in a broader sense. Accordingly, the lexicon contains many multiword units that include ethnonyms. Similarly, the unit Planck's law belongs to this lexicon as well.
Certain natural terms like biological species and substances, which are sometimes considered named entities, are not included in the lexicon.
Slovenian, Croatian and Bosnian are written in Latin script; Macedonian and Bulgarian are written in Cyrillic. Serbian is a special case, since it may be written in two scripts (Cyrillic and Latin) and comes in two dialects (ekavica and ijekavica). This lexicon uses the ekavica variant of Serbian, written in Cyrillic script.
The lexicon consists of 26,155 entries, each of which is assigned a class tag. The distribution of classes is as follows:
In the XML file, the tag denoting the class is stored as an attribute, and the languages are represented as elements.
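The layout described above (class as an attribute, languages as elements) could be read along the following lines. This is only a minimal sketch: the element names `lexicon` and `entry`, the attribute name `tag`, the language element names `en`, `hr` and `sr`, and the class values `class-x` and `class-y` are all assumptions made for illustration, not the actual schema or tag set of the resource, and should be adjusted to match the real file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: element names, attribute name and class values
# are invented for illustration and may differ from the real file.
SAMPLE_XML = """\
<lexicon>
  <entry tag="class-x">
    <en>German shepherd</en>
    <hr>njemački ovčar</hr>
    <sr>немачки овчар</sr>
  </entry>
  <entry tag="class-y">
    <en>Planck's law</en>
    <hr>Planckov zakon</hr>
    <sr>Планков закон</sr>
  </entry>
</lexicon>
"""


def load_entries(xml_text):
    """Return one dict per entry: its class and a language -> text mapping."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter("entry"):
        # Each child element is assumed to be named after a language.
        translations = {
            child.tag: (child.text or "").strip() for child in entry
        }
        entries.append(
            {"class": entry.get("tag"), "translations": translations}
        )
    return entries


entries = load_entries(SAMPLE_XML)

# Count entries per class, i.e. the distribution of classes.
class_counts = Counter(e["class"] for e in entries)

# Build an English -> Serbian lookup from entries that have both languages.
en_to_sr = {
    e["translations"]["en"]: e["translations"]["sr"]
    for e in entries
    if "en" in e["translations"] and "sr" in e["translations"]
}

print(class_counts)
print(en_to_sr.get("German shepherd"))
```

For the full file, `ET.parse(path).getroot()` could replace `ET.fromstring`; keeping the class in a separate key from the translations mirrors the attribute/element split in the XML.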