Arbobanko (Esperanto Treebank)

The Arbobanko (Esperanto Treebank) is a 52,000-token dependency treebank of Esperanto built from texts in the news magazine MONATO, consisting of random excerpts from the period 2000 to 2010. All words were annotated for lemma, part of speech, inflection, compounding and affixation, syntactic function, dependency links, named-entity (NER) types, semantic types of nouns and adjectives, and verb frame categories. Morphosyntactic and dependency annotation was performed with the EspGram parser and then manually revised. Semantic categories were added in a second round of annotation and were likewise manually revised and disambiguated. The treebank is available in three formats: native Constraint Grammar SGML with token-based tag lines, XML with feature-attribute pairs, and CoNLL tab-separated format.
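For readers who want to consume the CoNLL tab format, the following is a minimal reading sketch. It assumes the 10-column CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) with blank lines between sentences. The description above does not say which CoNLL variant the Arbobanko files use, so check the actual column order before relying on it. The names `Token`, `read_conll`, `dependency_arcs` and the sample sentence, including its tags and relation labels, are hypothetical illustrations, not the treebank's real tag set.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO, Tuple


@dataclass
class Token:
    idx: int      # 1-based token ID within the sentence
    form: str     # surface word form
    lemma: str
    pos: str      # coarse part-of-speech column (CPOSTAG)
    feats: str    # morphological features, "_" if empty
    head: int     # ID of the head token, 0 for the sentence root
    deprel: str   # dependency relation label


def read_conll(stream: TextIO) -> Iterator[List[Token]]:
    """Yield sentences as token lists from a CoNLL-X-style stream.

    Assumption: 10 tab-separated columns per token line and a blank
    line after each sentence.
    """
    sentence: List[Token] = []
    for line in stream:
        line = line.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        sentence.append(Token(
            idx=int(cols[0]),
            form=cols[1],
            lemma=cols[2],
            pos=cols[3],
            feats=cols[5],
            head=int(cols[6]),
            deprel=cols[7],
        ))
    if sentence:  # file may not end with a blank line
        yield sentence


def dependency_arcs(sentence: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (dependent, head, relation) triples; assumes IDs run 1..n."""
    arcs = []
    for tok in sentence:
        head_form = "ROOT" if tok.head == 0 else sentence[tok.head - 1].form
        arcs.append((tok.form, head_form, tok.deprel))
    return arcs


# Hypothetical sample sentence ("The dog sleeps.") with invented labels.
SAMPLE = (
    "1\tLa\tla\tART\tART\t_\t2\tdet\t_\t_\n"
    "2\thundo\thundo\tN\tN\t_\t3\tsubj\t_\t_\n"
    "3\tdormas\tdormi\tV\tV\t_\t0\troot\t_\t_\n"
    "4\t.\t.\tPU\tPU\t_\t3\tpunct\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    for sent in read_conll(io.StringIO(SAMPLE)):
        for dep, head, rel in dependency_arcs(sent):
            print(f"{dep} --{rel}--> {head}")
```

Reading sentence by sentence with a generator keeps memory use flat even for the full 52,000-token corpus, and resolving heads by index relies on the CoNLL convention that token IDs restart at 1 in every sentence.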

