Full Official Name: Bulgarian Treebank Corpus
Submission date: Oct. 3, 2022, 12:39 p.m.

The Bulgarian Treebank Corpus consists of 156,149 tokens (11,138 sentences) drawn from three main source domains: Grammar Notebooks (1,391 sentences), News (6,698 sentences), and Other (3,049 sentences). It is available with sentence-level syntactic and morphological annotation in Universal Dependencies format. This subset of BulTreeBank excludes ellipses and some rare phenomena. The conversion of BulTreeBank into Universal Dependencies format was supported by the EU project QTLeap (http://qtleap.eu/).

