Resource: ECI-ELSNET Italian & German tagged sub-corpus
|Reference||ECI-ELSNET Italian & German tagged sub-corpus|
|Date of Submission||Jan. 24, 2014, 4:22 p.m.|
|Resource Type||Primary Text|
The objective is to provide a small but fine-grained morphosyntactically tagged corpus of 50,000 running words for each of the two languages (Italian and German), for use in research on tagging methods and models. The German text comes from the Frankfurter Rundschau, extracted from the ECI corpus; the Italian material comes from the Italian corpus of ILC - CNR. The German data covers several domains: Economy (17,000 word forms), Politics (14,000 word forms), Culture (18,000 word forms), Sports (9,000 word forms), and Local Events (8,500 word forms). The Italian data is comparable in composition. Word occurrences are tagged with very fine-grained tagsets based on the EAGLES morphosyntactic guidelines.