Resource: TGermaCorp

Reference TGermaCorp
Date of Submission March 9, 2017, 6:02 p.m.
Status accepted
ISLRN 536-382-801-278-5
Resource Type Primary Text
Media Type Text
Source
Language German
Format/MIME Type CoNLL
Size 244 documents, 8,941 sentences, 157,210 tokens
Access Medium ZIP
Description

TGermaCorp is a digital humanities resource built around German literary texts from the sixteenth century to the present. The primary texts are annotated on four levels. First, parts of speech are tagged according to STTS. Second, each token is assigned its lemma. Third, proper names are classified according to the kind of referent (e.g., person or institution). Fourth, clauses, sentences, paragraphs, and headings are explicitly marked. A distinguishing characteristic of TGermaCorp is the composition of its primary sources: the corpus is designed to capture the lexical and morpho-syntactic variety of written German as exhibited in German-language literature. TGermaCorp is therefore aimed at applications and investigations in the Digital Humanities and is located in the low-resource area where computational linguistics and the study of literature intersect.
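
The token-level annotation described above (part of speech, lemma, proper-name class) could be read from a CoNLL-style file roughly as in the following sketch. The column order (form, lemma, STTS tag, name class), the tab separator, the blank-line sentence boundary, the "PER" and "O" labels, and the names Token and read_sentences are all assumptions made for illustration; this record does not specify the actual column layout of the TGermaCorp files, so it should be checked against the distributed data before use.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    # Hypothetical column layout; the real TGermaCorp layout may differ.
    form: str
    lemma: str
    pos: str  # STTS part-of-speech tag
    ne: str   # proper-name class, e.g. "PER"; "O" assumed for "no name"


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences from tab-separated lines, split on blank lines."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line is assumed to close the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        # Pad short rows with "_" so that every token has four fields.
        cols += ["_"] * (4 - len(cols))
        sentence.append(Token(*cols[:4]))
    if sentence:
        yield sentence


# Invented sample data, not taken from the corpus.
sample = [
    "Goethe\tGoethe\tNE\tPER\n",
    "schrieb\tschreiben\tVVFIN\tO\n",
    "Gedichte\tGedicht\tNN\tO\n",
    ".\t.\t$.\tO\n",
    "\n",
]
sentences = list(read_sentences(sample))
```

Under these assumptions, passing an open file object instead of the sample list would iterate over a whole document sentence by sentence.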

Version v0.2
Creator Andy Lücking, Text Technology Lab, Goethe University Frankfurt
Distributor Andy Lücking, Text Technology Lab, Goethe University Frankfurt
Rights Holder Andy Lücking, Text Technology Lab, Goethe University Frankfurt