Resource: Reference Corpus of Early New High German (1350–1650)

Reference Reference Corpus of Early New High German (1350–1650)
Date of Submission Nov. 17, 2021, noon
Status accepted
ISLRN 918-968-828-554-7
Resource Type Annotated corpus
Media Type XML
Source
Language German, Middle High (ca. 1050-1500)
Format/MIME Type text/xml
Size 1.1 GB (approx. 3.8m tokens)
Access Medium Download
Description

The Reference Corpus of Early New High German (1350–1650) comprises approx. 3.8 million tokens from a carefully selected set of Early New High German texts dating from 1350 to 1650. The corpus was compiled in a series of projects at the Universities of Bonn, Bochum, Halle and Potsdam, beginning in the 1970s.

The corpus is composed of three sub-corpora: ReF.RUB, ReF.MLU, ReF.UP. For detailed documentation, see https://www.linguistics.ruhr-uni-bochum.de/ref.
ReF.RUB and ReF.MLU use transcriptions that comprise two separate layers. The diplomatic layer records historical graphemes and conserves original word boundaries. Layout information, such as page or line breaks, refers to this layer. The second layer adapts word boundaries to the conventions of modern German and serves as the basis for all further linguistic annotations. The texts have been annotated with part-of-speech tags (using the HiTS tagset), morphology and lemmas; parts of the texts have been annotated manually, the rest automatically.
ReF.UP has been manually annotated with POS tags and with syntactic structures according to the TIGER scheme. Three of its texts are additionally annotated with inflectional morphology.
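
The two transcription layers described above can be pictured as two independent segmentations of the same token. The following Python sketch is only an illustration under stated assumptions: the class names `Token` and `ModUnit`, the sample word forms, and the tag strings are all hypothetical and are not taken from the corpus files or the Cora-XML schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModUnit:
    """One unit of the modernized layer; annotations attach here."""
    form: str
    pos: str = ""    # placeholder tag string, not a verified HiTS tag
    lemma: str = ""


@dataclass
class Token:
    """One transcribed token, segmented separately on each layer."""
    dipl: List[str]                                  # original word boundaries
    mod: List[ModUnit] = field(default_factory=list) # modern word boundaries


def diplomatic_text(tokens: List[Token]) -> str:
    """Join the diplomatic units, keeping the historical word boundaries."""
    return " ".join(unit for tok in tokens for unit in tok.dipl)


def modernized_text(tokens: List[Token]) -> str:
    """Join the modernized units, i.e. the basis for POS, morphology, lemmas."""
    return " ".join(unit.form for tok in tokens for unit in tok.mod)


# Invented example: a form written as one word in the source
# is split into two units on the modernized layer.
tokens = [
    Token(dipl=["inder"], mod=[ModUnit("in", pos="APPR"), ModUnit("der", pos="DDART")]),
    Token(dipl=["stat"], mod=[ModUnit("stat", pos="NA")]),
]

print(diplomatic_text(tokens))
print(modernized_text(tokens))
```

In this model, layout information such as line breaks would refer to the `dipl` units, while all linguistic annotation is carried by the `mod` units.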

ReF.RUB and ReF.MLU can be downloaded in Cora-XML format (see https://www.linguistics.ruhr-uni-bochum.de/comphist/resources/cora); ReF.UP is provided in TIGER-XML format (see https://www.ims.uni-stuttgart.de/documents/ressourcen/werkzeuge/tigersearch/doc/html/TigerXML.html). All three sub-corpora are distributed under the Creative Commons Attribution-ShareAlike 4.0 license (CC BY-SA 4.0).
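
As a rough orientation for working with the TIGER-XML download, the sketch below reads terminals and phrase nodes using Python's standard library. It assumes the generic TIGER-XML layout (`s`, `graph`, `terminals`/`t`, `nonterminals`/`nt`, `edge`). The inline sample, its word forms and tags, and the feature names `word` and `pos` are assumptions for illustration; the actual ReF.UP files may declare different features in their header.

```python
import xml.etree.ElementTree as ET

# Invented miniature document in the generic TIGER-XML layout.
SAMPLE = """<corpus id="demo">
  <body>
    <s id="s1">
      <graph root="s1_500">
        <terminals>
          <t id="s1_1" word="in" pos="APPR"/>
          <t id="s1_2" word="der" pos="DDART"/>
          <t id="s1_3" word="stat" pos="NA"/>
        </terminals>
        <nonterminals>
          <nt id="s1_500" cat="PP">
            <edge label="AC" idref="s1_1"/>
            <edge label="NK" idref="s1_2"/>
            <edge label="NK" idref="s1_3"/>
          </nt>
        </nonterminals>
      </graph>
    </s>
  </body>
</corpus>"""


def read_sentences(root):
    """Collect (word, pos) pairs and phrase nodes for every <s> element."""
    sentences = []
    for s in root.iter("s"):
        terminals = [(t.get("word"), t.get("pos")) for t in s.iter("t")]
        phrases = {}
        for nt in s.iter("nt"):
            # Each phrase maps to its category and the ids of its children.
            children = [e.get("idref") for e in nt.iter("edge")]
            phrases[nt.get("id")] = (nt.get("cat"), children)
        sentences.append({"id": s.get("id"), "terminals": terminals, "phrases": phrases})
    return sentences


sentences = read_sentences(ET.fromstring(SAMPLE))
for sent in sentences:
    print(sent["id"], sent["terminals"])
```

For a downloaded file, `ET.parse(path).getroot()` would take the place of `ET.fromstring(SAMPLE)`, where `path` is whatever location the file was saved to.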

Version 1.0
Creator Stefanie Dipper - Ruhr-Universität Bochum, Klaus-Peter Wegera - Ruhr-Universität Bochum, Hans-Joachim Solms - Martin-Luther-Universität Halle-Wittenberg, Ulrike Demske - Universität Potsdam
Distributor Stefanie Dipper - Ruhr-Universität Bochum
Rights Holder Stefanie Dipper - Ruhr-Universität Bochum, Klaus-Peter Wegera - Ruhr-Universität Bochum, Hans-Joachim Solms - Martin-Luther-Universität Halle-Wittenberg, Ulrike Demske - Universität Potsdam