Resource: Anselm Corpus

Reference Anselm Corpus
Date of Submission Dec. 21, 2018, 2:40 p.m.
Status accepted
ISLRN 568-178-806-856-4
Resource Type Annotated corpus
Media Type XML
Source
Language German, Middle High (ca. 1050-1500)
Format/MIME Type text/xml
Size 135 MB (approx. 400,000 tokens)
Access Medium Download
Description

"Interrogatio Sancti Anselmi de Passione Domini" ('Questions by Saint Anselm about the Lord’s Passion') is a medieval religious treatise which is documented in an exceptionally broad number of written records. In total, there are around 70 German manuscripts and prints written up between the 14th and 16th centuries. The Anselm Corpus consists of 58 texts with 400,000 tokens in total and is a highly interesting resource for comparative investigations in different areas such as linguistics, history or theology. The transcriptions of the texts comprise two separate layers. The diplomatic layer records historical graphemes and conserves original word boundaries. Layout information, such as page or line breaks, refers to this layer. The second layer adapts word boundaries to the conventions of modern German and serves as the basis for all further linguistic annotations. The texts have been annotated with a normalized and a modernized wordform, part-of-speech tags (using a slightly modified version of the STTS tagset), morphology, and lemma (see https://www.linguistics.ruhr-uni-bochum.de/anselm/). The corpus can be downloaded in Cora-XML format (see https://www.linguistics.ruhr-uni-bochum.de/comphist/resources/cora) under the Creative Commons Attribution-ShareAlike 3.0 license (CC BY-SA 3.0): https://www.linguistics.ruhr-uni-bochum.de/anselm/access/index.en.html.

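The two-layer design described above can be illustrated with a minimal sketch. The sample data is invented, and the element and attribute names (token, dipl, mod, utf, tag) as well as the helper names are assumptions made for illustration; the authoritative schema is the one documented at the CorA link above. The sketch shows one diplomatic unit ("inder", written as one word in the source) that corresponds to two units on the modernized layer, each carrying its own annotations.

```python
import xml.etree.ElementTree as ET

# Invented sample data. Element and attribute names are assumptions
# made for this sketch, not the documented schema of the corpus files.
SAMPLE = """
<text>
  <token id="t1">
    <dipl id="t1_d1" utf="inder"/>
    <mod id="t1_m1" utf="in">
      <norm tag="in"/><pos tag="APPR"/><lemma tag="in"/>
    </mod>
    <mod id="t1_m2" utf="der">
      <norm tag="der"/><pos tag="ART"/><lemma tag="der"/>
    </mod>
  </token>
  <token id="t2">
    <dipl id="t2_d1" utf="stat"/>
    <mod id="t2_m1" utf="stat">
      <norm tag="stat"/><pos tag="NN"/><lemma tag="stat"/>
    </mod>
  </token>
</text>
"""


def child_tag(parent, name):
    """Return the 'tag' attribute of the named child, or None if absent."""
    child = parent.find(name)
    return child.get("tag") if child is not None else None


def read_layers(xml_string):
    """Return (diplomatic forms, annotated modernized units) from the XML."""
    root = ET.fromstring(xml_string.strip())
    # Diplomatic layer: original word boundaries, one entry per <dipl>.
    diplomatic = [d.get("utf") for d in root.iter("dipl")]
    # Modernized layer: modern word boundaries plus linguistic annotation.
    annotated = [
        {
            "form": m.get("utf"),
            "norm": child_tag(m, "norm"),
            "pos": child_tag(m, "pos"),
            "lemma": child_tag(m, "lemma"),
        }
        for m in root.iter("mod")
    ]
    return diplomatic, annotated


if __name__ == "__main__":
    dipl, anno = read_layers(SAMPLE)
    print(dipl)
    for unit in anno:
        print(unit["form"], unit["pos"], unit["lemma"])
```

Keeping the two layers in separate lists mirrors the description: layout-related questions would be asked of the diplomatic forms, while searches by part of speech or lemma would run over the annotated units.
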
Version 1.0
Creator Stefanie Dipper - Ruhr-Universität Bochum, Simone Schultz-Balluff - Universität Bonn
Distributor Stefanie Dipper - Ruhr-Universität Bochum
Rights Holder Stefanie Dipper - Ruhr-Universität Bochum, Simone Schultz-Balluff - Universität Bonn