Resource: Relation Extraction Datasets in the Digital Humanities Domain and their Evaluation with Word Embeddings

Reference: Relation Extraction Datasets in the Digital Humanities Domain and their Evaluation with Word Embeddings
Date of Submission: Oct. 3, 2017, 3 p.m.
Status: accepted
ISLRN: 339-024-047-160-3
Resource Type: evaluation datasets for language models
Media Type: Text
Source:
Language: English
Format/MIME Type: text/plain
Size: ca. 200 MB
Description:

We manually create high-quality datasets in the digital humanities domain for evaluating language models, specifically word embedding models. In the first step, we create datasets for two fantasy novel book series, covering two task types for each series: analogy and doesn't-match. The work has been submitted to LREC 2018.
In the second step, we train models on the two book series with popular word embedding model types such as word2vec, GloVe, fastText, and LexVec, and evaluate them on the new datasets.
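The two task types can be illustrated with a minimal sketch. It assumes hypothetical toy vectors (not taken from these datasets or from the trained models) and uses two common formulations: 3CosAdd for analogies ("a is to b as c is to ?") and "least similar to the mean of the group" for doesn't-match. The actual evaluation procedure used for this resource may differ. Libraries such as gensim offer comparable routines (`KeyedVectors.most_similar` with `positive`/`negative` word lists, and `KeyedVectors.doesnt_match`).

```python
import math

# Hypothetical 5-dimensional toy vectors (dims loosely: person, royal, male,
# female, object). Real vectors would come from a model such as word2vec,
# GloVe, fastText or LexVec trained on the book series.
TOY_VECTORS = {
    "king":  [1.0, 1.0, 1.0, 0.0, 0.0],
    "queen": [1.0, 1.0, 0.0, 1.0, 0.0],
    "man":   [1.0, 0.0, 1.0, 0.0, 0.0],
    "woman": [1.0, 0.0, 0.0, 1.0, 0.0],
    "sword": [0.0, 0.0, 0.0, 0.0, 1.0],
}


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' via 3CosAdd: argmax_x cos(x, b - a + c)."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words themselves are excluded as answers.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def doesnt_match(vectors, words):
    """Return the word least similar to the mean of the group's unit vectors."""
    units = [normalize(vectors[w]) for w in words]
    mean = [sum(col) / len(units) for col in zip(*units)]
    return min(words, key=lambda w: cosine(vectors[w], mean))


print(solve_analogy(TOY_VECTORS, "man", "king", "woman"))              # queen
print(doesnt_match(TOY_VECTORS, ["king", "queen", "woman", "sword"]))  # sword
```

Scoring a model on a dataset then amounts to running these functions over every test item and counting how often the returned word equals the gold answer.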

Version: 1.0
Creator: Gerhard Wohlgenannt