ISLRN

Relation Extraction Datasets in the Digital Humanities Domain and their Evaluation with Word Embeddings

Full Official Name: Relation Extraction Datasets in the Digital Humanities Domain and their Evaluation with Word Embeddings

Submission date: Oct. 3, 2017, 3 p.m.

We manually create high-quality datasets in the digital humanities domain for evaluating language models, specifically word embedding models. In a first step, we create datasets for two fantasy novel book series, each covering two task types: analogy and doesn't-match. In a second step, we train models on the two book series using popular word embedding model types such as word2vec, GloVe, fastText, and LexVec. The work has been submitted to LREC 2018.
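
For readers unfamiliar with the two task types, the following is a minimal sketch of how an analogy item and a doesn't-match item might be scored against a word embedding model. It is an illustration only, not the evaluation code used for these datasets: the vocabulary, the four-dimensional toy vectors, and the function names `solve_analogy` and `doesnt_match` are all assumptions made for this example, and the doesn't-match scoring is a simplified variant (least cosine similarity to the mean vector).

```python
import math

# Hypothetical toy embeddings (NOT taken from the datasets described above).
# Dimensions are hand-picked for illustration: [person, royal, female, object].
toy_vectors = {
    "man":   [1.0, 0.0, 0.0, 0.0],
    "woman": [1.0, 0.0, 1.0, 0.0],
    "king":  [1.0, 1.0, 0.0, 0.0],
    "queen": [1.0, 1.0, 1.0, 0.0],
    "sword": [0.0, 0.0, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' via the vector offset b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three question words are excluded from the candidate answers.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def doesnt_match(vectors, words):
    """Return the word least similar to the mean vector of the group."""
    dim = len(vectors[words[0]])
    mean = [sum(vectors[w][i] for w in words) / len(words)
            for i in range(dim)]
    return min(words, key=lambda w: cosine(vectors[w], mean))


print(solve_analogy(toy_vectors, "man", "king", "woman"))           # queen
print(doesnt_match(toy_vectors, ["king", "queen", "woman", "sword"]))  # sword
```

With trained models, the same two questions would be asked of the learned vectors, and accuracy would be the share of dataset items for which the model returns the expected answer.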

Creator(s)