Benchmark Corpus to Support Entity Recognition in Job Descriptions

Full Official Name: Benchmark Corpus to Support Entity Recognition in Job Descriptions
Submission date: Jan. 12, 2022, 4:28 p.m.

A public, human-labelled dataset of salient entities in job descriptions. Salient entities include: Skill, Qualification, Experience, Occupation, and Domain. Full details of the annotation schema can be found in schema/Combined_Annotation_Instructions.pdf. Labels were collected from Amazon Mechanical Turk. Workers were required to achieve >70% accuracy in a qualification task before contributing to this dataset. Data formatting follows the CoNLL-2003 NER dataset conventions. Individual Worker responses, Worker IDs, and associated accuracies on the qualification task have been retained. The original job description data can be found on Kaggle. Credit to user airiddha for the development of the original dataset.
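Because the data follows CoNLL-2003 conventions, a reader for it can be sketched in a few lines of Python. The sketch below assumes one token per line with whitespace-separated columns, the entity label in the last column, blank lines between sentences, and optional -DOCSTART- marker lines. The exact column layout and tag scheme of this corpus are not stated here, so the `read_conll` helper, the BIO-style prefixes, and the sample rows are illustrative assumptions rather than excerpts from the dataset.

```python
from typing import Iterable, List, Tuple

# A sentence is a list of (token, label) pairs.
Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL-style lines into sentences of (token, label) pairs.

    Assumes the token is the first column and the label is the last;
    any columns in between are ignored.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        # Blank lines and -DOCSTART- markers end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        current.append((fields[0], fields[-1]))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample rows; label names mirror the entity types listed
# above, but the BIO prefixes are an assumption.
sample = """\
-DOCSTART- O

Experience O
with O
Python B-Skill
required O

Senior B-Occupation
Data I-Occupation
Engineer I-Occupation
""".splitlines()

sentences = read_conll(sample)
```

To read a file on disk, the same function can be given an open file handle, for example `read_conll(open(path, encoding="utf-8"))`, where `path` points at whichever data file is being read.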

Right Holder(s)