Resource: Benchmark Corpus to Support Entity Recognition in Job Descriptions
|Reference||Benchmark Corpus to Support Entity Recognition in Job Descriptions|
|Date of Submission||Jan. 12, 2022, 4:28 p.m.|
|Resource Type||Primary Text|
A public, human-labelled dataset of salient entities in job descriptions.
Salient entities include:
Full details of the annotation schema can be found in schema/Combined_Annotation_Instructions.pdf.
Labels were collected via Amazon Mechanical Turk. Workers were required to achieve >70% accuracy on a qualification task before contributing to this dataset.
Data formatting follows the CoNLL-2003 NER dataset conventions. Individual Worker responses, Worker IDs, and associated accuracies on the qualification task have been retained.
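As an illustration of what CoNLL-2003-style formatting implies for a reader of the files, the following is a minimal Python sketch of a parser. It assumes the usual conventions of that format: one token per line, whitespace-separated columns with the token first and the NER tag last, blank lines between sentences, and optional `-DOCSTART-` marker lines. The exact column layout of this corpus is not stated here, so check the files before relying on it. The function name `read_conll`, the sample text, and the `B-SKILL` tag are hypothetical and are not taken from the dataset.

```python
from typing import Iterable, List, Tuple

# A sentence is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Parse CoNLL-2003-style lines into sentences of (token, tag) pairs.

    Assumption: the token is the first column and the NER tag is the last.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        # Blank lines and -DOCSTART- markers end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        current.append((fields[0], fields[-1]))
    # Flush the final sentence if the input does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample; the tag name B-SKILL is illustrative only.
sample = """\
-DOCSTART- -X- O O

Experience O
with O
Python B-SKILL

Apply O
now O
"""

if __name__ == "__main__":
    for sentence in read_conll(sample.splitlines()):
        print(sentence)
```

Taking the tag from the last column keeps the sketch usable whether a file has two columns or the four columns of the original CoNLL-2003 release.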
The original job description data can be found on Kaggle. Credit to user airiddha for developing the original dataset.
|Rights Holder||Thomas Green|