Resource: Benchmark Corpus to Support Entity Recognition in Job Descriptions

Reference: Benchmark Corpus to Support Entity Recognition in Job Descriptions
Date of Submission: Jan. 12, 2022, 4:28 p.m.
Status: accepted
ISLRN: 907-628-490-988-5
Resource Type: Primary Text
Media Type: Text
Source:
Language: English
Format/MIME Type: text/txt
Size: 13.9MB
Description

A public, human-labelled dataset of salient entities in job descriptions.

Salient entities include:

Skill
Qualification
Experience
Occupation
Domain
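A common way to encode these five entity types in a token-level NER file is with BIO prefixes (B- opens an entity, I- continues it, O marks tokens outside any entity). The sketch below assumes that scheme and uses hypothetical tag strings built from the type names above; the actual tag spellings should be checked against the schema document. It turns a tag sequence back into (type, start, end) spans, treating an I- tag that does not continue an open span as the start of a new one.

```python
from typing import List, Optional, Tuple

# Hypothetical tag strings: the five entity types above with BIO prefixes.
# The dataset's real tag spellings may differ; see the annotation schema.
ENTITY_TYPES = ["Skill", "Qualification", "Experience", "Occupation", "Domain"]
TAGSET = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert a BIO tag sequence into (type, start, end) spans, end exclusive."""
    spans: List[Tuple[str, int, int]] = []
    current_type: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag == "O":
            prefix, label = "O", None
        else:
            prefix, label = tag.split("-", 1)
        # Close the open span unless this tag is an I- of the same type.
        if current_type is not None and not (prefix == "I" and label == current_type):
            spans.append((current_type, start, i))
            current_type = None
        # Open a span on B-, or leniently on an I- that continues nothing.
        if prefix in ("B", "I") and current_type is None:
            current_type, start = label, i
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans
```

For example, the tags ["B-Skill", "O", "B-Skill", "I-Skill", "O"] for the tokens "Python and machine learning experience" yield two Skill spans, covering tokens 0 to 1 and 2 to 4.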

Full details of the annotation schema can be found in schema/Combined_Annotation_Instructions.pdf.

Labels were collected via Amazon Mechanical Turk. Workers were required to achieve more than 70% accuracy on a qualification task before contributing to this dataset.
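Because each Worker's qualification accuracy is retained, users can apply the same entry rule, or a stricter one, when selecting annotations. The sketch below is illustrative only: the Worker IDs and the dictionary layout are hypothetical, and only the 70% threshold comes from this description. The comparison is strict, matching "more than 70%".

```python
from typing import Dict

# Entry threshold stated in the description: more than 70% accuracy.
QUALIFICATION_THRESHOLD = 0.70


def qualified_workers(
    accuracies: Dict[str, float], threshold: float = QUALIFICATION_THRESHOLD
) -> Dict[str, float]:
    """Keep Workers whose qualification accuracy strictly exceeds threshold.

    `accuracies` maps a (hypothetical) Worker ID to accuracy in [0, 1].
    """
    return {worker: acc for worker, acc in accuracies.items() if acc > threshold}
```

Since every contributing Worker already passed the 70% bar, the default call should keep everyone; passing a higher threshold, such as 0.85, restricts analysis to the most accurate annotators.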

Data formatting follows the conventions of the CoNLL-2003 NER dataset. Individual Worker responses, Worker IDs, and each Worker's accuracy on the qualification task have been retained.
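In CoNLL-2003-style files, each non-empty line holds one token and its whitespace-separated columns, blank lines separate sentences, and "-DOCSTART-" lines mark document boundaries. The reader below is a minimal sketch under those assumptions; it keeps every column as a tuple because the exact column layout of this dataset (including where the per-Worker responses sit) is not specified here. Treating the first column as the token and the last as a label is an assumption to verify against the files.

```python
from typing import Iterable, List, Tuple

Row = Tuple[str, ...]
Sentence = List[Row]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL-2003-style lines into sentences of column tuples.

    Assumes one token per line with whitespace-separated columns,
    blank lines between sentences, and optional -DOCSTART- markers.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        # Blank lines and document markers both end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(tuple(line.split()))
    if current:
        sentences.append(current)
    return sentences
```

A file can be passed directly, for example `with open(path, encoding="utf-8") as f: sentences = read_conll(f)`, where `path` is whichever data file is being read.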

The original job description data is available on Kaggle. Credit goes to Kaggle user airiddha for developing the original dataset.

Version: 1.0
Creator: Thomas Green
Distributor: Thomas Green
Rights Holder: Thomas Green