ISLRN

Benchmark Corpus to Support Entity Recognition in Job Descriptions

Full Official Name: Benchmark Corpus to Support Entity Recognition in Job Descriptions

Submission date: Jan. 12, 2022, 4:28 p.m.

A public, human-labelled dataset of salient entities in job descriptions. Salient entities include Skill, Qualification, Experience, Occupation, and Domain. Full details of the annotation schema can be found in schema/Combined_Annotation_Instructions.pdf. Labels were collected from Amazon Mechanical Turk. Workers were required to achieve >70% accuracy in a qualification task before contributing to this dataset. Data formatting follows the conventions of the CoNLL-2003 NER dataset. Individual Worker responses, Worker IDs, and the associated accuracies on the qualification task have been retained. The original job description data can be found on Kaggle. Credit to user airiddha for the development of the original dataset.
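Since the data is described as following CoNLL-2003 NER conventions, a reader might load it along the lines of the sketch below. It assumes the usual CoNLL layout: one token per line, whitespace-separated columns with the entity tag in the last column, and blank lines between sentences. The function name `read_conll`, the sample text, and the BIO-style tags such as `B-Skill` are illustrative assumptions, not taken from the dataset's files; check the repository for the actual column layout and tag names.

```python
from typing import Iterable, List, Tuple

# A sentence is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Parse CoNLL-style lines into sentences of (token, tag) pairs.

    Assumes the token is the first column and the tag is the last column.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        # Blank lines and -DOCSTART- markers end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        current.append((columns[0], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


# Made-up sample for illustration only; not copied from the dataset.
sample = """\
Experience B-Experience
with O
Python B-Skill
programming I-Skill

Registered B-Qualification
nurse B-Occupation
""".splitlines()

parsed = read_conll(sample)
```

To read a real file, pass an open file handle instead of `sample`, for example `read_conll(open(path, encoding="utf-8"))`, where `path` points to one of the data files in the repository.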

Creator(s)

Thomas Green

Distributor(s)

Thomas Green

Right Holder(s)

Thomas Green

Status : Accepted

ISLRN :

907-628-490-988-5

Version

1.0

Source

https://github.com/acp19tag/skill-extraction-dataset

Resource Type

Primary Text

Media Type

Text

Language(s)

English

Access Medium