Resource: CMRC 2018 Dataset

Reference A Chinese Reading Comprehension Dataset for the 2nd Chinese Machine Reading Comprehension Evaluation (CMRC 2018)
Date of Submission Oct. 22, 2018, 3:22 p.m.
Status accepted
ISLRN 013-662-947-043-2
Resource Type Primary Text
Media Type Text
Source
Language Chinese
Description

Machine Reading Comprehension (MRC) has recently become very popular and has attracted considerable attention.
However, the existing reading comprehension datasets are mostly in English.
In this paper, we introduce a span-extraction dataset for Chinese Machine Reading Comprehension to add linguistic diversity to this area.
The dataset is composed of nearly 20,000 real questions annotated by humans on Wikipedia paragraphs.
We also annotated a challenge set containing questions that require comprehensive understanding and multi-sentence inference across the context.
With the release of the dataset, we hosted the Second Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2018).

Version v1.0
Creator Yiming Cui - iFLYTEK Research, Ting Liu, Zhipeng Chen, Shijin Wang, Guoping Hu, Wentao Ma, Li Xiao
Distributor Yiming Cui - iFLYTEK Research
Rights Holder Yiming Cui - iFLYTEK Research