
ChID: A Large-scale Chinese IDiom Dataset for Cloze Test

06/04/2019
by Chujie Zheng, et al.
Tsinghua University
Nanyang Technological University

Cloze-style reading comprehension in Chinese is still limited by the lack of diverse corpora. In this paper we propose ChID, a large-scale Chinese cloze test dataset that studies the comprehension of idioms, a unique language phenomenon in Chinese. In this corpus, the idioms in a passage are replaced by blank symbols, and the correct answer must be chosen from a set of well-designed candidate idioms. We carefully study how the design of the candidate idioms and the representation of idioms affect the performance of state-of-the-art models. Results show that machine accuracy is substantially worse than human accuracy, indicating substantial room for further research.
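To make the task format concrete, the following is a minimal sketch of how a single cloze instance of the kind described above might be built and scored. It is an illustration under stated assumptions, not the authors' construction pipeline: the placeholder token `BLANK`, the helper names `make_cloze` and `accuracy`, the example passage, and the distractor idioms are all hypothetical, and the random shuffling stands in for the paper's candidate design, which the abstract does not detail.

```python
import random

# Hypothetical placeholder for a blanked-out idiom (an assumption, not from the abstract).
BLANK = "#idiom#"


def make_cloze(passage, idiom, distractors, seed=0):
    """Blank out the first occurrence of `idiom` and build a shuffled candidate list."""
    if idiom not in passage:
        raise ValueError("idiom does not occur in passage")
    candidates = [idiom] + list(distractors)
    # Shuffle with a fixed seed so the position of the answer is reproducible.
    random.Random(seed).shuffle(candidates)
    return {
        "passage": passage.replace(idiom, BLANK, 1),
        "candidates": candidates,
        "answer": candidates.index(idiom),  # index of the correct idiom
    }


def accuracy(predictions, answers):
    """Fraction of blanks for which the predicted candidate index is correct."""
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


# Illustrative example; the passage and distractors are made up for this sketch.
item = make_cloze(
    passage="这个方案已经很完整了，再加说明就是画蛇添足。",
    idiom="画蛇添足",
    distractors=["守株待兔", "亡羊补牢", "对牛弹琴"],
)
print(item["passage"])
print(item["candidates"], item["answer"])
```

A model is then evaluated by comparing its chosen candidate index for each blank against `answer`, which is what `accuracy` computes.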

