DuReader_robust: A Chinese Dataset Towards Evaluating the Robustness of Machine Reading Comprehension Models

04/23/2020
by Hongxuan Tang, et al.

Machine Reading Comprehension (MRC) is a crucial and challenging task in natural language processing. Although several MRC models achieve human-parity performance on various datasets, we find that these models are still far from robust. To comprehensively evaluate the robustness of MRC models, we create a Chinese dataset, DuReader_robust. It is designed to challenge MRC models in three respects: (1) over-sensitivity, (2) over-stability, and (3) generalization. Most previous work studies these problems by altering the inputs into unnatural texts. By contrast, the questions and documents in DuReader_robust are natural texts, so the dataset presents the robustness challenges that arise when MRC models are deployed in real-world applications. The experimental results show that MRC models based on pre-trained language models perform much worse than humans on the robustness test set, although they perform as well as humans on the in-domain test set. Additionally, we analyze the behavior of existing models on the robustness test set, which may offer suggestions for future model development. The dataset and code are available at <https://github.com/PaddlePaddle/Research/tree/master/NLP/DuReader-Robust-BASELINE>
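The abstract reports that pre-trained models match humans on the in-domain test set but fall well short on the robustness test set. One way to quantify such a gap is sketched below. This is a minimal sketch under stated assumptions: it uses exact match and character-level F1 as the metrics (the abstract does not name the metrics), it applies only whitespace stripping as normalization, and the function names and toy answer pairs are hypothetical, not taken from the released code or data.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the predicted span equals the reference after stripping whitespace."""
    return float(prediction.strip() == reference.strip())


def char_f1(prediction: str, reference: str) -> float:
    """Character-level F1 (assumed metric). Characters are compared instead of
    words because Chinese text has no whitespace word boundaries."""
    pred_chars = list(prediction.strip())
    ref_chars = list(reference.strip())
    # Multiset intersection counts how many characters the two spans share.
    overlap = sum((Counter(pred_chars) & Counter(ref_chars)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(ref_chars)
    return 2 * precision * recall / (precision + recall)


def score(pairs):
    """Average EM and F1, in percent, over (prediction, reference) pairs."""
    n = len(pairs)
    em = sum(exact_match(p, r) for p, r in pairs) / n * 100
    f1 = sum(char_f1(p, r) for p, r in pairs) / n * 100
    return em, f1


def robustness_gap(in_domain, robustness):
    """Drop in (EM, F1) from an in-domain set to a robustness subset
    (hypothetical helper for illustration)."""
    em_in, f1_in = score(in_domain)
    em_rb, f1_rb = score(robustness)
    return em_in - em_rb, f1_in - f1_rb


# Hypothetical toy predictions: perfect in-domain, partially wrong on a
# robustness subset (e.g. paraphrased questions).
in_domain = [("北京", "北京"), ("1949年", "1949年")]
robust_subset = [("北京市", "北京"), ("1950年", "1949年")]
em_gap, f1_gap = robustness_gap(in_domain, robust_subset)
print(f"EM gap: {em_gap:.1f}, F1 gap: {f1_gap:.1f}")
```

Character-level F1 gives partial credit for near-miss spans such as "北京市" versus "北京", so the F1 gap is typically smaller than the EM gap; reporting both makes it easier to tell boundary errors from outright wrong answers.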


research
10/17/2018

A Span-Extraction Dataset for Chinese Machine Reading Comprehension

Machine Reading Comprehension (MRC) has become enormously popular recent...
research
04/07/2020

A Sentence Cloze Dataset for Chinese Machine Reading Comprehension

Owing to the continuous contributions by the Chinese NLP community, more...
research
05/22/2023

Kanbun-LM: Reading and Translating Classical Chinese in Japanese Methods by Language Models

Recent studies in natural language processing (NLP) have focused on mode...
research
07/06/2023

KoRC: Knowledge oriented Reading Comprehension Benchmark for Deep Text Understanding

Deep text understanding, which requires the connections between a given ...
research
04/06/2023

Evaluating the Robustness of Machine Reading Comprehension Models to Low Resource Entity Renaming

Question answering (QA) models have shown compelling results in the task...
research
03/20/2023

Self-Improving-Leaderboard(SIL): A Call for Real-World Centric Natural Language Processing Leaderboards

Leaderboard systems allow researchers to objectively evaluate Natural La...
research
10/05/2020

Interactive Fiction Game Playing as Multi-Paragraph Reading Comprehension with Reinforcement Learning

Interactive Fiction (IF) games with real human-written natural language ...
