Benchmarking Robustness of Machine Reading Comprehension Models

04/29/2020
by Chenglei Si, et al.

Machine Reading Comprehension (MRC) is an important testbed for evaluating models' natural language understanding (NLU) ability. There has been rapid progress in this area, with new models achieving impressive performance on various MRC benchmarks. However, most of these benchmarks only evaluate models on in-domain test sets without considering their robustness under test-time perturbations. To fill this gap, we construct AdvRACE (Adversarial RACE), a new model-agnostic benchmark for evaluating the robustness of MRC models under six different types of test-time perturbations, including our novel superimposed attack and distractor construction attack. We show that current state-of-the-art (SOTA) models are vulnerable to these simple black-box attacks. Our benchmark is constructed automatically from the existing RACE benchmark, so the construction pipeline can be easily adapted to other tasks and datasets. We will release the data and source code to facilitate future work. We hope that our work will encourage more research on improving the robustness of MRC and other NLU models.
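The abstract does not spell out the perturbation pipeline, and the paper's superimposed and distractor construction attacks are not described here, so the sketch below is only illustrative: a simple character-swap perturbation applied to a RACE-style multiple-choice example. The dict keys (`article`, `question`, `options`, `answer`) follow RACE's common distribution format, and `char_swap`/`perturb_article` are hypothetical helper names, not the paper's actual implementation.

```python
import random

def char_swap(word, rng):
    """Swap two adjacent interior characters; leave short words untouched."""
    if len(word) < 4:
        return word
    i = rng.randrange(1, len(word) - 2)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def perturb_article(example, rate=0.1, seed=0):
    """Return a copy of a RACE-style example with a noised passage.

    Only the passage ('article') is perturbed, so the question, the
    options, and the gold answer all remain valid.
    """
    rng = random.Random(seed)
    words = example["article"].split()
    noised = [char_swap(w, rng) if rng.random() < rate else w for w in words]
    out = dict(example)
    out["article"] = " ".join(noised)
    return out

if __name__ == "__main__":
    ex = {
        "article": "The benchmark evaluates reading comprehension under noise.",
        "question": "What does the benchmark evaluate?",
        "options": ["Speed", "Robustness", "Memory", "Grammar"],
        "answer": "B",
    }
    print(perturb_article(ex, rate=0.5)["article"])
```

Because only the input text is edited and no model gradients or scores are consulted, a perturbation like this is black-box: it can be run once over the test set and the result reused against any MRC model, which matches the model-agnostic design the abstract describes.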


Related research

04/15/2017 · RACE: Large-scale ReAding Comprehension Dataset From Examinations
We present RACE, a new dataset for benchmark evaluation of methods in th...

01/31/2023 · The Impacts of Unanswerable Questions on the Robustness of Machine Reading Comprehension Models
Pretrained language models have achieved super-human performances on man...

05/15/2023 · EMBRACE: Evaluation and Modifications for Boosting RACE
When training and evaluating machine reading comprehension models, it is...

04/06/2023 · Evaluating the Robustness of Machine Reading Comprehension Models to Low Resource Entity Renaming
Question answering (QA) models have shown compelling results in the task...

11/21/2019 · Assessing the Benchmarking Capacity of Machine Reading Comprehension Datasets
Existing analysis work in machine reading comprehension (MRC) is largely...

04/04/2019 · Frustratingly Poor Performance of Reading Comprehension Models on Non-adversarial Examples
When humans learn to perform a difficult task (say, reading comprehensio...

06/07/2023 · PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
The increasing reliance on Large Language Models (LLMs) across academia ...
