Adversarial Examples for Evaluating Reading Comprehension Systems

07/23/2017
by Robin Jia, et al.

Standard accuracy metrics indicate that reading comprehension systems are making rapid progress, but the extent to which these systems truly understand language remains unclear. To reward systems with real language understanding abilities, we propose an adversarial evaluation scheme for the Stanford Question Answering Dataset (SQuAD). Our method tests whether systems can answer questions about paragraphs that contain adversarially inserted sentences, which are automatically generated to distract computer systems without changing the correct answer or misleading humans. In this adversarial setting, the average F1 score of sixteen published models drops from 75% to 36%; when the adversary is allowed to add ungrammatical sequences of words, the average F1 score of four models decreases further to 7%. We hope our insights will motivate the development of new models that understand language more precisely.
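
The evaluation idea in the abstract can be illustrated with a minimal sketch: score a model on the original paragraph, then again after a distracting sentence has been added, and compare average F1. Everything below is an assumption made for illustration and is not the authors' code: the token-level F1 is a simplified SQuAD-style overlap metric, the distractor is hand-written and appended at the end of the paragraph rather than generated automatically, and `overlap_model` is a hypothetical, deliberately shallow stand-in for a real reading comprehension system.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction, gold):
    """Token-overlap F1 between a predicted and a gold answer string."""
    pred_tokens, gold_tokens = normalize(prediction), normalize(gold)
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def add_distractor(paragraph, distractor):
    """Append a distracting sentence; the correct answer is left unchanged."""
    return paragraph.rstrip() + " " + distractor.strip()


def overlap_model(paragraph, question):
    """Hypothetical shallow model: pick the sentence sharing the most words
    with the question, then return its last capitalized word that does not
    already appear in the question."""
    q_tokens = set(normalize(question))
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
    best = max(sentences, key=lambda s: len(q_tokens & set(normalize(s))))
    words = re.findall(r"[A-Za-z0-9]+", best)
    candidates = [w for w in words if w[0].isupper() and w.lower() not in q_tokens]
    return candidates[-1] if candidates else ""


def evaluate(model, examples, adversarial=False):
    """Average F1 over examples, optionally with the distractor added."""
    scores = []
    for ex in examples:
        paragraph = ex["paragraph"]
        if adversarial:
            paragraph = add_distractor(paragraph, ex["distractor"])
        scores.append(token_f1(model(paragraph, ex["question"]), ex["answer"]))
    return sum(scores) / len(scores)


# A single made-up example; the distractor is written by hand for this sketch.
examples = [
    {
        "paragraph": "Tesla moved to Prague in 1880. He later worked in Budapest.",
        "question": "What city did Tesla move to?",
        "answer": "Prague",
        "distractor": "Tadakatsu did move to the city of Chicago.",
    }
]

clean_f1 = evaluate(overlap_model, examples)
adversarial_f1 = evaluate(overlap_model, examples, adversarial=True)
print(f"clean F1: {clean_f1:.2f}, adversarial F1: {adversarial_f1:.2f}")
```

On this toy example the shallow model answers "Prague" on the original paragraph and switches to "Chicago" once the distractor is present, because the added sentence shares more words with the question. The example only shows the shape of the comparison; the figures reported in the abstract come from the paper's own experiments.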


research
08/31/2019

QAInfomax: Learning Robust Question Answering System by Mutual Information Maximization

Standard accuracy metrics indicate that modern reading comprehension sys...
research
03/31/2023

A Multiple Choices Reading Comprehension Corpus for Vietnamese Language Education

Machine reading comprehension has been an interesting and challenging ta...
research
10/22/2019

MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension

We present the results of the Machine Reading for Question Answering (MR...
research
04/04/2019

Frustratingly Poor Performance of Reading Comprehension Models on Non-adversarial Examples

When humans learn to perform a difficult task (say, reading comprehensio...
research
04/16/2020

CrossCheck: Rapid, Reproducible, and Interpretable Model Evaluation

Evaluation beyond aggregate performance metrics, e.g. F1-score, is cruci...
research
08/20/2019

Universal Adversarial Triggers for NLP

Adversarial examples highlight model vulnerabilities and are useful for ...
research
08/20/2019

Universal Adversarial Triggers for Attacking and Analyzing NLP

Adversarial examples highlight model vulnerabilities and are useful for ...
