Frustratingly Poor Performance of Reading Comprehension Models on Non-adversarial Examples

04/04/2019 ∙ by Soham Parikh, et al. ∙ Indian Institute of Technology, Madras

When humans learn to perform a difficult task (say, reading comprehension (RC) over longer passages), it is typically the case that their performance improves significantly on an easier version of this task (say, RC over shorter passages). Ideally, we would want an intelligent agent to also exhibit such behavior. However, on experimenting with state-of-the-art RC models using the standard RACE dataset, we observe that this is not true. Specifically, we see counter-intuitive results wherein even when we show frustratingly easy examples to the model at test time, there is hardly any improvement in its performance. We refer to this as non-adversarial evaluation, as opposed to adversarial evaluation. Such non-adversarial examples allow us to assess the utility of specialized neural components. For example, we show that even for easy examples where the answer is clearly embedded in the passage, the neural components designed for paying attention to relevant portions of the passage fail to serve their intended purpose. We believe that the non-adversarial dataset created as a part of this work would complement the research on adversarial evaluation and give a more realistic assessment of the ability of RC models. All the datasets and code developed as a part of this work will be made publicly available.







Related Work

This section is organized into two parts: (i) the first introduces the datasets and models for RC QA, and (ii) the second introduces adversarial evaluation.

QA Datasets & Models: Over the past few years, several large-scale datasets have been proposed for RC, inspiring increasingly complex models with many components. These datasets are of varied flavors and differ in whether the answer should be generated/selected/extracted. Cloze-style datasets like CNN and Daily Mail [Hermann et al., 2015], the Children’s Book Test [Hill et al., 2015] and Who Did What [Onishi et al., 2016] contain the answer inside the passage as an entity/verb/adjective, etc. SQuAD [Rajpurkar et al., 2016], TriviaQA [Joshi et al., 2017] and NewsQA [Trischler et al., 2016] require the RC model to predict a contiguous span of text inside the passage as the answer. MS MARCO [Nguyen et al., 2016] contains human-generated answers, which in turn requires the models to perform answer generation. Other datasets like MCTest [Richardson, Burges, and Renshaw, 2013], the NTCIR QA Lab Task [Shibuki et al., 2014] and RACE [Lai et al., 2017] contain multiple-choice questions (MCQ) where the task is to select the correct option from a given set of options.

Most of the recent models proposed for these datasets have specialized attention modules like (i) Query-aware context representation [Xiong, Zhong, and Socher, 2016, Seo et al., 2016, Chen, Bolton, and Manning, 2016, Cui et al., 2017, Dhingra et al., 2017] and (ii) Self-aware context representation [Wang et al., 2017, Hu, Peng, and Qiu, 2017]. These attention-based modules aim at focusing on important passage words based on the information (i.e., query or passage words) provided. In this work, we argue that it is important to assess the utility of these modules using non-adversarial examples.

Adversarial Evaluation: This line of work focuses on bringing out the poor generalization abilities of Deep Neural Networks by feeding them carefully constructed adversarial examples. [Goodfellow, Shlens, and Szegedy, 2014] showed that by adding a small amount of carefully designed noise to an image, it is possible to fool image classification models into predicting a wrong label for the given image (even though there is no visible difference in the image after perturbation).

Adding noise to natural language text while preserving the semantics is tricky, but some attempts have been made in this direction. [Li, Monroe, and Jurafsky, 2016] erase words from a discourse while trying to maintain the meaning of the text. [Zhao, Dua, and Singh, 2018] use auto-encoders to map discrete text to continuous embeddings and then add perturbations in the continuous space. [Jia and Liang, 2017] use the SQuAD dataset to show that if adversarial sentences which have a high word overlap with the answer sentence are inserted into the passage without affecting the answer, then the models get confused and their performance drops significantly. This exposes the vulnerability of the models in adversarial settings. In this paper, we explore the other end of the spectrum and show that even on non-adversarial examples which are significantly easier for humans to answer, there is hardly any improvement in the performance of existing RC models.

Choice of Dataset

As mentioned earlier, in this work we focus on the RACE dataset. RACE is a large-scale MCQ-style dataset, which gives us more scope for creating non-adversarial examples by suitably simplifying the (i) passage, (ii) query and/or (iii) options. More importantly, this is a hard dataset with a significant fraction of questions requiring reasoning and inference (refer to [Lai et al., 2017] for statistics), where even state-of-the-art models have been able to reach only 45% accuracy while the human performance is about %. There is thus enough scope for the models to give an improved performance on non-adversarial examples over the actual test examples from RACE. In contrast, for some of the other large-scale datasets such as SQuAD and CNN/Daily Mail, state-of-the-art models have already reached near-human performance and hence there isn't enough scope for improvement in their performance on non-adversarial examples. There is a clear scope for designing such strategies for other flavors of QA such as (i) TriviaQA [Joshi et al., 2017], which requires evidence from multiple passages, and (ii) MS MARCO [Nguyen et al., 2016], which requires the generation of answers. Our work is an important first step in this direction and we hope that it will fuel interest in the development of such strategies for these other flavors of QA as well.

Non-Adversarial Examples

In this section, we describe different ways of creating non-adversarial examples.

Modifying the passage

We propose different ways of modifying the passage to provide explicit or implicit hints about the answer.

P1 - Append answer to the passage: The simplest thing is to append the correct answer at the end of the passage. This is very naive and there is no reason why a model (or even a human for that matter) should be able to answer the query as the answer is placed out of context. However, we list it here for the sake of completeness.

P2 - Append query & answer to the passage: If both the query and the answer are appended at the end of the passage then most humans would be easily able to answer the query without having to read this passage. This just requires very basic comprehension skill and we should expect a trained QA model to be able to find the answer. We can also think of the query as providing context for the answer.

P3 - Append query, answer as a declarative sentence: Building on the above intuition, to simplify things even further, we combine the query and the answer to form a sentence and append this sentence at the end of the passage. A majority of the queries in RACE are fill-in-the-blank style queries. It is straightforward to convert these query-answer pairs into a declarative sentence by simply replacing the blank with the answer. For other types of queries, we create declarative sentences (refer to Figure 2) by using manually defined rules over CoreNLP constituency parses (we use the publicly available code provided by [Jia and Liang, 2017]).
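For the fill-in-the-blank case, the conversion reduces to a string substitution. The sketch below is our own illustration (not the authors' released code) and assumes the blank is rendered as a run of underscores; other query types need the rule-based path described above.

```python
import re

def fill_in_blank_to_declarative(query, answer):
    """Turn a fill-in-the-blank query and its answer into a declarative
    sentence by substituting the answer for the blank (assumed here to be
    a run of underscores)."""
    # A lambda replacement avoids re.sub interpreting escapes in `answer`.
    sentence = re.sub(r"_+", lambda m: answer, query, count=1).strip()
    if not sentence.endswith("."):
        sentence += "."
    return sentence
```

For example, `fill_in_blank_to_declarative("The sky is _.", "blue")` yields the sentence that P3 appends to the passage.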

P4 - Simplify the passage: As an alternative way to reduce the difficulty of the passages, we pass them through an automatic text simplification model [Nisioi et al., 2017] and use the simplified passages at test time.

P5 - Retain only relevant sentences: To remove extraneous information, we only retain those sentences of the passage which are needed to answer the query. In other words, the new passage is a query-specific summary of the original passage. Specifically, we collect these examples with the help of in-house annotators (proficient in English) who are shown the passage, query, and answer and are asked to retain only sentences from the passage which are required to answer the query.

P6 - Replace passage by query & answer: This is an even more simplified version of P2 wherein we replace the entire passage by the concatenation of query and answer. Typically, a simple bag-of-words model should also give a high performance on this dataset.

P7 - Replace passage by query & answer as a declarative sentence: Analogous to P6, this dataset is a simplified version of P3 where the passage is replaced entirely by the declarative sentence formed by combining the query and the answer. Passages in P6 and P7 can again be considered to be variants of a query-specific summary of the passage and are embarrassingly simple and to the point. Most humans would be able to answer the queries with % accuracy.

P8 - Replace passage with the answer: The entire passage is replaced by the correct option. While the resulting passage hardly makes any syntactic and semantic sense, we include this for the sake of completeness. However, it is worth mentioning that if a human is given a triplet containing {answer, query, options} instead of {passage, query, options}, in the absence of any other information, the human will simply pick the answer (that is the most elementary thing to do), obtaining % accuracy. Again, a simple bag-of-words model would also be able to predict answers correctly with high accuracy.

P9 - Place explicit hints to the answer: We explicitly add the following text at the beginning of the passage “The answer to $QUERY$ is at the end of the passage”. Simultaneously, we append the following text at the end of the passage, “The answer to $QUERY$ is $ANSWER$”. $QUERY$ and $ANSWER$ are variables which are replaced by the actual query and the answer. Such hints make the task embarrassingly easy for humans, who would simply read the hint in the first sentence, skip reading the passage and pick the answer from the hint placed at the end.
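The purely template-based passage modifications (P2, P6, P9) are simple string operations; a minimal sketch, with function names and signatures of our own choosing:

```python
def make_p2(passage, query, answer):
    """P2: append the query and the answer at the end of the passage."""
    return f"{passage} {query} {answer}"

def make_p6(passage, query, answer):
    """P6: replace the entire passage by the query and the answer
    (the passage argument is kept only for a uniform signature)."""
    return f"{query} {answer}"

def make_p9(passage, query, answer):
    """P9: explicit hints at both ends of the passage."""
    prefix = f"The answer to {query} is at the end of the passage."
    suffix = f"The answer to {query} is {answer}."
    return f"{prefix} {passage} {suffix}"
```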

Passage: Hidden in a small street in the south end of Springfield … Frigo’s is an Italian restaurant … I stepped into Frigo’s almost by accident when … I have a feeling that I’ll be picking up dinner for me and the kids at Frigo’s soon. (272 words)
Query: How much did the writer pay for his first meal at Frigo’s?
Summarized Passage: I stepped into Frigo’s almost by accident when I had to stay in Springfield into the evening for an open house at the school where I work. I ordered the easiest meal possible: a chicken sandwich and a salad. It cost $4.75 for the sandwich. The salad was $4.99 and didn’t have salad dressing on it. (56 words)

Figure 1: Example from the P5 dataset

Query: What does the sentence “You’re quite a fellow to build this bridge!” mean?
Answer: John was great to build this bridge.
Declarative Sentence: The sentence “You’re quite a fellow to build this bridge!” means John was great to build this bridge.

Figure 2: Example of declarative sentence created from query and answer

Modifying the query

We take inspiration from school/college textbooks where tough queries are often appended with hints. On similar lines, we propose two simple modifications to the query.

Q1 - Append answer as a hint to the query: We append the text "Answer is $ANSWER$" at the end of the query. Such a hint just gives away the answer and it is trivial for any human to answer such a query.

Q2 - Append not-an-answer hint to the query: Since the hint in Q1 is too direct, in Q2 we indicate the wrong options by appending the text "Answer is not $OPTION1$, $OPTION2$, $OPTION3$" to the query, assuming without loss of generality that the remaining option is the correct one. Again, it would be very easy for a human to answer the query in the presence of such a hint.
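Q1 and Q2 are likewise one-line templates over the query; a sketch (our own illustrative code, assuming `options` holds the four option strings):

```python
def q1_hint(query, answer):
    """Q1: give the answer away directly in the query."""
    return f"{query} Answer is {answer}"

def q2_hint(query, options, answer):
    """Q2: list the three incorrect options as a not-an-answer hint."""
    wrong = [o for o in options if o != answer]
    return f"{query} Answer is not {wrong[0]}, {wrong[1]}, {wrong[2]}"
```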

Modifying the options

We propose the following ways of simplifying the options to make things easier for the model:

O1 - Replace each option by query & option as a declarative sentence: We do not claim that this makes things dramatically easy for the model but the idea here is that the declarative sentence should help the model to read the query and options together in a better light.

O2 - Replace wrong options with options from other example(s): While the options are easier to read and comprehend in O1, the model still has to distinguish between confusing options. To simplify things, we replace the incorrect options in every example with randomly selected options from other examples. The idea is that since the wrong options are not relevant to the query (or, in most cases, the passage), the reader should be able to assign low probability scores to these. Again, most humans would find this setup much easier and would be able to pick the right answer by elimination. We validate this by conducting human evaluations where the evaluators are only shown the query and options. They are able to reach a performance of on a test set of such examples without even reading the passage.

O3 - Reducing the number of options: With four options, the chance of randomly guessing the answer is 25%. As we reduce the number of options, the chance of randomly guessing improves (33.3% with 3 options and 50% with 2 options). It would be interesting to see if there is a dramatic relative improvement in the performance of the model as compared to the random baseline when fewer options are provided. For example, with 4 options, if the model gets % relative improvement w.r.t. the random baseline of 25%, then it would be interesting to see whether with reduced (2) options this relative improvement over the random baseline of 50% is greater than %.
Moreover, instead of randomly dropping an incorrect option, we also create a test set where we ask in-house human annotators to select the most confusing incorrect option for a given tuple of {passage, query, options}. This is done to check whether the performance of models is better in this case as opposed to randomly dropping an option.
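The option modifications O2 and O3 can be sketched as follows (our own illustrative code; `example` is assumed to be a dict with `options` and `answer` fields, and `option_pool` holds options collected from other examples):

```python
import random

def o2_replace_wrong_options(example, option_pool, rng=random):
    """O2: keep the correct option, replace each incorrect option with a
    randomly chosen option from other examples (`option_pool`)."""
    new_options = [opt if opt == example["answer"] else rng.choice(option_pool)
                   for opt in example["options"]]
    return {**example, "options": new_options}

def o3_drop_option(example, rng=random):
    """O3: randomly drop one incorrect option, leaving three choices."""
    wrong = [o for o in example["options"] if o != example["answer"]]
    dropped = rng.choice(wrong)
    return {**example,
            "options": [o for o in example["options"] if o != dropped]}
```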

Models employed

In this section, we describe the various state-of-the-art models that we evaluate on the above non-adversarial examples. These models were originally proposed for the SQuAD dataset, wherein the task is to predict the correct span in the passage, and they do not have components for encoding or selecting the options. Two of these, viz., Gated Attention Reader (GAR) [Dhingra et al., 2017] and Stanford Attention Reader (SAR) [Chen, Bolton, and Manning, 2016], have already been adapted for the RACE dataset by suitably modifying them to encode the options and select the right option instead of predicting a span. We refer the reader to the original RACE paper [Lai et al., 2017] for these modifications. In a similar vein, we suitably modify 3 other models and adapt them to the RACE dataset as described below.

Dynamic Co-attention Network (DCN)

This model [Xiong, Zhong, and Socher, 2016] consists of three modules: (i) a document and query encoder, (ii) a co-attention module and (iii) a dynamic pointing decoder. First, we use a separate LSTM to encode the options and use the state of the LSTM at the last time-step as the vector representation of the option. The co-attention module, which pays attention over passage and query words simultaneously, is used without any modification for the RACE dataset. Lastly, we need to replace the dynamic pointing decoder, which is used to predict the start and end locations of the answer span. We use the same modifications that [Lai et al., 2017] proposed to adapt GAR and SAR for the RACE dataset. Specifically, we replace the output module with a simple bilinear attention layer which computes the bilinear similarity between document word representations and the query representation (i.e., the final hidden state of the LSTM used for encoding the query). We then normalize these word-query similarities using a softmax function to compute the attention weight for each passage word in light of the query. Using these weights, we compute the weighted sum of passage word representations as the passage representation, which in turn is used to compute the bilinear similarity with the representation of each option. The option with the highest similarity score is predicted as the answer.
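The output layer just described can be sketched in a few lines of plain Python. This is our own illustration, not the trained models' code: for brevity it reuses a single bilinear matrix W for both the word-query and option-passage similarities, whereas the actual models learn separate parameters.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def bilinear_output_layer(passage, query, options, W):
    """Pick an option via bilinear attention:
    (i) attention weight of each passage word p_i = softmax over p_i^T W q,
    (ii) passage representation = attention-weighted sum of word vectors,
    (iii) each option o scored by o^T W p_rep; the highest score wins.
    passage: list of d-dim word vectors; query and each option: d-dim
    vectors; W: d x d bilinear matrix (shared here for brevity)."""
    Wq = [dot(row, query) for row in W]              # W @ q
    attn = softmax([dot(p, Wq) for p in passage])    # p_i^T W q, normalized
    d = len(query)
    p_rep = [sum(a * p[k] for a, p in zip(attn, passage)) for k in range(d)]
    Wp = [dot(row, p_rep) for row in W]              # W @ p_rep
    option_scores = [dot(o, Wp) for o in options]
    return option_scores.index(max(option_scores))
```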

Bi-Directional Attention Flow (BiDAF)

BiDAF [Seo et al., 2016] is a complex hierarchical multi-stage model containing 6 layers. We make some simplifications to this model for ease of experimentation and some modifications to adapt it to the RACE dataset. We do not use a character embedding layer and use a simple word embedding layer as opposed to the two-layer highway network in the original paper. For the contextual embedding layer described in the original paper, we use LSTMs for computing the representations of the passage and the query, and add another LSTM for computing the representations of the options. We retain the attention flow layer and the modeling layer as they are. To obtain a fixed-length passage representation, a weighted sum of passage word representations is computed (with attention weights computed as in Equation 3 of [Seo et al., 2016]). For predicting the answer, we follow the method from the DCN description.

Mnemonic Reader (MNR)

as proposed in [Hu, Peng, and Qiu, 2017] does iterative alignment in two steps: (i) interactive alignment between the query and the document to generate a query-aware passage representation (i.e., a representation which focuses on important parts of the passage in light of the query), and (ii) self-alignment of the query-aware passage representation with itself to make the representation self-aware (i.e., to fuse information across the passage to capture long-range dependencies between words). Here again, we mainly change the output layer to compute attention weights (refer to Equation 15 in [Hu, Peng, and Qiu, 2017]) for a weighted sum of passage words. For predicting the answer, we follow the method from the BiDAF and DCN descriptions.

Dataset GAR SAR DCN BiDAF MNR
RACE 44.08 43.3 41.75 41.75 41.97
P1 44.77 44.93 41.97 42.4 42.3
P2 44.69 45.26 42.1 42.36 42.2
P3 44.63 44.91 42.06 42.38 42.22
P4 36.56 43.04 32.18 41.73 32.30
P6 46.84 55.65 46.39 46.92 47.22
P7 47.22 55.55 46.66 46.05 47.83
P8 49.53 65.99 51.66 50.2 53.91
P9 44.45 45.12 41.97 42.56 42.14
Q1 42.46 42.54 41.61 40.64 42.62
Q2 34.21 37.19 36.68 34.35 27.79
O1 39.04 39.87 39.93 39.14 39.28
O2 43.6 43.17 41 41.02 37.86
O3(3) 52.03 51.64 50.06 50.43 50.81
O3(2) 65.89 66.21 65.71 65.14 64.98
Table 1: Results of the 5 models when trained on original RACE dataset and tested on the non-adversarial versions of it. O3(3) and O3(2) correspond to having 3 and 2 options respectively.


We train each of the 5 models described above on the training set of the RACE dataset and tune their hyperparameters on its validation set to give the best performance. For benchmarking, we report the performance of these models on the test set of the RACE dataset. We then create non-adversarial test sets from the test set of the RACE dataset (P1-P4, P6-P9, Q1-Q2 and O1-O3). The hypothesis is that if a model has truly learned Natural Language Understanding (NLU) then its performance should be much better on these non-adversarial datasets than on the original RACE test set (just as we expect most humans to excel on these non-adversarial test examples). The results of our experiments are summarized in Table 1. We make a few observations from these results:

On P2 and P3, where the answer is present in the document along with the relevant context, we hardly see any improvement in performance as compared to that on the original RACE dataset.
On P6 and P7, where only the answer along with its relevant context is present, the improvements of 4 out of the 5 models are marginal. While the improvement is significant for the SAR model, its performance is still only around 55%.
SAR, which is the most simplistic model, gives the best performance on a majority of the datasets. This, in turn, indicates that the more complex models have overfit to the original RACE dataset and are poor at generalizing to non-adversarial examples.
For the dataset O3 with 3 and 2 options, the relative improvements of the best performing models over random guessing are % and % respectively, while the relative improvement over random guessing on the RACE dataset (with 4 options) is %. This suggests that removing possibly confusing options does not simplify things for the models to a great extent.

Dataset GAR SAR DCN BiDAF MNR
RACE 41.6 39 39.2 40 39.2
P5 41.88 41.08 41.28 39.68 43.09

Table 2: Results of the 5 models when trained on the original RACE dataset and tested on a subset of the RACE test set ( examples) and on the corresponding P5 set created from this subset

We also evaluate the models on the human-annotated datasets, i.e., P5, where annotators select only the sentences required to answer the query, and O3(H), where the most confusing option as indicated by annotators is dropped. Since this test set is created from a subset ( examples) of the original RACE test set, we compare the performance of the models on P5 to their performance on the corresponding original subset in Table 2. For O3(H), in Table 3, we compare the performance with a test set consisting of the same examples where an incorrect option is randomly dropped from each example.

Dataset GAR SAR DCN BiDAF MNR
O3(3) 49.1 51.3 50.3 47.29 50.5
O3(H) 47.49 48.1 47.09 49.5 49.9

Table 3: Results of the 5 models when trained on the original RACE dataset and tested on O3(3) and O3(H). O3(3) corresponds to having 3 options with one randomly dropped, while O3(H) corresponds to dropping the most confusing option as judged by human annotators

Note that the RACE dataset also has a natural easy-hard split because it contains questions from middle school and high school exams (middle school presumably being easier). So we conduct another experiment where we train the models on high school examples and evaluate them on both middle and high school examples at test time. Here again, from Table 4, we observe that the performance of the models on the middle school test set is in fact lower than their performance on the high school test set.

Dataset GAR SAR DCN BiDAF MNR
RACE-H 41.57 41.51 41.02 40.17 41.14
RACE-M 36.56 39.62 39.9 38.23 39.62
Table 4: Results of the 5 models when trained on the RACE-H dataset and tested on the RACE-H and RACE-M test sets.

Lastly, we also wanted to check what happens to the performance of a model trained on non-adversarial examples. We use non-adversarial versions of the training data to train a separate model for each non-adversarial training set. We then evaluate each trained model on the corresponding non-adversarial test set (i.e., the model trained using P1-type modifications is evaluated on the test set containing P1-type modifications). As seen in Table 5, the performance on the non-adversarial test set now improves drastically (close to 100%) whereas the performance on the RACE test set is close to that given by random guessing, thereby showing that these models are only capable of learning patterns in the training data and do not exhibit any NLU.

Dataset GAR SAR DCN BiDAF MNR
P9 98.66 98.89 98.01 98.46 95.07
RACE 29.07 27.04 27.64 28.25 29.45

Table 5: Results of the 5 models when trained on P9 and tested on the P9 test set and the original RACE test set

Discussions and Analysis

Attention modules are an important component of all the models described in Section Models employed. Query-aware attention modules use query information to select important passage words. Self-matching modules use information from other passage words to select important passage words. These attention weights are then used to compute a passage representation which pays more attention to these words. Most previous works only do a qualitative analysis of the weights learned by these modules using a handful of examples. In this section, we show how specific non-adversarial examples can be used to quantitatively analyze the performance of these components.

Model/Data UAA UAQ MRR
GAR P2 28.64 37.66 0.0089
P6 56.08 43.92 0.289
SAR P2 27.97 27.97 0.007
P6 48.24 51.76 0.324
DCN P2 65.12 55.53 0.09
P6 51.44 48.56 0.388
BiDAF P2 27.22 44.02 0.005
P6 42.93 57.07 0.21
MNR P2 32.45 31.52 0.0124
P6 50.08 49.92 0.32
Table 6: UAA (Uniform Attention to Answer) and UAQ (Uniform Attention to Query) denote the percentage of examples with less than uniform attention weight assigned to the corresponding N-gram. MRR is the Mean Reciprocal Rank of the N-gram based on the total attention weight.

Output Layer Attention

As described in Section Models employed, the output layer of all the models uses information from the query to compute the attention weights (or importance) of all the passage words. We consider the non-adversarial datasets P2 and P6, in which the answer is present verbatim inside the passage. While ideally we would want all the attention weight to be concentrated on these answer words, other parts of the passage may also contain information relevant for answering the query. We consider that, in the absence of any information, the models should simply learn a uniform attention over all the passage words (i.e., the weight assigned to each word should be 1/n, where n is the total number of words in the passage). Now, if a model has truly learned to pay attention to important words, then we expect these attention weights to be distributed in such a way that the answer words get more than uniform attention. If we denote the passage as a sequence of words w_1, ..., w_n, the attention weights assigned by this module as α_1, ..., α_n, and the answer span as w_i, ..., w_j, we expect Σ_{k=i}^{j} α_k > (j - i + 1)/n, i.e., we expect the total attention mass on the answer words to exceed the mass that uniform attention would assign to them. We observe that for over % of the examples in P2 and for over % of the examples in P6, the total attention weight assigned to answer words is less than what would have been assigned using uniform attention over all words. In the case of P6, the length of the passage is very small and hence there is no distraction, as the entire passage is simply the query and the answer. This suggests that these complex attention components do not really learn to pay attention to relevant words.
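This check is straightforward to implement; a sketch over the attention weights of a single example (our own code, with the span convention as an assumption):

```python
def answer_gets_above_uniform_attention(alphas, span):
    """Return True iff the total attention mass on the answer span exceeds
    what a uniform distribution over the n passage words would place on it,
    i.e. sum(alpha_i..alpha_j) > (j - i + 1) / n.
    alphas: attention weights over passage words (summing to 1);
    span: (i, j) inclusive word indices of the answer in the passage."""
    n = len(alphas)
    i, j = span
    return sum(alphas[i:j + 1]) > (j - i + 1) / n
```

The UAA column of Table 6 is then the fraction of test examples for which this check fails.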

We also perform another quantitative analysis wherein we consider all N-grams in the passage which have the same length as the answer. We compute a score for each N-gram as the sum of the attention weights on all the words in the N-gram. We then rank these N-grams based on this score and compute the MRR of the answer N-gram in this ranked list. As shown in Table 6, we observe that the MRR is significantly low for each model on the dataset P2. The MRRs on P6 are higher due to the shorter passage length.
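Per example, the reciprocal rank of the answer N-gram can be computed as below; the MRR reported in Table 6 is then the mean over the test set. This is our own sketch, and ties are broken optimistically (tied N-grams do not push the answer down the ranking).

```python
def answer_ngram_rr(alphas, answer_span):
    """Reciprocal rank of the answer N-gram when all passage N-grams of
    the answer's length are ranked by their total attention weight.
    alphas: attention weights over passage words;
    answer_span: (i, j) inclusive indices of the answer."""
    i, j = answer_span
    length = j - i + 1
    # Score every N-gram of the answer's length by summed attention weight.
    scores = [sum(alphas[s:s + length])
              for s in range(len(alphas) - length + 1)]
    # Rank 1 = highest-scoring N-gram; ties broken optimistically.
    rank = 1 + sum(1 for s in scores if s > scores[i])
    return 1.0 / rank
```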

Query-Aware Document Attention

GAR, DCN, BiDAF and MNR each compute an affinity matrix A ∈ R^{n×m}, where A_{ij} represents the similarity between the passage word at index i and the query word at index j. This module aims to highlight words in the passage which are important for each query word. Again, we use the datasets P2 and P6, where the query is embedded verbatim inside the passage. Following arguments similar to the ones presented in Section Output Layer Attention, it is natural to expect high attention weights on the query N-gram present in the passage. We rank each passage N-gram of length m (the query length) based on the Frobenius norm of the corresponding m × m sub-matrix of A and compute the MRR of the query N-gram. From Table 7, we observe that for P2, this MRR for each model is considerably low even though the N-gram is an exact match with the query. For P6, the MRR is high as expected, since there is no text except for the query and the answer. Interestingly, the MRR for the answer N-gram, computed similarly, is higher than that of the query N-gram for 3 out of the 4 models on the dataset P2. This further raises the question of whether the query-aware attention modules fully serve their purpose.
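The Frobenius-norm ranking can be sketched as follows (our own illustration; `A` is the n × m affinity matrix for one example, and the mean of this reciprocal rank over the test set gives the MRR(Q) of Table 7):

```python
import math

def query_ngram_rr(A, query_len, query_start):
    """Reciprocal rank of the true query N-gram when every passage N-gram
    of the query's length is ranked by the Frobenius norm of its rows in
    the passage-query affinity matrix A (n x m).
    A[i][j]: affinity of passage word i with query word j;
    query_start: index where the query appears verbatim in the passage."""
    m = len(A[0])

    def frob(start):
        # Frobenius norm of the rows covering the candidate N-gram.
        return math.sqrt(sum(A[i][j] ** 2
                             for i in range(start, start + query_len)
                             for j in range(m)))

    scores = [frob(s) for s in range(len(A) - query_len + 1)]
    rank = 1 + sum(1 for s in scores if s > scores[query_start])
    return 1.0 / rank
```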

Model GAR DCN BiDAF MNR
Dataset P2 P6 P2 P6 P2 P6 P2 P6
MRR(Q) 0.041 0.561 0.025 0.57 0.024 0.503 0.08 0.72
MRR(A) 0.06 0.363 0.09 0.492 0.058 0.497 0.048 0.25
Table 7: MRR for query (MRR(Q)) and answer (MRR(A)) N-grams in Query-Aware Passage Attention layer

Self-Matching Attention

MNR computes a self-affinity matrix S ∈ R^{n×n}, where S_{ij} is the mutual affinity between the words at indices i and j in the passage. While this is useful for multi-sentence reasoning, it is difficult to annotate the related sentences in all the passages of the original dataset. Instead, we use the dataset P9, in which the first and the last sentences of each passage are related to a good extent. Recall that in P9, the first sentence contains the query and a hint that the answer is located in the last sentence. The last sentence in turn contains the query as well as the answer. We rank each N-gram of length l in the passage (except for the N-grams in the first sentence) based on the Frobenius norm of the sub-matrix S[1:f, k:k+l-1] (rows corresponding to the first sentence, columns to the candidate N-gram starting at position k), where f and l are the lengths of the first and last sentences respectively, and compute the MRR of the last sentence (as an N-gram of length l). The MRR we get is , which indicates that S indeed gives high attention to sentences with overlapping words and serves its purpose. However, the poor overall performance indicates there is still a lot to be desired in terms of NLU.

Conclusion and Future Work

We propose methods for generating non-adversarial examples for evaluating RC models; to the best of our knowledge, this is the first step in this direction. The failure of existing RC models to perform well and generalize on these examples supports the argument that they do not really exhibit any NLU but simply do pattern matching and overfit to the given data. Using specific non-adversarial examples, we propose methods to quantify the effectiveness of intermediate attention modules. We hope that our work will further encourage (i) the creation of non-adversarial examples for other datasets, (ii) methods for quantitatively analyzing other RC modules like multi-hop and multi-perspective matching, and (iii) the design of RC models with better natural language abilities.