What do we expect from Multiple-choice QA Systems?

11/20/2020
by Krunal Shah, et al.

The recent success of machine learning systems on various QA datasets could be interpreted as a significant improvement in models' language understanding abilities. However, using various perturbations, multiple recent works have shown that good performance on a dataset might not correlate well with human expectations of models that "understand" language. In this work we consider a top-performing model on several Multiple Choice Question Answering (MCQA) datasets and evaluate it against a set of expectations one might have of such a model, using a series of zero-information perturbations of the model's inputs. Our results show that the model clearly falls short of our expectations, which motivates a modified training approach that forces the model to better attend to its inputs. We show that the new training paradigm leads to a model that performs on par with the original model while better satisfying our expectations.

research
07/02/2019

CS563-QA: A Collection for Evaluating Question Answering Systems

Question Answering (QA) is a challenging topic since it requires tacklin...
research
05/02/2020

UnifiedQA: Crossing Format Boundaries With a Single QA System

Question answering (QA) tasks have been posed using a variety of formats...
research
02/28/2022

Improving Lexical Embeddings for Robust Question Answering

Recent techniques in Question Answering (QA) have gained remarkable perf...
research
11/29/2022

Penalizing Confident Predictions on Largely Perturbed Inputs Does Not Improve Out-of-Distribution Generalization in Question Answering

Question answering (QA) models are shown to be insensitive to large pert...
research
06/18/2018

Comparative Analysis of Neural QA models on SQuAD

The task of Question Answering has gained prominence in the past few dec...
research
08/08/2019

Mitigating Noisy Inputs for Question Answering

Natural language processing systems are often downstream of unreliable i...
research
10/01/2018

The Profiling Machine: Active Generalization over Knowledge

The human mind is a powerful multifunctional knowledge storage and manag...
