What does BERT Learn from Multiple-Choice Reading Comprehension Datasets?

10/28/2019
by Chenglei Si, et al.

Multiple-Choice Reading Comprehension (MCRC) requires a model to read a passage and question and select the correct answer from the given options. Recent state-of-the-art models have achieved impressive performance on multiple MCRC datasets. However, such performance may not reflect a model's true ability in language understanding and reasoning. In this work, we adopt two approaches to investigate what BERT learns from MCRC datasets: 1) an un-readable data attack, in which we add keywords to confuse BERT, leading to a significant performance drop; and 2) un-answerable data training, in which we train BERT on partial or shuffled input. Under un-answerable data training, BERT still achieves unexpectedly high performance. Based on our experiments on five MCRC datasets - RACE, MCTest, MCScript, MCScript2.0, and DREAM - we observe that 1) fine-tuned BERT mainly learns how keywords lead to correct predictions, rather than semantic understanding and reasoning; 2) BERT does not need correct syntactic information to solve the task; and 3) these datasets contain artifacts that allow them to be solved even without the full context.
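As a rough illustration of the un-answerable data training described above, the sketch below builds shuffled-passage and passage-free variants of a single MCRC example. The function names and the specific perturbations (word-level shuffling, dropping the passage entirely) are assumptions for exposition, not the authors' released code or exact procedure.

```python
# Hypothetical sketch: constructing "un-answerable" inputs for MCRC fine-tuning.
# Word-level shuffling destroys word order (and thus most syntactic structure);
# dropping the passage leaves only the question and a candidate option.
import random


def full_input(passage: str, question: str, option: str) -> str:
    """Standard MCRC input: passage, question, and one candidate option."""
    return f"{passage} {question} {option}"


def shuffle_passage(passage: str, seed: int = 0) -> str:
    """Shuffled input: randomly permute the passage words."""
    words = passage.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


def drop_passage(question: str, option: str) -> str:
    """Partial input: remove the passage, keeping only question and option."""
    return f"{question} {option}"


if __name__ == "__main__":
    passage = "Tom went to the market and bought three apples for his sister."
    question = "How many apples did Tom buy?"
    option = "three"

    print(full_input(passage, question, option))
    print(full_input(shuffle_passage(passage), question, option))
    print(drop_passage(question, option))
```

In the paper's setting, a model fine-tuned on such degraded inputs should perform near chance if it relied on full-context reasoning; the reported result that BERT remains unexpectedly accurate is what motivates the conclusion about keyword-driven prediction and dataset artifacts.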


