When Choosing Plausible Alternatives, Clever Hans can be Clever

11/01/2019
by Pride Kavumba, et al.

Pretrained language models such as BERT and RoBERTa have achieved large improvements on the commonsense reasoning benchmark COPA. However, recent work has found that many gains on natural language understanding benchmarks come not from models learning the task, but from their increasing ability to exploit superficial cues, such as tokens that occur more often in the correct answer than in the wrong one. Is the strong performance of BERT and RoBERTa on COPA also caused by this? We find superficial cues in COPA, as well as evidence that BERT exploits these cues. To remedy this problem, we introduce Balanced COPA, an extension of COPA that does not suffer from easy-to-exploit single-token cues. We analyze the performance of BERT and RoBERTa on original and Balanced COPA, finding that BERT relies on superficial cues when they are present, yet still achieves comparable performance once they are made ineffective, suggesting that BERT learns the task to a certain degree when forced to. In contrast, RoBERTa does not appear to rely on superficial cues.
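The single-token cues described above can be made concrete with a few lines of code. The sketch below is a simplified take on the cue "productivity" statistic used in this line of work: for each token, it counts how often the token appears in the correct alternative but not in the wrong one, and vice versa; tokens whose one-sided occurrences skew strongly toward one label are candidates a model could exploit without understanding the task. The example format (the `choices` and `label` fields) and the toy data are hypothetical, not the official COPA schema.

```python
from collections import Counter

def cue_statistics(examples):
    """For each token, count the examples where it appears only in the
    correct alternative (in_correct) or only in the wrong one (in_wrong)."""
    in_correct, in_wrong = Counter(), Counter()
    for ex in examples:
        correct = set(ex["choices"][ex["label"]].lower().split())
        wrong = set(ex["choices"][1 - ex["label"]].lower().split())
        for tok in correct - wrong:
            in_correct[tok] += 1
        for tok in wrong - correct:
            in_wrong[tok] += 1
    return in_correct, in_wrong

def productivity(token, in_correct, in_wrong):
    """Fraction of a token's one-sided occurrences that point to the
    correct answer; values far from 0.5 indicate an exploitable cue."""
    total = in_correct[token] + in_wrong[token]
    return in_correct[token] / total if total else 0.5

if __name__ == "__main__":
    # Toy data for illustration only, not real COPA examples.
    examples = [
        {"choices": ["she was hungry", "she was asleep"], "label": 0},
        {"choices": ["he was hungry", "he left early"], "label": 0},
        {"choices": ["it was asleep", "it was hungry"], "label": 1},
    ]
    in_c, in_w = cue_statistics(examples)
    for tok in sorted(in_c | in_w):
        print(f"{tok}: {productivity(tok, in_c, in_w):.2f}")
```

On this toy data, "hungry" marks the correct alternative in all three examples (productivity 1.0), so a classifier could score well by keying on that token alone; Balanced COPA extends COPA so that such single-token statistics no longer separate correct from wrong answers.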


