Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference

02/04/2019
by   R. Thomas McCoy, et al.

Machine learning systems can often achieve high performance on a test set by relying on heuristics that are effective for frequent example types but break down in more challenging cases. We study this issue within natural language inference (NLI), the task of determining whether one sentence entails another. Based on an analysis of the task, we hypothesize three fallible syntactic heuristics that NLI models are likely to adopt: the lexical overlap heuristic, the subsequence heuristic, and the constituent heuristic. To determine whether models have adopted these heuristics, we introduce a controlled evaluation set called HANS (Heuristic Analysis for NLI Systems), which contains many examples where the heuristics fail. We find that models trained on MNLI, including the state-of-the-art model BERT, perform very poorly on HANS, suggesting that they have indeed adopted these heuristics. We conclude that there is substantial room for improvement in NLI systems, and that the HANS dataset can motivate and measure progress in this area.
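To make the three heuristics concrete, below is a minimal illustrative sketch in Python (not the authors' code): each heuristic is written as a simple decision rule over a premise/hypothesis pair, and each is applied to a hypothetical HANS-style example where the rule predicts "entailment" even though the premise does not entail the hypothesis. The function names, example sentences, and the toy constituent list are assumptions made for illustration, not items taken verbatim from the dataset.

# Sketch of the three fallible heuristics as decision rules.
# Each returns True when the heuristic would predict "entailment".

def lexical_overlap(premise, hypothesis):
    """Predict entailment iff every hypothesis word also appears in the premise."""
    premise_words = set(premise.lower().split())
    return all(word in premise_words for word in hypothesis.lower().split())

def subsequence(premise, hypothesis):
    """Predict entailment iff the hypothesis is a contiguous subsequence of the premise."""
    return hypothesis.lower() in premise.lower()

def constituent(premise_constituents, hypothesis):
    """Predict entailment iff the hypothesis matches a syntactic constituent of the premise.
    `premise_constituents` stands in for the output of a parser (assumed here)."""
    return hypothesis.lower() in {c.lower() for c in premise_constituents}

# Hypothetical non-entailed pairs on which each heuristic still says "entailment":
print(lexical_overlap("The doctor was paid by the actor",
                      "The doctor paid the actor"))        # True, but not entailed
print(subsequence("The doctor near the actor danced",
                  "The actor danced"))                      # True, but not entailed
print(constituent(["the artist slept", "the actor ran"],
                  "The artist slept"))                      # True, but not entailed

A model that has internalized any of these rules will score well on MNLI-like examples, where the rules usually hold, yet fail systematically on HANS examples constructed so that the rules give the wrong answer.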


Related research

- 11/29/2018 · Non-entailed subsequences as a challenge for natural language inference
  Neural network models have shown great success at natural language infer...

- 10/23/2022 · Lexical Generalization Improves with Larger Models and Longer Training
  While fine-tuned language models perform well on many tasks, they were a...

- 10/24/2022 · Cascading Biases: Investigating the Effect of Heuristic Annotation Strategies on Data and Models
  Cognitive psychologists have documented that humans use cognitive heuris...

- 04/24/2020 · Syntactic Data Augmentation Increases Robustness to Inference Heuristics
  Pretrained neural models such as BERT, when fine-tuned to perform natura...

- 12/07/2019 · Adversarial Analysis of Natural Language Inference Systems
  The release of large natural language inference (NLI) datasets like SNLI...

- 09/09/2021 · Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning
  Recent prompt-based approaches allow pretrained language models to achie...

- 01/10/2021 · BERT Family Eat Word Salad: Experiments with Text Understanding
  In this paper, we study the response of large models from the BERT famil...
