Looking at the Overlooked: An Analysis on the Word-Overlap Bias in Natural Language Inference

11/07/2022
by Sara Rajaee, et al.

It has been shown that NLI models are usually biased with respect to the word-overlap between premise and hypothesis; they take this feature as a primary cue for predicting the entailment label. In this paper, we focus on an overlooked aspect of the overlap bias in NLI models: the reverse word-overlap bias. Our experimental results demonstrate that current NLI models are highly biased towards the non-entailment label on instances with low overlap, and the existing debiasing methods, which are reportedly successful on existing challenge datasets, are generally ineffective in addressing this category of bias. We investigate the reasons for the emergence of the overlap bias and the role of minority examples in its mitigation. For the former, we find that the word-overlap bias does not stem from pre-training, and for the latter, we observe that in contrast to the accepted assumption, eliminating minority examples does not affect the generalizability of debiasing methods with respect to the overlap bias.
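To make the central feature concrete, the following is a minimal sketch of one way to measure word overlap between a premise and a hypothesis. The abstract does not give the paper's exact definition, so the metric here (the fraction of hypothesis tokens that also occur in the premise), the simple regex tokenizer, and the names `tokenize` and `word_overlap` are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (illustrative tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_overlap(premise, hypothesis):
    """Return the fraction of hypothesis tokens that also appear in the premise.

    This is an assumed definition for illustration, not necessarily the
    metric used in the paper.
    """
    premise_words = set(tokenize(premise))
    hypothesis_words = tokenize(hypothesis)
    if not hypothesis_words:
        return 0.0
    shared = sum(1 for word in hypothesis_words if word in premise_words)
    return shared / len(hypothesis_words)


# A high-overlap pair: every hypothesis word occurs in the premise.
high = word_overlap("The doctor near the actor danced.", "The actor danced.")

# A low-overlap pair: no hypothesis word occurs in the premise.
low = word_overlap("A man is sleeping on the couch.", "Someone rests indoors.")
```

Under this assumed metric, the first pair scores at the top of the range and the second at the bottom. The reverse word-overlap bias described above concerns pairs like the second: a biased model would tend to predict non-entailment for them regardless of their actual label.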

research
01/19/2021

Exploring Lexical Irregularities in Hypothesis-Only Models of Natural Language Inference

Natural Language Inference (NLI) or Recognizing Textual Entailment (RTE)...
research
06/17/2020

Extremal overlap-free and extremal β-free binary words

An overlap-free (or β-free) word w over a fixed alphabet Σ is extremal i...
research
09/18/2023

Evaluating Gender Bias of Pre-trained Language Models in Natural Language Inference by Considering All Labels

Discriminatory social biases, including gender biases, have been found i...
research
08/28/2019

Unlearn Dataset Bias in Natural Language Inference by Fitting the Residual

Statistical natural language inference (NLI) models are susceptible to l...
research
09/01/2021

Don't Discard All the Biased Instances: Investigating a Core Assumption in Dataset Bias Mitigation Techniques

Existing techniques for mitigating dataset bias often leverage a biased ...
research
04/22/2018

Performance Impact Caused by Hidden Bias of Training Data for Recognizing Textual Entailment

The quality of training data is one of the crucial problems when a learn...
research
05/31/2021

More than just Frequency? Demasking Unsupervised Hypernymy Prediction Methods

This paper presents a comparison of unsupervised methods of hypernymy pr...
