
Less Learn Shortcut: Analyzing and Mitigating Learning of Spurious Feature-Label Correlation

by Yanrui Du, et al.

Many recent works indicate that deep neural networks tend to take dataset biases as shortcuts when making decisions, rather than understanding the task, which results in failures in real-world applications. In this work, we focus on the spurious correlation between features and labels, which derives from the biased data distribution of the training data, and analyze it concretely. In particular, we define a word that highly co-occurs with a specific label as a biased word, and an example containing a biased word as a biased example. Our analysis reveals that biased examples with spurious correlations are easier for models to learn; when predicting, biased words make significantly higher contributions to models' predictions than other words, and models tend to assign labels by over-relying on the spurious correlation between words and labels. To mitigate the model's over-reliance on this shortcut, we propose a training strategy, Less-Learn-Shortcut (LLS): we quantify the biased degree of each biased example and down-weight it accordingly. Experimental results on QM and NLI tasks show that, with LLS, models improve performance on both in-domain and adversarial data (1.57 on HANS).
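The core idea of LLS can be sketched in a few lines: estimate each word's label skew from co-occurrence counts, take an example's biased degree from its most skewed word, and shrink that example's training-loss weight accordingly. The snippet below is a minimal illustration of that down-weighting idea on a toy binary task; the exact bias quantification, thresholds, and weighting function here are assumptions for illustration, not the paper's formulation.

```python
from collections import Counter, defaultdict

# Toy binary-classification data: (tokens, label). "not" co-occurs only with
# label 0, so it acts as a biased word and its examples as biased examples.
train = [
    (["not", "good", "movie"], 0),
    (["not", "bad", "movie"], 0),
    (["good", "movie"], 1),
    (["fine", "movie"], 1),
]

# 1) Word-label co-occurrence counts.
co = defaultdict(Counter)
for toks, y in train:
    for w in set(toks):
        co[w][y] += 1

def biased_degree(toks, min_count=2, chance=0.5):
    """Biased degree of an example: the strongest label skew max_y p(y|w)
    among its words, counting only words seen >= min_count times so that
    rare words are ignored. Returns `chance` (0.5 for a binary task) when
    no word qualifies, i.e. the example looks unbiased."""
    skews = [
        max(co[w].values()) / sum(co[w].values())
        for w in toks
        if sum(co[w].values()) >= min_count
    ]
    return max(skews, default=chance)

def lls_weight(toks, max_down=0.5, chance=0.5):
    """Down-weight an example in the training loss by its biased degree:
    an unbiased example keeps weight 1.0, a maximally biased one is
    scaled down to 1 - max_down (a hypothetical scaling choice)."""
    d = biased_degree(toks, chance=chance)
    return 1.0 - max_down * (d - chance) / (1.0 - chance)

weights = [lls_weight(toks) for toks, _ in train]
print(weights)  # the biased "not ..." examples get 0.5, the rest keep 1.0
```

In an actual training loop these weights would multiply the per-example loss (e.g. a cross-entropy computed with no reduction) before averaging, so that biased examples contribute less to the gradient.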


SelecMix: Debiased Learning by Contradicting-pair Sampling

Neural networks trained with ERM (empirical risk minimization) sometimes...

Learning from Failure: Training Debiased Classifier from Biased Classifier

Neural networks often learn to make predictions that overly rely on spur...

Look to the Right: Mitigating Relative Position Bias in Extractive Question Answering

Extractive question answering (QA) models tend to exploit spurious corre...

Adaptive Label Smoothing To Regularize Large-Scale Graph Training

Graph neural networks (GNNs), which learn the node representations by re...

OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses

Dataset bias and spurious correlations can significantly impair generali...

Looking at the Overlooked: An Analysis on the Word-Overlap Bias in Natural Language Inference

It has been shown that NLI models are usually biased with respect to the...

Can Subnetwork Structure be the Key to Out-of-Distribution Generalization?

Can models with particular structure avoid being biased towards spurious...