Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU models

03/11/2021
by Mengnan Du, et al.

Recent studies indicate that NLU models tend to rely on shortcut features for prediction without achieving true language understanding. As a result, these models fail to generalize to real-world out-of-distribution (OOD) data. In this work, we show that the words in an NLU training set can be modeled as a long-tailed distribution. We report two findings: 1) NLU models have a strong preference for features located at the head of the long-tailed distribution, and 2) shortcut features are picked up during the first few iterations of model training. These two observations are then used to formulate a measurement that quantifies the shortcut degree of each training sample. Based on this shortcut measurement, we propose a shortcut mitigation framework, LGTR, which discourages the model from making overconfident predictions for samples with a large shortcut degree. Experimental results on three NLU benchmarks demonstrate that our long-tailed distribution explanation accurately reflects the shortcut learning behavior of NLU models. Experimental analysis further indicates that LGTR can improve generalization accuracy on OOD data while preserving accuracy on in-distribution data.
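
The idea of suppressing overconfident predictions for samples with a large shortcut degree can be illustrated with a small sketch. The abstract does not specify how LGTR does this, so everything below is an assumption made for illustration only: the function name `soften_prediction`, a shortcut degree scaled to the range [0, 1], and linear interpolation toward the uniform distribution are all hypothetical choices, not the authors' formulation.

```python
def soften_prediction(probs, shortcut_degree):
    """Pull a predicted class distribution toward uniform.

    Hypothetical illustration, not the paper's method:
    - probs: list of class probabilities that sum to 1
    - shortcut_degree: assumed to lie in [0, 1]; 0 leaves the
      prediction unchanged, 1 replaces it with the uniform distribution
    """
    if not 0.0 <= shortcut_degree <= 1.0:
        raise ValueError("shortcut_degree must be in [0, 1]")
    if not probs:
        raise ValueError("probs must be non-empty")

    uniform = 1.0 / len(probs)
    # Linear interpolation between the original prediction and uniform.
    return [
        (1.0 - shortcut_degree) * p + shortcut_degree * uniform
        for p in probs
    ]


if __name__ == "__main__":
    confident = [0.90, 0.05, 0.05]
    for degree in (0.0, 0.5, 1.0):
        print(degree, soften_prediction(confident, degree))
```

Under these assumptions, a sample with a higher shortcut degree yields a flatter distribution, so the largest class probability shrinks while the probabilities still sum to 1.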

research
10/19/2021

Improving Tail-Class Representation with Centroid Contrastive Learning

In vision domain, large-scale natural datasets typically exhibit long-ta...
research
05/06/2021

VideoLT: Large-scale Long-tailed Video Recognition

Label distributions in real-world are oftentimes long-tailed and imbalan...
research
09/07/2023

The Devil is in the Tails: How Long-Tailed Code Distributions Impact Large Language Models

Learning-based techniques, especially advanced Large Language Models (LL...
research
01/16/2022

GradTail: Learning Long-Tailed Data Using Gradient-based Sample Weighting

We propose GradTail, an algorithm that uses gradients to improve model p...
research
06/12/2019

Does Learning Require Memorization? A Short Tale about a Long Tail

State-of-the-art results on image recognition tasks are achieved using o...
research
12/30/2022

Delving into Semantic Scale Imbalance

Model bias triggered by long-tailed data has been widely studied. Howeve...
research
12/31/2020

Why do classifier accuracies show linear trends under distribution shift?

Several recent studies observed that when classification models are eval...