Decorrelate Irrelevant, Purify Relevant: Overcome Textual Spurious Correlations from a Feature Perspective

02/16/2022
by Shihan Dou, et al.

Natural language understanding (NLU) models tend to rely on spurious correlations (i.e., dataset bias), which yields high performance on in-distribution datasets but poor performance on out-of-distribution ones. Most existing debiasing methods identify and down-weight samples with biased features (i.e., superficial surface features that cause such spurious correlations). However, down-weighting these samples hinders the model from learning from their unbiased parts. To tackle this challenge, we propose in this paper to eliminate spurious correlations in a fine-grained manner from a feature-space perspective. Specifically, we introduce Random Fourier Features and weighted re-sampling to remove dependencies between features and thereby mitigate spurious correlations. After obtaining the decorrelated features, we further design a mutual-information-based method to purify them, which forces the model to learn features that are more relevant to the task. Extensive experiments on two well-studied NLU tasks, Natural Language Inference and Fact Verification, demonstrate that our method outperforms comparable approaches.
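To make the decorrelation step concrete, below is a minimal PyTorch sketch of RFF-based feature decorrelation with learned sample weights, in the spirit of the description above. The function names, tensor shapes, and the pairwise cross-covariance objective are illustrative assumptions, not the paper's exact formulation.

```python
import math
import torch

def rff(z: torch.Tensor, n_rff: int = 5) -> torch.Tensor:
    """Random Fourier Features applied per feature dimension:
    f(z) = sqrt(2) * cos(w * z + b), with w ~ N(0, 1) and b ~ U(0, 2*pi).
    Maps an (n, d) feature matrix to (n, d, n_rff)."""
    _, d = z.shape
    w = torch.randn(d, n_rff, device=z.device)
    b = 2 * math.pi * torch.rand(d, n_rff, device=z.device)
    return math.sqrt(2.0) * torch.cos(z.unsqueeze(-1) * w + b)

def decorrelation_loss(z: torch.Tensor, sample_weights: torch.Tensor) -> torch.Tensor:
    """Sum of squared weighted cross-covariances between every pair of
    feature dimensions, measured in RFF space. Minimizing this loss with
    respect to the (softmaxed) per-sample weights re-weights the batch so
    that the features become approximately statistically independent."""
    n, d = z.shape
    u = rff(z)                                  # (n, d, k)
    w = sample_weights.view(n, 1, 1)            # weights sum to 1
    u_centered = u - (w * u).sum(dim=0)         # remove the weighted mean
    loss = z.new_zeros(())
    for i in range(d):
        for j in range(i + 1, d):
            cov = (sample_weights.view(n, 1) * u_centered[:, i, :]).T @ u_centered[:, j, :]
            loss = loss + (cov ** 2).sum()
    return loss

# Hypothetical usage: alternate between minimizing this loss w.r.t. the
# per-sample weight logits and training the NLU model on the weighted task loss.
feats = torch.randn(32, 8)                      # encoder features for a batch
logits = torch.zeros(32, requires_grad=True)    # free per-sample weight logits
weights = torch.softmax(logits, dim=0)
print(decorrelation_loss(feats, weights))
```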
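For the purification step, the abstract only states that the method is mutual-information-based; a common way to realize such an objective is a variational lower bound on I(features; label). The sketch below uses a MINE-style Donsker-Varadhan estimator (Belghazi et al., 2018) as one plausible stand-in; the class name and critic architecture are illustrative assumptions, not the paper's design.

```python
import math
import torch
import torch.nn as nn

class MILowerBound(nn.Module):
    """Donsker-Varadhan lower bound on I(h; y). Maximizing it with respect
    to both the encoder and this critic encourages the purified features h
    to retain information that is relevant to the task label y."""
    def __init__(self, feat_dim: int, num_classes: int, hidden: int = 128):
        super().__init__()
        self.num_classes = num_classes
        self.critic = nn.Sequential(
            nn.Linear(feat_dim + num_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, h: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        n = h.size(0)
        y_onehot = nn.functional.one_hot(y, self.num_classes).float()
        joint = self.critic(torch.cat([h, y_onehot], dim=1)).mean()
        y_shuffled = y_onehot[torch.randperm(n)]        # draw from p(h)p(y)
        marginal = self.critic(torch.cat([h, y_shuffled], dim=1))
        # I(h; y) >= E_joint[T] - log E_marginal[exp(T)]
        return joint - (torch.logsumexp(marginal, dim=0) - math.log(n)).squeeze()

# Hypothetical usage: add the negative bound to the task loss so that
# maximizing the bound purifies the decorrelated features.
mi = MILowerBound(feat_dim=8, num_classes=3)
h = torch.randn(32, 8)
y = torch.randint(0, 3, (32,))
loss = -mi(h, y)    # minimize the negative lower bound to maximize MI
print(loss)
```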


Related research:

- Feature-Level Debiased Natural Language Understanding (12/11/2022)
- Environment-biased Feature Ranking for Novelty Detection Robustness (09/21/2023)
- Learning from others' mistakes: Avoiding dataset biases without modeling them (12/02/2020)
- End-to-End Self-Debiasing Framework for Robust NLU Training (09/05/2021)
- Coarse-to-Fine: Hierarchical Multi-task Learning for Natural Language Understanding (08/19/2022)
- Debiasing Masks: A New Framework for Shortcut Mitigation in NLU (10/28/2022)
- The Gap on GAP: Tackling the Problem of Differing Data Distributions in Bias-Measuring Datasets (11/03/2020)
