Identifying and Mitigating Spurious Correlations for Improving Robustness in NLP Models

10/14/2021
by Tianlu Wang, et al.

Recently, NLP models have achieved remarkable progress across a variety of tasks; however, they have also been criticized for not being robust. Many robustness problems can be attributed to models exploiting spurious correlations, or shortcuts, between the training data and the task labels. Models may fail to generalize to out-of-distribution data or be vulnerable to adversarial attacks when such spurious correlations are exploited during training. In this paper, we aim to automatically identify spurious correlations in NLP models at scale. We first leverage existing interpretability methods to extract, from the input text, the tokens that significantly affect the model's decision process. We then distinguish "genuine" tokens from "spurious" tokens by analyzing model predictions across multiple corpora, and further verify them through knowledge-aware perturbations. We show that the proposed method can effectively and efficiently identify a large set of such "shortcuts", and that mitigating them leads to more robust models in multiple applications.
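
To make the cross-corpus step above concrete, here is a minimal sketch of the idea: flag tokens that rank among the most important features in one corpus but carry no importance in others. Everything here is a hypothetical stand-in (the toy bag-of-words weights, token_attributions, shortcut_candidates); the paper's actual pipeline applies existing interpretability methods to neural models and adds knowledge-aware perturbations for verification.

```python
from collections import defaultdict

# Toy stand-in for a trained classifier: a bag-of-words linear model whose
# per-token attribution is simply the magnitude of the token's weight. In
# practice the scores would come from an interpretability method (e.g.
# integrated gradients) applied to a neural model.
WEIGHTS = {"awful": -2.0, "great": 2.0, "spielberg": 1.5, "movie": 0.1}

def token_attributions(text):
    """Score each token by its contribution to the prediction."""
    return [(tok, abs(WEIGHTS.get(tok, 0.0))) for tok in text.lower().split()]

def top_tokens_per_corpus(corpus, top_k=2):
    """Count how often each token ranks among the most important ones."""
    counts = defaultdict(int)
    for text in corpus:
        ranked = sorted(token_attributions(text), key=lambda p: -p[1])
        for tok, score in ranked[:top_k]:
            if score > 0:
                counts[tok] += 1
    return counts

def shortcut_candidates(corpora, min_freq=2):
    """Flag tokens that drive predictions in one corpus but never matter
    in the others -- a rough proxy for the cross-corpus analysis."""
    per_corpus = [top_tokens_per_corpus(c) for c in corpora]
    flagged = set()
    for i, counts in enumerate(per_corpus):
        others = [c for j, c in enumerate(per_corpus) if j != i]
        for tok, n in counts.items():
            if n >= min_freq and all(o.get(tok, 0) == 0 for o in others):
                flagged.add(tok)
    return flagged

movie_reviews = ["spielberg made a great movie",
                 "a spielberg movie , awful pacing"]
product_reviews = ["great blender", "awful battery life"]
print(shortcut_candidates([movie_reviews, product_reviews]))
# {'spielberg'}: important only in the movie corpus, so a shortcut candidate;
# a genuine sentiment cue like "great" matters in both corpora and is kept.
```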

Related research

05/23/2022 · Learning to Ignore Adversarial Attacks
Despite the strong performance of current NLP models, they can be brittl...

08/13/2023 · Faithful to Whom? Questioning Interpretability Measures in NLP
A common approach to quantifying model interpretability is to calculate ...

10/24/2022 · Does Self-Rationalization Improve Robustness to Spurious Correlations?
Rationalization is fundamental to human reasoning and learning. NLP mode...

12/14/2019 · Towards Robust Toxic Content Classification
Toxic content detection aims to identify content that can offend or harm...

02/12/2023 · TextDefense: Adversarial Text Detection based on Word Importance Entropy
Currently, natural language processing (NLP) models are widely used in v...

10/14/2021 · Causally Estimating the Sensitivity of Neural NLP Models to Spurious Features
Recent work finds modern natural language processing (NLP) models relyin...

11/17/2020 · SIENA: Stochastic Multi-Expert Neural Patcher
Neural network (NN) models that are solely trained to maximize the likel...
