Robustness to Spurious Correlations in Text Classification via Automatically Generated Counterfactuals

12/18/2020
by Zhao Wang, et al.

Spurious correlations threaten the validity of statistical classifiers. While model accuracy may appear high when the test data is drawn from the same distribution as the training data, it can degrade quickly when the test distribution changes. For example, it has been shown that classifiers perform poorly when humans make minor modifications that change the label of an example. One way to increase model reliability and generalizability is to identify causal associations between features and classes. In this paper, we propose to train a robust text classifier by augmenting the training data with automatically generated counterfactual data. We first identify likely causal features using a statistical matching approach. Next, we generate counterfactual samples for the original training data by substituting causal features with their antonyms and assigning opposite labels to the counterfactual samples. Finally, we combine the original data and the counterfactual data to train a robust classifier. Experiments on two classification tasks show that a traditional classifier trained on the original data does very poorly on human-generated counterfactual samples (e.g., an accuracy drop of 10% or more). By contrast, a classifier trained on the combined data is more robust and performs well on both the original test data and the counterfactual test data (e.g., an accuracy gain of 12% or more compared with the traditional classifier). Detailed analysis shows that the robust classifier makes meaningful and trustworthy predictions by emphasizing causal features and de-emphasizing non-causal features.
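
The augmentation steps described in the abstract (substitute causal features with antonyms, flip the label, combine with the original data) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes binary labels encoded as 0/1, whitespace tokenization, a hand-written antonym table (`ANTONYMS`), and a set of causal words that is simply given; the statistical matching step that identifies causal features is not shown. All names here are hypothetical.

```python
# Hypothetical antonym table; the abstract does not say where antonyms come from.
ANTONYMS = {
    "great": "terrible",
    "terrible": "great",
    "love": "hate",
    "hate": "love",
}


def make_counterfactual(text, label, causal_words, antonyms=ANTONYMS):
    """Swap each causal word for its antonym and flip the binary label.

    Returns (new_text, new_label), or None if no causal word was replaced,
    since an unchanged text with a flipped label would be a mislabeled sample.
    """
    out = []
    swapped = False
    for tok in text.split():
        key = tok.lower()
        if key in causal_words and key in antonyms:
            out.append(antonyms[key])
            swapped = True
        else:
            out.append(tok)
    if not swapped:
        return None
    return " ".join(out), 1 - label  # assumes labels are 0 or 1


def augment(dataset, causal_words):
    """Return the original samples followed by their counterfactuals."""
    combined = list(dataset)
    for text, label in dataset:
        cf = make_counterfactual(text, label, causal_words)
        if cf is not None:
            combined.append(cf)
    return combined


# Toy data: the causal word set is assumed here, not learned.
train = [("a great movie", 1), ("I hate the plot", 0), ("the film was long", 0)]
causal = {"great", "hate"}
combined = augment(train, causal)
```

The combined list would then be passed to any standard text classifier in place of the original training set. Samples with no replaceable causal word, such as the third toy example, contribute no counterfactual.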


research
06/02/2021

Towards Robust Classification Model by Counterfactual and Invariant Data Generation

Despite the success of machine learning applications in science, industr...
research
10/21/2022

Robustifying Sentiment Classification by Maximally Exploiting Few Counterfactuals

For text classification tasks, finetuned language models perform remarka...
research
05/17/2023

Counterfactually Comparing Abstaining Classifiers

Abstaining classifiers have the option to abstain from making prediction...
research
08/21/2023

Debiasing Counterfactuals In the Presence of Spurious Correlations

Deep learning models can perform well in complex medical imaging classif...
research
09/26/2019

Learning the Difference that Makes a Difference with Counterfactually-Augmented Data

Despite alarm over the reliance of machine learning systems on so-called...
research
11/20/2020

Adversarial Training for EM Classification Networks

We present a novel variant of Domain Adversarial Networks with impactful...
