Bias Challenges in Counterfactual Data Augmentation

09/12/2022
by S Chandra Mouli, et al.

Deep learning models tend to lack out-of-distribution (OOD) robustness, primarily because they rely on spurious features to solve the task. Counterfactual data augmentation provides a general way of (approximately) achieving representations that are counterfactually invariant to spurious features, a requirement for OOD robustness. In this work, we show that counterfactual data augmentation may not achieve the desired counterfactual invariance if the augmentation is performed by a context-guessing machine, an abstract machine that guesses the most likely context of a given input. We theoretically analyze the invariance imposed by such counterfactual data augmentations and describe an exemplar NLP task where counterfactual data augmentation by a context-guessing machine does not lead to robust OOD classifiers.
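To make the idea concrete, here is a minimal illustrative sketch (not from the paper) of counterfactual data augmentation on a toy text task. A "context" token spuriously correlated with the label is swapped while the label is kept fixed; the `guess_context` function is a hypothetical stand-in for the paper's context-guessing machine, and all token names are invented for illustration.

```python
# Hypothetical spurious context tokens that correlate with the label.
CONTEXTS = ["<sports>", "<politics>"]

def guess_context(text):
    """Stand-in for a 'context-guessing machine': returns the most
    likely context of an input. Here it simply reads the marker."""
    for c in CONTEXTS:
        if c in text:
            return c
    return CONTEXTS[0]  # fall back to a default context

def counterfactual_augment(dataset):
    """For each (text, label) pair, add copies with every other context
    substituted for the guessed one, keeping the label unchanged."""
    augmented = list(dataset)
    for text, label in dataset:
        ctx = guess_context(text)
        for other in CONTEXTS:
            if other != ctx:
                augmented.append((text.replace(ctx, other), label))
    return augmented

data = [("<sports> the team played well", 1),
        ("<politics> the bill was rejected", 0)]
print(counterfactual_augment(data))
```

The paper's point is that when the context must be *guessed* rather than observed, errors in `guess_context` can leave residual dependence on the spurious feature, so the resulting classifier need not be counterfactually invariant.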



Related research

06/01/2023 · CAISA at SemEval-2023 Task 8: Counterfactual Data Augmentation for Mitigating Class Imbalance in Causal Claim Identification
The class imbalance problem can cause machine learning models to produce...

04/26/2023 · Implicit Counterfactual Data Augmentation for Deep Neural Networks
Machine-learning models are prone to capturing the spurious correlations...

07/17/2023 · Results on Counterfactual Invariance
In this paper we provide a theoretical analysis of counterfactual invari...

05/23/2023 · Counterfactual Augmentation for Multimodal Learning Under Presentation Bias
In real-world machine learning systems, labels are often derived from us...

08/03/2022 · SpanDrop: Simple and Effective Counterfactual Learning for Long Sequences
Distilling supervision signal from a long sequence to make predictions i...

05/31/2021 · Counterfactual Invariance to Spurious Correlations: Why and How to Pass Stress Tests
Informally, a `spurious correlation' is the dependence of a model on som...

10/09/2020 · Counterfactually-Augmented SNLI Training Data Does Not Yield Better Generalization Than Unaugmented Data
A growing body of work shows that models exploit annotation artifacts to...
