Towards Mitigating Spurious Correlations in the Wild: A Benchmark and a more Realistic Dataset

06/21/2023
by   Siddharth Joshi, et al.

Deep neural networks often exploit non-predictive features that are spuriously correlated with class labels, leading to poor performance on groups of examples without such features. Despite the growing body of recent work on remedying spurious correlations, the lack of a standardized benchmark hinders reproducible evaluation and comparison of the proposed solutions. To address this, we present SpuCo, a Python package with modular implementations of state-of-the-art solutions, enabling easy and reproducible evaluation of current methods. Using SpuCo, we demonstrate the limitations of existing datasets and evaluation schemes in validating the learning of predictive features over spurious ones. To overcome these limitations, we propose two new vision datasets: (1) SpuCoMNIST, a synthetic dataset that enables simulating the effect of real-world data properties, e.g., the difficulty of learning the spurious feature, as well as noise in the labels and features; (2) SpuCoAnimals, a large-scale dataset curated from ImageNet that captures spurious correlations in the wild much more closely than existing datasets. These contributions highlight the shortcomings of current methods and provide a direction for future research in tackling spurious correlations. SpuCo, containing the benchmark and datasets, can be found at https://github.com/BigML-CS-UCLA/SpuCo, with detailed documentation available at https://spuco.readthedocs.io/en/latest/.
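To make the failure mode concrete, the following is a minimal sketch of how per-group performance can expose reliance on a spurious feature. It does not use the SpuCo API. The function names (`group_accuracies`, `worst_group_accuracy`), the toy data, and the choice of worst-group accuracy as the metric are all assumptions made for illustration. Each example is assigned to a group defined by its class label and its spurious attribute, and accuracy is computed within each group.

```python
from collections import defaultdict


def group_accuracies(labels, spurious, predictions):
    """Return accuracy per (class label, spurious attribute) group.

    Hypothetical helper for illustration; not part of SpuCo.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, s, y_hat in zip(labels, spurious, predictions):
        group = (y, s)
        total[group] += 1
        correct[group] += int(y_hat == y)
    return {group: correct[group] / total[group] for group in total}


def worst_group_accuracy(labels, spurious, predictions):
    """Return the lowest accuracy over all observed groups."""
    return min(group_accuracies(labels, spurious, predictions).values())


# Toy data, made up for illustration: the spurious attribute agrees with
# the class label on 6 of 8 examples, so the two are strongly correlated.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
spurious = [0, 0, 0, 1, 1, 1, 1, 0]

# A classifier that ignores the predictive feature and simply outputs
# the spurious attribute.
predictions = list(spurious)

average_accuracy = sum(
    int(y_hat == y) for y_hat, y in zip(predictions, labels)
) / len(labels)
per_group = group_accuracies(labels, spurious, predictions)
worst = worst_group_accuracy(labels, spurious, predictions)

print(f"average accuracy: {average_accuracy:.2f}")
print(f"worst-group accuracy: {worst:.2f}")
```

In this toy setting the average accuracy looks acceptable, while the groups in which the spurious attribute disagrees with the label are classified entirely incorrectly. This is the gap that group-wise evaluation is meant to reveal.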


