Even Small Correlation and Diversity Shifts Pose Dataset-Bias Issues

05/09/2023
by Alceu Bissoto, et al.

Distribution shifts are common in real-world datasets and can affect the performance and reliability of deep learning models. In this paper, we study two types of distribution shifts: diversity shifts, which occur when test samples exhibit patterns unseen during training, and correlation shifts, which occur when the correlation between invariant and spurious features seen during training differs in the test data. We propose an integrated protocol to analyze both types of shifts using datasets where they co-exist in a controllable manner. Finally, we apply our approach to a real-world classification problem of skin cancer analysis, using out-of-distribution datasets and specialized bias annotations. Our protocol reveals three findings: 1) Models learn and propagate correlation shifts even with low-bias training; this poses a risk of accumulating and combining unaccountable weak biases; 2) Models learn robust features in both high- and low-bias scenarios but use spurious ones if test samples have them; this suggests that spurious correlations do not impair the learning of robust features; 3) Diversity shift can reduce the reliance on spurious correlations; this is counterintuitive, since we would expect biased models to depend more on biases when invariant features are missing. Our work has implications for distribution shift research and practice, providing new insights into how models learn and rely on spurious correlations under different types of shifts.
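To make the two shift types concrete, here is a minimal toy sketch in Python. It is not the authors' protocol or data: the function names (`make_split`, `agreement`), the shape tokens, and the bias values are all hypothetical, chosen only for illustration. Each sample has a label, an invariant feature that determines the label, and a spurious bit that agrees with the label with a controllable probability.

```python
import random

def make_split(n, bias, shapes, seed=0):
    """Build toy samples as (label, invariant, spurious) tuples.

    bias   : probability that the spurious bit equals the label
             (hypothetical knob controlling the spurious correlation).
    shapes : dict mapping each label to the invariant tokens it may take.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        label = rng.randint(0, 1)                  # binary label, 0 or 1
        invariant = rng.choice(shapes[label])      # feature that defines the class
        spurious = label if rng.random() < bias else 1 - label
        samples.append((label, invariant, spurious))
    return samples

def agreement(samples):
    """Fraction of samples whose spurious bit matches the label."""
    return sum(1 for label, _, s in samples if s == label) / len(samples)

# Invariant patterns seen in training vs. patterns never seen in training.
SEEN = {0: ["circle"], 1: ["square"]}
UNSEEN = {0: ["ellipse"], 1: ["rectangle"]}

train = make_split(2000, bias=0.9, shapes=SEEN, seed=1)

# Correlation shift: same invariant patterns, reversed spurious correlation.
corr_test = make_split(2000, bias=0.1, shapes=SEEN, seed=2)

# Diversity shift: same spurious correlation, unseen invariant patterns.
div_test = make_split(2000, bias=0.9, shapes=UNSEEN, seed=3)
```

Varying `bias` and `shapes` independently is one simple way to let both shifts co-exist in a controllable manner; a real study would replace these toy tokens with image attributes and bias annotations.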

research
04/23/2020

Debiasing Skin Lesion Datasets and Models? Not So Fast

Data-driven models are now deployed in a plethora of real-world applicat...
research
11/05/2022

Adversarial Causal Augmentation for Graph Covariate Shift

Out-of-distribution (OOD) generalization on graphs is drawing widespread...
research
05/04/2023

On the nonlinear correlation of ML performance between data subpopulations

Understanding the performance of machine learning (ML) models across div...
research
07/09/2021

Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization

For machine learning systems to be reliable, we must understand their pe...
research
08/30/2023

Learning Diverse Features in Vision Transformers for Improved Generalization

Deep learning models often rely only on a small set of features even whe...
research
10/13/2022

Disentanglement of Correlated Factors via Hausdorff Factorized Support

A grand goal in deep learning research is to learn representations capab...
