Robustness to Spurious Correlations via Human Annotations

07/13/2020
by Megha Srivastava, et al.

The reliability of machine learning systems critically depends on the assumption that associations between features and labels remain similar between training and test distributions. However, unmeasured variables, such as confounders, break this assumption: correlations between features and labels that are useful at training time can become useless or even harmful at test time. For example, obesity is generally predictive of heart disease, but this relation may not hold for smokers, who generally have lower rates of obesity and higher rates of heart disease. We present a framework for making models robust to spurious correlations by leveraging humans' commonsense knowledge of causality. Specifically, we use human annotation to augment each training example with a potential unmeasured variable (e.g., an underweight patient with heart disease may be a smoker), reducing the problem to a covariate shift problem. We then introduce a new distributionally robust optimization objective over unmeasured variables (UV-DRO) to control the worst-case loss over possible test-time shifts. Empirically, we show improvements of 5-10% on a digit recognition task confounded by rotation, and 1.5-5% on the task of analyzing NYPD Police Stops confounded by location.
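To make the idea of controlling worst-case loss concrete, the following is a minimal sketch and not the paper's UV-DRO objective. It assumes each training example already carries one annotated value of an unmeasured variable, and it compares the ordinary average loss with the largest per-group average loss. The function names, loss values, and group labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def average_loss(losses):
    """Ordinary empirical risk: the mean loss over all examples."""
    return sum(losses) / len(losses)


def worst_group_loss(losses, groups):
    """Largest mean loss over groups defined by an annotated variable.

    This is a simplified group-wise stand-in for a distributionally
    robust objective, not the UV-DRO objective from the paper.
    """
    by_group = defaultdict(list)
    for loss, group in zip(losses, groups):
        by_group[group].append(loss)
    return max(sum(vals) / len(vals) for vals in by_group.values())


# Hypothetical per-example losses and annotated unmeasured variable.
losses = [0.1, 0.2, 0.1, 0.9, 0.8]
groups = ["non-smoker", "non-smoker", "non-smoker", "smoker", "smoker"]

avg = average_loss(losses)
worst = worst_group_loss(losses, groups)
print(f"average loss: {avg:.2f}, worst-group loss: {worst:.2f}")
```

In this toy setting the average loss looks moderate because the majority group is fit well, while the worst-group value exposes the poorly served minority group. A test distribution with more examples from that group would suffer accordingly, which is the kind of shift a robust objective is meant to guard against.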

research
07/06/2020

Adaptive Risk Minimization: A Meta-Learning Approach for Tackling Group Shift

A fundamental assumption of most machine learning algorithms is that the...
research
12/02/2022

AGRO: Adversarial Discovery of Error-prone groups for Robust Optimization

Models trained via empirical risk minimization (ERM) are known to rely o...
research
06/19/2023

Confidence-Based Model Selection: When to Take Shortcuts for Subpopulation Shifts

Effective machine learning models learn both robust features that direct...
research
06/14/2021

Examining and Combating Spurious Features under Distribution Shift

A central goal of machine learning is to learn robust representations th...
research
12/12/2011

Robust Learning via Cause-Effect Models

We consider the problem of function estimation in the case where the dat...
research
05/24/2023

Promoting Generalization in Cross-Dataset Remote Photoplethysmography

Remote Photoplethysmography (rPPG), or the remote monitoring of a subjec...
research
06/29/2021

Predictive Modeling in the Presence of Nuisance-Induced Spurious Correlations

Deep predictive models often make use of spurious correlations between t...
