Predictive Modeling in the Presence of Nuisance-Induced Spurious Correlations

by Aahlad Puli, et al.

Deep predictive models often exploit spurious correlations between the label and the covariates, and these correlations can differ between the training and test distributions. In many classification tasks, spurious correlations are induced by a changing relationship between the label and some nuisance variables correlated with the covariates. For example, in classifying animals in natural images, the background, which is the nuisance, can predict the type of animal. This nuisance-label relationship does not always hold. We formalize a family of distributions that differ only in the nuisance-label relationship and introduce a distribution in which this relationship is broken, called the nuisance-randomized distribution. We introduce a set of predictive models built from the nuisance-randomized distribution whose representations, when conditioned on, leave the label and the nuisance uncorrelated. For models in this set, we lower bound the performance on any member of the family by the mutual information between the representation and the label under the nuisance-randomized distribution. To build predictive models that maximize this performance lower bound, we develop Nuisance-Randomized Distillation (NURD). We evaluate NURD on a synthetic example, on colored-MNIST, and on classifying chest X-rays. When non-lung patches are used as the nuisance in classifying chest X-rays, NURD produces models that predict pneumonia under strong spurious correlations.
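As a rough illustration of what "breaking" the nuisance-label relationship can mean, the sketch below reweights samples so that the label and the nuisance become independent in the weighted data. It is not the authors' implementation: it assumes a discrete label y and a discrete nuisance z, estimates the weights p(y) / p(y | z) from counts, and uses hypothetical helper names (nuisance_randomizing_weights, weighted_joint) and made-up toy data.

```python
import numpy as np

def nuisance_randomizing_weights(y, z):
    """Per-sample weights w_i = p(y_i) / p(y_i | z_i), estimated from counts.

    Assumes y and z are discrete. Under these weights the empirical joint
    of (y, z) factorizes into the product of its marginals.
    """
    y = np.asarray(y)
    z = np.asarray(z)
    weights = np.empty(len(y), dtype=float)
    for z_val in np.unique(z):
        in_z = z == z_val
        for y_val in np.unique(y[in_z]):
            cell = in_z & (y == y_val)
            p_y = np.mean(y == y_val)                # marginal p(y)
            p_y_given_z = cell.sum() / in_z.sum()    # conditional p(y | z)
            weights[cell] = p_y / p_y_given_z
    return weights

def weighted_joint(y, z, w):
    """Normalized weighted contingency table over (y, z)."""
    y_vals, z_vals = np.unique(y), np.unique(z)
    table = np.array([[w[(y == yv) & (z == zv)].sum() for zv in z_vals]
                      for yv in y_vals])
    return table / table.sum()

# Toy data (an assumption for illustration): the nuisance agrees with the
# label 90% of the time, mimicking a background that predicts the animal.
rng = np.random.default_rng(0)
n = 20000
y = rng.integers(0, 2, size=n)
agree = rng.random(n) < 0.9
z = np.where(agree, y, 1 - y)

w = nuisance_randomizing_weights(y, z)
joint = weighted_joint(y, z, w)
independent = np.outer(joint.sum(axis=1), joint.sum(axis=0))

# Largest deviation of the reweighted joint from the product of its marginals.
gap = np.abs(joint - independent).max()
raw_corr = np.corrcoef(y, z)[0, 1]
```

In the raw data the label and the nuisance are strongly correlated (raw_corr), while in the reweighted data the joint table matches the product of its marginals up to floating-point error (gap). A model trained with such weights would see a dataset in which the nuisance alone carries no information about the label; with high-dimensional nuisances such as image patches, p(y | z) would itself have to be estimated by a model rather than from counts.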



Multi-label Contrastive Predictive Coding

Variational mutual information (MI) estimators are widely used in unsupe...

Complexity of randomized algorithms for underdamped Langevin dynamics

We establish an information complexity lower bound of randomized algorit...

Instance-Dependent Partial Label Learning

Partial label learning (PLL) is a typical weakly supervised learning pro...

Uninformative Input Features and Counterfactual Invariance: Two Perspectives on Spurious Correlations in Natural Language

Spurious correlations are a threat to the trustworthiness of natural lan...

Deep Learning Applied to Chest X-Rays: Exploiting and Preventing Shortcuts

While deep learning has shown promise in improving the automated diagnos...

Robustness to Spurious Correlations via Human Annotations

The reliability of machine learning systems critically assumes that the ...