Removing Spurious Features can Hurt Accuracy and Affect Groups Disproportionately

12/07/2020
by Fereshte Khani, et al.

The presence of spurious features interferes with the goal of obtaining robust models that perform well across many groups within the population. A natural remedy is to remove spurious features from the model. However, in this work we show that removing spurious features can decrease accuracy because of the inductive biases of overparameterized models. We completely characterize how the removal of spurious features affects accuracy across different groups (more generally, test distributions) in noiseless overparameterized linear regression. In addition, we show that removing spurious features can decrease accuracy even in balanced datasets, where each target co-occurs equally with each spurious feature, and that it can inadvertently make the model more susceptible to other spurious features. Finally, we show that robust self-training can remove spurious features without affecting the overall accuracy. Experiments on the Toxic-Comment-Detection and CelebA datasets show that our results hold in non-linear models.
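The claim about noiseless overparameterized linear regression can be illustrated with a small sketch. It assumes that the relevant inductive bias is the minimum-norm interpolant (the solution gradient descent reaches from a zero initialization) and that the spurious feature is a linear function of the core features. The names `theta_star`, `beta`, and `min_norm_fit`, and all numeric values, are hypothetical choices for illustration and are not taken from this page.

```python
import numpy as np

# Hypothetical ground truth: y = theta_star . z, with three core features.
theta_star = np.array([2.0, 2.0, 2.0])
# Hypothetical spurious feature: s = beta . z (an assumption of this sketch).
beta = np.array([1.0, 2.0, 2.0])

# One training point in three dimensions, so the problem is overparameterized.
Z_train = np.array([[1.0, 0.0, 0.0]])
y_train = Z_train @ theta_star  # noiseless targets


def min_norm_fit(X, y):
    """Return the minimum-norm weights that interpolate (X, y)."""
    return np.linalg.pinv(X) @ y


# Core model: the spurious feature is removed, only z is used.
theta_core = min_norm_fit(Z_train, y_train)

# Full model: the spurious feature s is appended as a fourth column.
S_train = Z_train @ beta
X_full = np.column_stack([Z_train, S_train])
w_full = min_norm_fit(X_full, y_train)
# Because s = beta . z, the full model equals an effective weight vector on z.
theta_full = w_full[:3] + w_full[3] * beta


def excess_error(theta_hat):
    """Squared error on test inputs with identity covariance (an assumption)."""
    return float(np.sum((theta_hat - theta_star) ** 2))


core_error = excess_error(theta_core)
full_error = excess_error(theta_full)
print("error without spurious feature:", core_error)
print("error with spurious feature:   ", full_error)
```

In this toy setup the model that keeps the spurious feature recovers the true weights, while the model without it does not. Other choices of `beta` reverse the outcome, so the sketch shows only that removal can hurt, not that it always does.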

05/28/2021

On the Bias Against Inductive Biases

Borrowing from the transformer models that revolutionized the field of n...
11/22/2019

Noise Induces Loss Discrepancy Across Groups for Linear Regression

We study the effect of feature noise (measurement error) on the discrepa...
01/22/2019

The autofeat Python Library for Automatic Feature Engineering and Selection

This paper describes the autofeat Python library, which provides a sciki...
06/07/2022

Certifying Data-Bias Robustness in Linear Regression

Datasets typically contain inaccuracies due to human error and societal ...
11/08/2019

Certified Data Removal from Machine Learning Models

Good data stewardship requires removal of data at the request of the dat...
10/27/2020

Selective Classification Can Magnify Disparities Across Groups

Selective classification, in which models are allowed to abstain on unce...
03/15/2022

Gold Doesn't Always Glitter: Spectral Removal of Linear and Nonlinear Guarded Attribute Information

We describe a simple and effective method (Spectral Attribute removaL; S...