Removing Spurious Features can Hurt Accuracy and Affect Groups Disproportionately

12/07/2020
by Fereshte Khani, et al.

The presence of spurious features interferes with the goal of obtaining robust models that perform well across many groups within the population. A natural remedy is to remove spurious features from the model. However, in this work we show that removal of spurious features can decrease accuracy due to the inductive biases of overparameterized models. We completely characterize how the removal of spurious features affects accuracy across different groups (more generally, test distributions) in noiseless overparameterized linear regression. In addition, we show that removal of spurious features can decrease accuracy even in balanced datasets, in which each target co-occurs equally with each spurious feature; and it can inadvertently make the model more susceptible to other spurious features. Finally, we show that robust self-training can remove spurious features without affecting the overall accuracy. Experiments on the Toxic-Comment-Detection and CelebA datasets show that our results hold in non-linear models.
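The noiseless overparameterized setting the abstract refers to can be played with in a few lines of numpy. The sketch below is an illustrative assumption, not the paper's construction: it fits minimum-norm interpolators (via the pseudoinverse, which is what gradient descent converges to here) with and without an appended spurious column, and evaluates them on two hypothetical groups, one where the spurious correlation persists at test time and one where it breaks.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 5, 20               # n < d: overparameterized regression
w_core = rng.normal(size=d)

# Training data: noiseless labels depend only on the core features.
X = rng.normal(size=(n, d))
y = X @ w_core

# Append a spurious column correlated with the label at train time.
s_train = y + 0.1 * rng.normal(size=n)
X_spu = np.hstack([X, s_train[:, None]])

# Minimum-norm interpolators, with and without the spurious column.
w_with = np.linalg.pinv(X_spu) @ y
w_without = np.linalg.pinv(X) @ y

# Two test groups: in group A the spurious feature keeps its training
# correlation; in group B it is independent of the label.
X_test = rng.normal(size=(200, d))
y_test = X_test @ w_core
s_A = y_test + 0.1 * rng.normal(size=200)   # correlation persists
s_B = rng.normal(size=200)                  # correlation breaks

mse = lambda yhat: float(np.mean((yhat - y_test) ** 2))
err_A = mse(np.hstack([X_test, s_A[:, None]]) @ w_with)
err_B = mse(np.hstack([X_test, s_B[:, None]]) @ w_with)
err_removed = mse(X_test @ w_without)       # same model for both groups

print(f"with spurious, group A:   {err_A:.3f}")
print(f"with spurious, group B:   {err_B:.3f}")
print(f"spurious removed (all):   {err_removed:.3f}")
```

The disparity is visible immediately: the model that keeps the spurious column does well on the group where the correlation holds and poorly on the group where it does not, while the model with the column removed treats both groups identically but pays the price of the minimum-norm inductive bias in a severely underdetermined problem.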


Related research

05/28/2021
On the Bias Against Inductive Biases
Borrowing from the transformer models that revolutionized the field of n...

11/22/2019
Noise Induces Loss Discrepancy Across Groups for Linear Regression
We study the effect of feature noise (measurement error) on the discrepa...

06/22/2023
Identifying and Disentangling Spurious Features in Pretrained Image Representations
Neural networks employ spurious correlations in their predictions, resul...

01/22/2019
The autofeat Python Library for Automatic Feature Engineering and Selection
This paper describes the autofeat Python library, which provides a sciki...

11/08/2019
Certified Data Removal from Machine Learning Models
Good data stewardship requires removal of data at the request of the dat...

10/27/2020
Selective Classification Can Magnify Disparities Across Groups
Selective classification, in which models are allowed to abstain on unce...

03/15/2022
Gold Doesn't Always Glitter: Spectral Removal of Linear and Nonlinear Guarded Attribute Information
We describe a simple and effective method (Spectral Attribute removaL; S...
