Overlearning Reveals Sensitive Attributes

05/28/2019
by Congzheng Song, et al.

"Overlearning" means that a model trained for a seemingly simple objective implicitly learns to recognize attributes that are (1) statistically uncorrelated with the objective and (2) sensitive from a privacy or bias perspective. For example, a binary gender classifier of facial images also learns to recognize races (including races not represented in the training data) and identities. We demonstrate overlearning in several image-analysis and NLP models and analyze its harmful consequences. First, the inference-time internal representations of an overlearned model reveal sensitive attributes of the input, breaking privacy protections such as model partitioning. Second, an overlearned model can be "re-purposed" for a different, uncorrelated task. Overlearning may be inherent to some tasks. We show that techniques for censoring unwanted properties from representations either fail or degrade the model's performance on both the original and the unintended tasks. This is a challenge for regulations that aim to prevent models from learning or using certain attributes.
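To make the first consequence concrete, below is a minimal sketch of an attribute-inference probe in the spirit of the setup described above. Everything in it is hypothetical scaffolding rather than the authors' code: GenderNet, its architecture, the 256-dimensional internal representation, and the five race classes are all illustrative assumptions. The idea is to freeze a model trained only for gender classification, observe its internal representation, and fit a small probe that tries to predict an attribute uncorrelated with the training objective; high probe accuracy on held-out data would indicate overlearning.

```python
# Minimal sketch of probing a frozen model's internal representation for a
# sensitive attribute. GenderNet, the layer sizes, and the race labels are
# hypothetical stand-ins, not the paper's actual models or data.
import torch
import torch.nn as nn

class GenderNet(nn.Module):
    """Hypothetical target model: trained only to predict binary gender."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())  # -> 256-dim representation
        self.head = nn.Linear(16 * 4 * 4, 2)        # binary gender logits

    def forward(self, x):
        z = self.features(x)   # internal representation an observer sees
        return self.head(z), z

model = GenderNet()
model.eval()
for p in model.parameters():
    p.requires_grad_(False)    # the target model stays frozen

# Small probe trained to recover a race label (5 hypothetical classes)
# from the frozen representation.
probe = nn.Linear(16 * 4 * 4, 5)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy auxiliary data the observer has labeled with the sensitive attribute.
images = torch.randn(64, 3, 32, 32)
race = torch.randint(0, 5, (64,))

for _ in range(100):
    _, z = model(images)
    loss = loss_fn(probe(z), race)  # high held-out accuracy = overlearning
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The same skeleton, with the probe replaced by an adversarially trained head whose gradient is reversed into the feature extractor, is one way to attempt the censoring the abstract refers to; the paper's finding is that such censoring either fails to remove the attribute or hurts performance.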

Related research:

- Image Classifiers Leak Sensitive Attributes About Their Classes (03/16/2023): Neural network-based image classifiers are powerful tools for computer v...
- Inferring Sensitive Attributes from Model Explanations (08/21/2022): Model explanations provide transparency into a trained machine learning ...
- Evaluating Debiasing Techniques for Intersectional Biases (09/21/2021): Bias is pervasive in NLP models, motivating the development of automatic...
- Towards Robust and Privacy-preserving Text Representations (05/16/2018): Written text often provides sufficient clues to identify the author, the...
- Fair NLP Models with Differentially Private Text Encoders (05/12/2022): Encoded text representations often capture sensitive attributes about in...
- Adversarial Representation Learning With Closed-Form Solvers (09/12/2021): Adversarial representation learning aims to learn data representations f...
- Detect and Perturb: Neutral Rewriting of Biased and Sensitive Text via Gradient-based Decoding (09/24/2021): Written language carries explicit and implicit biases that can distract ...
