Linear Adversarial Concept Erasure

01/28/2022
by   Shauli Ravfogel, et al.

Modern neural models trained on textual data rely on pre-trained representations that emerge without direct supervision. As these representations are increasingly used in real-world applications, the inability to control their content becomes a growing problem. We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept, so that linear predictors cannot recover the concept. We model this problem as a constrained, linear minimax game, and show that existing solutions are generally not optimal for this task. We derive a closed-form solution for certain objectives, and propose a convex relaxation, R-LACE, that works well for others. When evaluated in the context of binary gender removal, the method recovers a low-dimensional subspace whose removal mitigates bias according to both intrinsic and extrinsic evaluation. We show that the method, despite being linear, is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
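To make "erasing a linear subspace" concrete, here is a minimal sketch in plain Python. It is not R-LACE and not necessarily the closed-form solution mentioned in the abstract. It assumes a single concept direction, estimated as the cross-covariance between the features and a binary label, and removes it with the orthogonal projection P = I - w wᵀ / (wᵀ w). The helper names (`cross_covariance`, `erase_direction`) and the toy data are hypothetical, chosen only for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def center(rows):
    """Subtract the per-feature mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def cross_covariance(X, y):
    """Return cov(x, y) as a vector with one entry per feature."""
    Xc = center(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    n = len(X)
    dim = len(X[0])
    return [sum(Xc[i][j] * yc[i] for i in range(n)) / n for j in range(dim)]


def erase_direction(X, w):
    """Apply the rank-1 orthogonal projection P = I - w w^T / (w^T w) to each row."""
    norm_sq = dot(w, w)
    return [
        [x - (dot(row, w) / norm_sq) * wi for x, wi in zip(row, w)]
        for row in X
    ]


# Toy data (assumed for illustration): 4 examples, 3 features, binary concept label.
X = [
    [2.0, 1.0, 0.5],
    [1.5, -1.0, 0.0],
    [-2.0, 0.5, 1.0],
    [-1.5, -0.5, -1.5],
]
y = [1, 1, 0, 0]

w = cross_covariance(X, y)            # estimated concept direction
X_erased = erase_direction(X, w)      # representations with that direction removed
residual = cross_covariance(X_erased, y)  # numerically zero in every coordinate
```

After the projection, the erased features have zero cross-covariance with the label, so a least-squares linear regressor fitted on them can do no better than predicting the label mean. The minimax formulation in the abstract goes further: it searches over projections against an adversarial linear predictor, rather than fixing the direction in advance as this sketch does.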

research
01/28/2022

Adversarial Concept Erasure in Kernel Space

The representation space of neural models for textual data emerges in an...
research
12/15/2021

Simple Text Detoxification by Identifying a Linear Toxic Subspace in Language Model Embeddings

Large pre-trained language models are often trained on large volumes of ...
research
10/18/2022

Linear Guardedness and its Implications

Previous work on concept identification in neural representations has fo...
research
09/20/2020

Exploring the Linear Subspace Hypothesis in Gender Bias Mitigation

Bolukbasi et al. (2016) presents one of the first gender bias mitigation...
research
06/06/2023

LEACE: Perfect linear concept erasure in closed form

Concept erasure aims to remove specified features from a representation....
research
08/26/2019

Deep Closed-Form Subspace Clustering

We propose Deep Closed-Form Subspace Clustering (DCFSC), a new embarrass...
research
02/17/2018

Exact and Consistent Interpretation for Piecewise Linear Neural Networks: A Closed Form Solution

Strong intelligent machines powered by deep neural networks are increasi...
