Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection

05/17/2023
by Shadi Iskander, et al.

Natural language processing models tend to learn and encode social biases present in their training data. One popular approach for addressing such biases is to remove the encoded information from the model's representations. However, current methods are restricted to removing only linearly encoded information. In this work, we propose Iterative Gradient-Based Projection (IGBP), a novel method for removing non-linearly encoded concepts from neural representations. Our method iteratively trains neural classifiers to predict the attribute we seek to eliminate, then projects the representations onto a hypersurface such that the classifiers become oblivious to the target attribute. We evaluate the effectiveness of our method on the task of removing gender and race information as sensitive attributes. Our results demonstrate that IGBP is effective in mitigating bias under both intrinsic and extrinsic evaluations, with minimal impact on downstream task accuracy.
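The iterative loop described above (train a classifier on the sensitive attribute, then project the representations so that classifier becomes uninformative) can be illustrated with a short sketch. The PyTorch code below is a minimal illustration under stated assumptions, not the authors' implementation: the probe architecture, the first-order Newton-style step onto the classifier's zero-logit hypersurface, the chance-level stopping threshold, and all hyperparameters (hidden_dim, epochs, n_iters) are illustrative choices.

import torch
import torch.nn as nn


def train_attribute_classifier(reps, labels, hidden_dim=128, epochs=50):
    # Fit a small non-linear probe that predicts the sensitive attribute
    # from the (frozen) representations.
    clf = nn.Sequential(
        nn.Linear(reps.size(1), hidden_dim),
        nn.ReLU(),
        nn.Linear(hidden_dim, 1),
    )
    opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(clf(reps).squeeze(-1), labels.float())
        loss.backward()
        opt.step()
    return clf


def project_to_decision_surface(reps, clf):
    # Push each representation along the probe's input gradient onto the
    # hypersurface where its logit is zero (a first-order Newton-style step),
    # so the probe's prediction no longer carries signal about the attribute.
    x = reps.clone().requires_grad_(True)
    logits = clf(x).squeeze(-1)
    grads = torch.autograd.grad(logits.sum(), x)[0]
    step = (logits / (grads.pow(2).sum(dim=1) + 1e-12)).unsqueeze(1)
    return (x - step * grads).detach()


def igbp(reps, labels, n_iters=10):
    # Alternate probe training and projection until the probe is near chance.
    for _ in range(n_iters):
        clf = train_attribute_classifier(reps, labels)
        with torch.no_grad():
            preds = (clf(reps).squeeze(-1) > 0).float()
            acc = (preds == labels.float()).float().mean().item()
        if acc < 0.55:  # illustrative stopping threshold, not from the paper
            break
        reps = project_to_decision_surface(reps, clf)
    return reps

Here reps would be an (N, d) tensor of encoder representations and labels the binary protected attribute; the projection rule, stopping criterion, and hyperparameters are assumptions made for this sketch and may differ from the paper's exact procedure.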


Related research

04/16/2020 - Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
The ability to control for the kinds of information encoded in neural re...

10/15/2021 - Socially Aware Bias Measurements for Hindi Language Representations
Language representations are an efficient tool used across NLP, but they...

06/30/2020 - OSCaR: Orthogonal Subspace Correction and Rectification of Biases in Word Embeddings
Language representations are known to carry stereotypical biases and, as...

09/24/2021 - Detect and Perturb: Neutral Rewriting of Biased and Sensitive Text via Gradient-based Decoding
Written language carries explicit and implicit biases that can distract ...

10/26/2022 - FairCLIP: Social Bias Elimination based on Attribute Prototype Learning and Representation Neutralization
The Vision-Language Pre-training (VLP) models like CLIP have gained popu...

02/06/2023 - Erasure of Unaligned Attributes from Neural Representations
We present the Assignment-Maximization Spectral Attribute removaL (AMSAL...

05/19/2021 - Obstructing Classification via Projection
Machine learning and data mining techniques are effective tools to class...
