Investigating Gender Bias in BERT

by Rishabh Bhardwaj, et al.

Contextual language models (CLMs) have pushed NLP benchmarks to new heights. It has become the norm to use CLM-provided word embeddings in downstream tasks such as text classification. However, unless addressed, CLMs are prone to learning the gender bias intrinsic to the dataset. As a result, the predictions of downstream NLP models can vary noticeably when gender words are varied, such as replacing "he" with "she", or even when gender-neutral words are varied. In this paper, we focus our analysis on a popular CLM, BERT. We analyse the gender bias it induces in five downstream tasks related to emotion and sentiment intensity prediction. For each task, we train a simple regressor on BERT's word embeddings. We then evaluate the gender bias in the regressors using an equity evaluation corpus. Ideally, and by design, the models should discard gender-informative features from the input. However, the results show a significant dependence of the system's predictions on gender-particular words and phrases. We claim that such biases can be reduced by removing gender-specific features from the word embeddings. Hence, for each layer in BERT, we identify directions that primarily encode gender information. The space formed by such directions is referred to as the gender subspace in the semantic space of word embeddings. We propose an algorithm that finds fine-grained gender directions, i.e., one primary direction for each BERT layer. This obviates the need to model the gender subspace in multiple dimensions and prevents other crucial information from being omitted. Experiments show that removing the embedding components along these directions is highly successful in reducing BERT-induced bias in the downstream tasks.
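The idea of removing the embedding component along a single gender direction can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes, purely for illustration, that the direction is estimated as the top singular vector of difference vectors between gendered word pairs, and it uses synthetic vectors in place of real BERT layer outputs. The function names `primary_direction` and `remove_direction` are hypothetical.

```python
import numpy as np


def primary_direction(pair_diffs):
    """Estimate one dominant direction from difference vectors.

    Assumption for this sketch: each row is (embedding of a male word)
    minus (embedding of its female counterpart). The top right-singular
    vector of this matrix is taken as the gender direction.
    """
    diffs = np.asarray(pair_diffs, dtype=float)
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[0]  # unit-norm vector


def remove_direction(embeddings, direction):
    """Subtract from each embedding its projection onto `direction`."""
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    emb = np.asarray(embeddings, dtype=float)
    return emb - np.outer(emb @ d, d)


# Synthetic stand-in for one layer's embeddings (not real BERT outputs).
rng = np.random.default_rng(0)
dim = 8
true_dir = np.zeros(dim)
true_dir[0] = 1.0  # planted "gender" axis for the demo

# Difference vectors: mostly along the planted axis, plus small noise.
diffs = rng.uniform(1.0, 2.0, size=(20, 1)) * true_dir
diffs = diffs + 0.01 * rng.normal(size=(20, dim))

direction = primary_direction(diffs)
embeddings = rng.normal(size=(5, dim))
cleaned = remove_direction(embeddings, direction)

# Residual component along the estimated direction after removal.
residual = cleaned @ direction
```

In a per-layer setting, the same two steps would be repeated with one direction for each layer's embeddings. Because only a single direction is projected out, all components orthogonal to it are left unchanged.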


