The Birth of Bias: A case study on the evolution of gender bias in an English language model

07/21/2022
by Oskar van der Wal, et al.

Detecting and mitigating harmful biases in modern language models are widely recognized as crucial, open problems. In this paper, we take a step back and investigate how language models come to be biased in the first place. We use a relatively small language model with an LSTM architecture, trained on an English Wikipedia corpus. With full access to the data and to the model parameters as they change at every step of training, we can map in detail how the representation of gender develops, which patterns in the dataset drive this, and how the model's internal state relates to the bias in a downstream task (semantic textual similarity). We find that the representation of gender is dynamic and identify different phases during training. Furthermore, we show that gender information is represented increasingly locally in the input embeddings of the model and that, as a consequence, debiasing these embeddings can be effective in reducing the downstream bias. Monitoring the training dynamics allows us to detect an asymmetry in how the female and male genders are represented in the input embeddings. This is important, as it may cause naive mitigation strategies to introduce new undesirable biases. We discuss the relevance of these findings for mitigation strategies more generally, and the prospects of generalizing our methods to larger language models, the Transformer architecture, other languages, and other undesirable biases.
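To make the embedding-debiasing idea concrete, below is a minimal, hypothetical sketch of one common way to probe and remove a "gender direction" from input embeddings (hard debiasing in the spirit of Bolukbasi et al., 2016). The embedding matrix, vocabulary, and word pairs are illustrative assumptions; this is not necessarily the procedure used in the paper.

```python
# Sketch: estimate a gender direction from word pairs and project it out of
# the input embeddings. All names and data below are hypothetical.
import numpy as np

def gender_direction(emb: np.ndarray, vocab: dict, pairs) -> np.ndarray:
    """Mean of normalized differences between female/male word embeddings."""
    diffs = []
    for female, male in pairs:
        d = emb[vocab[female]] - emb[vocab[male]]
        diffs.append(d / np.linalg.norm(d))
    direction = np.mean(diffs, axis=0)
    return direction / np.linalg.norm(direction)

def debias(emb: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove each embedding's component along the gender direction."""
    return emb - np.outer(emb @ direction, direction)

# Toy usage with a random embedding matrix (vocab_size x emb_dim).
rng = np.random.default_rng(0)
vocab = {"she": 0, "he": 1, "woman": 2, "man": 3, "doctor": 4}
emb = rng.normal(size=(len(vocab), 50))

g = gender_direction(emb, vocab, [("she", "he"), ("woman", "man")])
emb_debiased = debias(emb, g)

# After projection, e.g. "doctor" has (numerically) zero gender component.
print(emb_debiased[vocab["doctor"]] @ g)  # ~0.0
```

Because the paper finds that gender information becomes increasingly localized in the input embeddings, a projection of this kind applied there can reduce downstream bias; however, the reported asymmetry between the female and male representations suggests that removing a single shared direction may be too coarse and could itself introduce new biases.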
