Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP

02/28/2021
by Timo Schick, et al.

When trained on large, unfiltered crawls from the internet, language models pick up and reproduce all kinds of undesirable biases found in the data: they often generate racist, sexist, violent or otherwise toxic language. As large models often require millions of training examples to achieve good performance, it is difficult to completely prevent them from being exposed to such content. In this paper, we investigate whether pretrained language models at least know when they exhibit some undesirable bias or produce toxic content. Based on our findings, we propose a decoding algorithm that reduces the probability of a model producing problematic text, given only a textual description of the undesired behavior. This algorithm does not rely on manually curated word lists, nor does it require any training data or changes to the model's parameters. While our approach by no means eliminates the issue of language models generating biased text, we believe it to be an important step in this direction.
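
The abstract describes a decoding-time scheme: compare the model's next-token distribution for the plain prompt with its distribution when the prompt is prefixed by a textual description of the undesired behavior, and suppress tokens that become more likely under that prefix. Below is a minimal sketch of one such decoding step, assuming a HuggingFace-style causal LM; the model name, prompt template, and decay constant LAMBDA are illustrative assumptions, not values taken from the paper.

    # Hedged sketch of one self-debiasing-style decoding step.
    # MODEL_NAME, the prompt template, and LAMBDA are assumptions for illustration.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_NAME = "gpt2"   # assumption: any causal LM with an LM head
    LAMBDA = 50.0         # assumption: how sharply flagged tokens are suppressed

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

    def next_token_probs(text: str) -> torch.Tensor:
        """Next-token probability distribution for the given text."""
        input_ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(input_ids).logits[0, -1]
        return torch.softmax(logits, dim=-1)

    def debiased_next_token(prompt: str, behavior: str) -> str:
        """Greedy next token, discouraging continuations that become more
        likely once the undesired behavior is spelled out in text."""
        p_plain = next_token_probs(prompt)
        # Prefix the prompt with a description of the undesired behavior
        # (hypothetical template, not the paper's exact wording).
        p_flagged = next_token_probs(
            f"The following text contains {behavior}:\n{prompt}"
        )
        # Tokens that the flagged prompt makes *more* probable are scaled
        # down exponentially; all other tokens are left untouched.
        delta = p_plain - p_flagged
        scale = torch.where(delta >= 0, torch.ones_like(delta),
                            torch.exp(LAMBDA * delta))
        p_debiased = p_plain * scale
        p_debiased = p_debiased / p_debiased.sum()
        return tokenizer.decode(int(torch.argmax(p_debiased)))

The property that matches the abstract is that the only bias-specific input is the natural-language description passed as behavior: no word lists are consulted, nothing is trained, and no model parameters are changed.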

Related research

03/06/2022 · Leashing the Inner Demons: Self-Detoxification for Language Models
Language models (LMs) can reproduce (or amplify) toxic language seen dur...

09/13/2023 · In-Contextual Bias Suppression for Large Language Models
Despite their impressive performance in a wide range of NLP tasks, Large...

03/21/2022 · Word Order Does Matter (And Shuffled Language Models Know It)
Recent studies have shown that language models pretrained and/or fine-tu...

02/15/2023 · The Capacity for Moral Self-Correction in Large Language Models
We test the hypothesis that language models trained with reinforcement l...

04/30/2022 · Detoxifying Language Models with a Toxic Corpus
Existing studies have investigated the tendency of autoregressive langua...

05/25/2023 · Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation
Large language models (large LMs) are susceptible to producing text with...

09/15/2022 · Measuring Geographic Performance Disparities of Offensive Language Classifiers
Text classifiers are applied at scale in the form of one-size-fits-all s...
