Unmasking Contextual Stereotypes: Measuring and Mitigating BERT's Gender Bias

10/27/2020
by Marion Bartl, et al.

Contextualized word embeddings have been replacing standard embeddings as the representational knowledge source of choice in NLP systems. Since a variety of biases have previously been found in standard word embeddings, it is crucial to assess biases encoded in their replacements as well. Focusing on BERT (Devlin et al., 2018), we measure gender bias by studying associations between gender-denoting target words and names of professions in English and German, comparing the findings with real-world workforce statistics. We mitigate bias by fine-tuning BERT on the GAP corpus (Webster et al., 2018), after applying Counterfactual Data Substitution (CDS) (Maudslay et al., 2019). We show that our method of measuring bias is appropriate for languages such as English, but not for languages with a rich morphology and gender-marking, such as German. Our results highlight the importance of investigating bias and mitigation techniques cross-linguistically, especially in view of the current emphasis on large-scale, multilingual language models.
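The mitigation step above applies Counterfactual Data Substitution before fine-tuning. As a rough illustration only, the idea can be sketched as swapping gendered words inside a document with some probability, so that the corpus keeps its size instead of being duplicated. The word list, the function names, the `p=0.5` default, and the whitespace-level matching below are all assumptions made for this sketch; they are not the authors' implementation, and a realistic version would need a far larger word list plus handling for ambiguous forms such as "her" and for names.

```python
import random
import re

# Hypothetical, deliberately tiny swap list (assumption for illustration).
# A real list would be much larger and must handle ambiguous forms like "her".
GENDER_PAIRS = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
}

_TOKEN = re.compile(r"\b\w+\b")


def swap_gendered_words(text):
    """Return text with every listed gendered word replaced by its counterpart."""
    def replace(match):
        word = match.group(0)
        swapped = GENDER_PAIRS.get(word.lower())
        if swapped is None:
            return word  # not a listed gendered word, keep as-is
        # Preserve capitalization of the first letter.
        return swapped.capitalize() if word[0].isupper() else swapped
    return _TOKEN.sub(replace, text)


def counterfactual_data_substitution(documents, rng, p=0.5):
    """Swap each document with probability p; the number of documents is unchanged."""
    return [swap_gendered_words(d) if rng.random() < p else d for d in documents]


if __name__ == "__main__":
    docs = ["He is a father.", "The woman said she was late."]
    print(counterfactual_data_substitution(docs, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the substitution reproducible, which matters when the substituted corpus is later used for fine-tuning and results need to be compared across runs.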

