Gender Bias in Word Embeddings: A Comprehensive Analysis of Frequency, Syntax, and Semantics

06/07/2022
by   Aylin Caliskan, et al.
0

The statistical regularities in language corpora encode well-known social biases into word embeddings. Here, we focus on gender to provide a comprehensive analysis of group-based biases in widely-used static English word embeddings trained on internet corpora (GloVe 2014, fastText 2017). Using the Single-Category Word Embedding Association Test, we demonstrate the widespread prevalence of gender biases that also show differences in: (1) frequencies of words associated with men versus women; (b) part-of-speech tags in gender-associated words; (c) semantic categories in gender-associated words; and (d) valence, arousal, and dominance in gender-associated words. First, in terms of word frequency: we find that, of the 1,000 most frequent words in the vocabulary, 77 direct evidence of a masculine default in the everyday language of the English-speaking world. Second, turning to parts-of-speech: the top male-associated words are typically verbs (e.g., fight, overpower) while the top female-associated words are typically adjectives and adverbs (e.g., giving, emotionally). Gender biases in embeddings also permeate parts-of-speech. Third, for semantic categories: bottom-up, cluster analyses of the top 1,000 words associated with each gender. The top male-associated concepts include roles and domains of big tech, engineering, religion, sports, and violence; in contrast, the top female-associated concepts are less focused on roles, including, instead, female-specific slurs and sexual content, as well as appearance and kitchen terms. Fourth, using human ratings of word valence, arousal, and dominance from a  20,000 word lexicon, we find that male-associated words are higher on arousal and dominance, while female-associated words are higher on valence.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/21/2016

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

The blind application of machine learning runs the risk of amplifying bi...
research
11/15/2022

The Dependence on Frequency of Word Embedding Similarity Measures

Recent research has shown that static word embeddings can encode word fr...
research
01/20/2022

Regional Negative Bias in Word Embeddings Predicts Racial Animus–but only via Name Frequency

The word embedding association test (WEAT) is an important method for me...
research
03/14/2022

VAST: The Valence-Assessing Semantics Test for Contextualizing Language Models

VAST, the Valence-Assessing Semantics Test, is a novel intrinsic evaluat...
research
11/24/2020

Gender bias in magazines oriented to men and women: a computational approach

Cultural products are a source to acquire individual values and behaviou...
research
08/06/2020

Discovering and Categorising Language Biases in Reddit

We present a data-driven approach using word embeddings to discover and ...
research
09/02/2020

Gender Stereotype Reinforcement: Measuring the Gender Bias Conveyed by Ranking Algorithms

Search Engines (SE) have been shown to perpetuate well-known gender ster...

Please sign up or login with your details

Forgot password? Click here to reset