Regional Negative Bias in Word Embeddings Predicts Racial Animus–but only via Name Frequency

01/20/2022
by Austin van Loon, et al.

The word embedding association test (WEAT) is an important method for measuring linguistic biases against social groups such as ethnic minorities in large text corpora. It does so by comparing the semantic relatedness of words prototypical of the groups (e.g., names unique to those groups) and attribute words (e.g., 'pleasant' and 'unpleasant' words). We show that anti-Black WEAT estimates from geo-tagged social media data at the level of metropolitan statistical areas strongly correlate with several measures of racial animus, even when controlling for sociodemographic covariates. However, we also show that every one of these correlations is explained by a third variable: the frequency of Black names in the underlying corpora relative to White names. This occurs because word embeddings tend to group positive (negative) words and frequent (rare) words together in the estimated semantic space. As the frequency of Black names on social media is strongly correlated with Black Americans' prevalence in the population, this results in spurious anti-Black WEAT estimates wherever few Black Americans live. This suggests that research using the WEAT to measure bias should consider term frequency, and also demonstrates the potential consequences of using black-box models like word embeddings to study human cognition and behavior.
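For readers unfamiliar with the mechanics, the standard WEAT statistic compares two target word sets X and Y against two attribute sets A and B: each word's association s(w, A, B) is its mean cosine similarity to A minus its mean cosine similarity to B, and the effect size is the standardized difference of those associations across X and Y. Below is a minimal NumPy sketch with made-up toy vectors (not the authors' data or name lists); all variable names here are illustrative assumptions:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus to set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Standardized difference in A-vs-B association between targets X and Y."""
    s = [association(w, A, B) for w in X] + [association(w, A, B) for w in Y]
    s_X, s_Y = np.mean(s[:len(X)]), np.mean(s[len(X):])
    return (s_X - s_Y) / np.std(s, ddof=1)

# Toy 2-d embeddings: the X words sit near attribute set A and the Y words
# near attribute set B, so the effect size comes out strongly positive.
A = [np.array([1.0, 0.0])]                        # e.g. 'pleasant' attribute words
B = [np.array([0.0, 1.0])]                        # e.g. 'unpleasant' attribute words
X = [np.array([0.9, 0.1]), np.array([0.8, 0.2])]  # names for one group (toy)
Y = [np.array([0.1, 0.9]), np.array([0.2, 0.8])]  # names for the other group (toy)
print(weat_effect_size(X, Y, A, B))
```

The paper's confound enters one step earlier than this computation: if one group's names are rare in the training corpus, their learned vectors drift toward other rare (and, empirically, more negative) words, shifting s(w, A, B) before the effect size is ever taken.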


Related research

- The Dependence on Frequency of Word Embedding Similarity Measures (11/15/2022): Recent research has shown that static word embeddings can encode word fr...
- Gender Bias in Word Embeddings: A Comprehensive Analysis of Frequency, Syntax, and Semantics (06/07/2022): The statistical regularities in language corpora encode well-known socia...
- What are the biases in my word embedding? (12/20/2018): This paper presents an algorithm for enumerating biases in word embeddin...
- Unequal Representations: Analyzing Intersectional Biases in Word Embeddings Using Representational Similarity Analysis (11/24/2020): We present a new approach for detecting human-like social biases in word...
- Argument from Old Man's View: Assessing Social Bias in Argumentation (11/24/2020): Social bias in language - towards genders, ethnicities, ages, and other ...
- Frequency-based Distortions in Contextualized Word Embeddings (04/17/2021): How does word frequency in pre-training data affect the behavior of simi...
- Self-Supervised Euphemism Detection and Identification for Content Moderation (03/31/2021): Fringe groups and organizations have a long history of using euphemisms–...
