Semantics derived automatically from language corpora contain human-like biases

08/25/2016
by Aylin Caliskan, et al.

Artificial intelligence and machine learning are in a period of astounding growth. However, there are concerns that these technologies may be used, either with or without intention, to perpetuate the prejudice and unfairness that unfortunately characterizes many human institutions. Here we show for the first time that human-like semantic biases result from the application of standard machine learning to ordinary language---the same sort of language humans are exposed to every day. We replicate a spectrum of standard human biases as exposed by the Implicit Association Test and other well-known psychological studies. We replicate these using a widely used, purely statistical machine-learning model---namely, the GloVe word embedding---trained on a corpus of text from the Web. Our results indicate that language itself contains recoverable and accurate imprints of our historic biases, whether these are morally neutral as towards insects or flowers, problematic as towards race or gender, or even simply veridical, reflecting the status quo for the distribution of gender with respect to careers or first names. These regularities are captured by machine learning along with the rest of semantics. In addition to our empirical findings concerning language, we also contribute new methods for evaluating bias in text, the Word Embedding Association Test (WEAT) and the Word Embedding Factual Association Test (WEFAT). Our results have implications not only for AI and machine learning, but also for the fields of psychology, sociology, and human ethics, since they raise the possibility that mere exposure to everyday language can account for the biases we replicate here.
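As a rough illustration of what the WEAT measures, the following Python sketch computes the WEAT effect size from precomputed word vectors. It is a minimal sketch, not the paper's released code: it assumes a dictionary vec mapping tokens to NumPy arrays (for example, loaded from a pretrained GloVe file), and the function and variable names are illustrative. The paper additionally assesses significance with a permutation test over the target words, which is omitted here for brevity.

import numpy as np

def cosine(u, v):
    # Cosine similarity between two word vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B, vec):
    # s(w, A, B): how much more strongly word w associates with attribute set A than with B,
    # measured as the difference of mean cosine similarities.
    return (np.mean([cosine(vec[w], vec[a]) for a in A])
            - np.mean([cosine(vec[w], vec[b]) for b in B]))

def weat_effect_size(X, Y, A, B, vec):
    # WEAT effect size: difference in mean association between the two target sets X and Y,
    # normalized by the standard deviation over all target words (a Cohen's d analogue).
    x_assoc = [association(x, A, B, vec) for x in X]
    y_assoc = [association(y, A, B, vec) for y in Y]
    return (np.mean(x_assoc) - np.mean(y_assoc)) / np.std(x_assoc + y_assoc)

# Illustrative usage (hypothetical short word lists in the spirit of the flowers/insects test):
# flowers = ["rose", "daisy", "tulip"]; insects = ["spider", "moth", "wasp"]
# pleasant = ["love", "peace", "pleasure"]; unpleasant = ["hatred", "agony", "filth"]
# d = weat_effect_size(flowers, insects, pleasant, unpleasant, vec)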

Related research

07/21/2016
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
The blind application of machine learning runs the risk of amplifying bi...

03/25/2019
On Measuring Social Biases in Sentence Encoders
The Word Embedding Association Test shows that GloVe and word2vec word e...

05/28/2019
Algorithmic Bias and the Biases of the Bias Catchers
Concerns about gender bias have captured most of the attention in the AI...

03/01/2021
WordBias: An Interactive Visual Tool for Discovering Intersectional Biases Encoded in Word Embeddings
Intersectional bias is a bias caused by an overlap of multiple social fa...

09/10/2019
Attesting Biases and Discrimination using Language Semantics
AI agents are increasingly deployed and used to make automated decisions...

05/22/2022
Evidence for Hypodescent in Visual Semantic AI
We examine the state-of-the-art multimodal "visual semantic" model CLIP ...

03/14/2022
VAST: The Valence-Assessing Semantics Test for Contextualizing Language Models
VAST, the Valence-Assessing Semantics Test, is a novel intrinsic evaluat...
