A Bayesian approach to uncertainty in word embedding bias estimation

06/15/2023
by Alicja Dobrzeniecka, et al.

Multiple measures, such as WEAT or MAC, attempt to quantify the magnitude of bias present in word embeddings with a single-number metric. However, such metrics and the associated statistical significance calculations treat pre-averaged data as individual data points and rely on bootstrapping with small sample sizes. We show that similar results can easily be obtained with these methods even when the data are generated by a null model lacking the intended bias. Consequently, we argue that this approach generates false confidence. To address this issue, we propose hierarchical Bayesian modeling as an alternative, which enables a more uncertainty-sensitive inspection of bias in word embeddings at different levels of granularity. To showcase our method, we apply it to the Religion, Gender, and Race word lists from the original research, together with our own control lists of neutral words. We deploy the method on the Google, GloVe, and Reddit embeddings. Further, we use our approach to evaluate a debiasing technique applied to the Reddit word embedding. Our findings reveal a more complex landscape than suggested by the proponents of single-number metrics. The datasets and source code for the paper are publicly available.
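The null-model concern can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes a deliberately simple null model in which every word vector is drawn independently from a standard Gaussian, so no bias is built in, and it computes a WEAT-style effect size on per-word pre-averaged cosine differences. The function names, the list size of 8 words, the dimension of 300, and the use of the sample standard deviation (`ddof=1`) are all illustrative assumptions.

```python
import numpy as np


def cosine_matrix(U, V):
    """Pairwise cosine similarities between the rows of U and the rows of V."""
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    return U @ V.T


def weat_effect_size(X, Y, A, B):
    """WEAT-style effect size for target sets X, Y and attribute sets A, B.

    Each target word is first reduced to one number,
    s(w) = mean cos(w, A) - mean cos(w, B); this is the pre-averaging step.
    """
    s_X = cosine_matrix(X, A).mean(axis=1) - cosine_matrix(X, B).mean(axis=1)
    s_Y = cosine_matrix(Y, A).mean(axis=1) - cosine_matrix(Y, B).mean(axis=1)
    pooled = np.concatenate([s_X, s_Y])
    # ddof=1 is an assumption; implementations differ on this choice.
    return (s_X.mean() - s_Y.mean()) / pooled.std(ddof=1)


def null_effect_sizes(n_sims=2000, n_words=8, dim=300, seed=0):
    """Effect sizes when all vectors are unstructured Gaussian noise (assumed null)."""
    rng = np.random.default_rng(seed)
    out = np.empty(n_sims)
    for i in range(n_sims):
        X, Y, A, B = (rng.standard_normal((n_words, dim)) for _ in range(4))
        out[i] = weat_effect_size(X, Y, A, B)
    return out


if __name__ == "__main__":
    d = null_effect_sizes()
    # Share of purely random draws whose effect size looks "medium" or larger.
    print("share with |d| > 0.5:", np.mean(np.abs(d) > 0.5))
    print("central 95% interval:", np.percentile(d, [2.5, 97.5]))
```

Because each word list contributes only a handful of pre-averaged values, the effect size under this assumed null model spreads widely around zero, which is the kind of behavior the abstract describes. A real analysis would replace the random vectors with actual embeddings and word lists.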


