Considerations for the Interpretation of Bias Measures of Word Embeddings

06/19/2019
by   Inom Mirzaev, et al.

Word embedding spaces are powerful tools for capturing latent semantic relationships between terms in a corpus, and they have become a standard component of state-of-the-art natural language processing systems. However, studies have shown that societal biases present in text corpora may be incorporated into the word embedding spaces learned from them. This raises an ethical concern: human-like biases contained in the corpora and in their derived embedding spaces may be propagated, or even amplified, when biased embedding spaces are used in downstream applications. Several bias metrics have been proposed in an attempt to quantify these biases so that they may be better understood and studied. We explore the statistical properties of these proposed measures in the context of their cited applications and intended uses, and we find caveats to the straightforward interpretation of the metrics as proposed. In particular, the bias metric proposed by Bolukbasi et al. (2016) is highly sensitive to the hyper-parameters used to learn the embedding: in many cases, the variance in the metric due to the selection of some hyper-parameters exceeds the variance due to corpus selection, and in a smaller number of cases the bias rankings of corpora change with the hyper-parameter settings. In light of these observations, bias estimates should perhaps not be interpreted as direct measurements of properties of the underlying corpus, but rather as properties of the specific embedding spaces in question, particularly with respect to the hyper-parameter selections used to generate them. Hence, bias metrics of spaces generated with differing hyper-parameters should be compared only with explicit consideration of the embedding-learning algorithm's particular configuration.
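As a concrete illustration of the kind of metric under discussion, the sketch below computes a direct-bias-style score in the spirit of Bolukbasi et al. (2016): the average of |cos(w, g)|^c over a set of nominally gender-neutral words w, where g is a bias direction estimated from definitional word pairs. The word lists, the mean-of-differences estimate of g (the original paper derives the direction via PCA over several definitional pairs), and the toy random embeddings are illustrative assumptions, not the experimental setup of this paper.

```python
# Minimal sketch of a direct-bias-style measure, assuming a dict of word vectors.
# All word lists and helper names here are illustrative placeholders.

import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def bias_direction(embeddings: dict[str, np.ndarray],
                   pairs=(("he", "she"), ("man", "woman"))) -> np.ndarray:
    """Estimate a bias direction as the mean of normalized difference vectors
    of definitional pairs (a simpler stand-in for the PCA-based construction)."""
    diffs = []
    for a, b in pairs:
        d = embeddings[a] - embeddings[b]
        diffs.append(d / np.linalg.norm(d))
    g = np.mean(diffs, axis=0)
    return g / np.linalg.norm(g)


def direct_bias(embeddings: dict[str, np.ndarray],
                neutral_words: list[str],
                c: float = 1.0) -> float:
    """Average |cos(w, g)|^c over the set of (presumed) neutral words."""
    g = bias_direction(embeddings)
    scores = [abs(cosine(embeddings[w], g)) ** c
              for w in neutral_words if w in embeddings]
    return float(np.mean(scores))


if __name__ == "__main__":
    # Toy random embeddings so the script runs end to end; in practice these
    # vectors would come from word2vec/GloVe models trained with particular
    # hyper-parameter settings (dimension, window size, negative samples, etc.).
    rng = np.random.default_rng(0)
    vocab = ["he", "she", "man", "woman", "doctor", "nurse", "engineer", "teacher"]
    emb = {w: rng.normal(size=100) for w in vocab}
    print(direct_bias(emb, ["doctor", "nurse", "engineer", "teacher"]))
```

Re-running a computation like this across embeddings trained on the same corpus but with different hyper-parameter settings is one way to probe the sensitivity the abstract describes: if the scores vary as much across settings as across corpora, the metric reflects the embedding configuration as much as the corpus itself.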

Related Research

Evaluating Word Embedding Hyper-Parameters for Similarity and Analogy Tasks (04/11/2018): The versatility of word embeddings for various applications is attractin...

Understanding the Origins of Bias in Word Embeddings (10/08/2018): The power of machine learning systems not only promises great technical ...

Context Matters: Recovering Human Semantic Structure from Machine Learning Analysis of Large-Scale Text Corpora (10/15/2019): Understanding how human semantic knowledge is organized and how people u...

On the Dimensionality of Word Embedding (12/11/2018): In this paper, we provide a theoretical understanding of word embedding ...

Intrinsic Bias Metrics Do Not Correlate with Application Bias (12/31/2020): Natural Language Processing (NLP) systems learn harmful societal biases ...

DebIE: A Platform for Implicit and Explicit Debiasing of Word Embedding Spaces (03/11/2021): Recent research efforts in NLP have demonstrated that distributional wor...

Tracing Antisemitic Language Through Diachronic Embedding Projections: France 1789-1914 (06/04/2019): We investigate some aspects of the history of antisemitism in France, on...
