The Undesirable Dependence on Frequency of Gender Bias Metrics Based on Word Embeddings

01/02/2023
by   Francisco Valentini, et al.

Numerous works use word embedding-based metrics to quantify societal biases and stereotypes in texts. Recent studies have found that word embeddings capture semantic similarity but can also be affected by word frequency. In this work we study the effect of frequency when measuring female vs. male gender bias with word embedding-based bias quantification methods. We find that Skip-gram with negative sampling and GloVe tend to detect male bias in high-frequency words, while GloVe tends to return female bias in low-frequency words. We show that these behaviors persist when the words in the corpus are randomly shuffled, which removes word associations while leaving individual word frequencies unchanged. This proves that the frequency-based effect observed in unshuffled corpora stems from properties of the metric rather than from word associations. The effect is spurious and problematic, since bias metrics should depend exclusively on word co-occurrences and not on individual word frequencies. Finally, we compare these results with those obtained with an alternative metric based on Pointwise Mutual Information. We find that this metric shows no clear dependence on frequency, although it is slightly skewed towards male bias across all frequencies.
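To make the two families of metrics concrete, here is a minimal Python sketch. It is not the authors' implementation, and the abstract does not spell out the exact formulas. The sketch assumes the embedding-based bias of a word is its mean cosine similarity to a set of female context words minus its mean cosine similarity to a set of male context words. It also assumes the PMI-based bias is the difference between the word's PMI with the female group and its PMI with the male group. The function names, the sign convention (positive means female) and the toy inputs are all hypothetical.

```python
import math
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def embedding_bias(word_vec, female_vecs, male_vecs):
    """Assumed embedding-based metric: mean cosine similarity to the
    female context vectors minus mean cosine similarity to the male ones.
    Positive values mean "closer to female" (a convention chosen here)."""
    female = np.mean([cosine(word_vec, f) for f in female_vecs])
    male = np.mean([cosine(word_vec, m) for m in male_vecs])
    return float(female - male)


def pmi_bias(cooc_female, cooc_male, total_female, total_male):
    """Assumed PMI-based metric: PMI(word, female) - PMI(word, male).
    The word's own marginal probability cancels in the difference, leaving
    log[P(word | female contexts) / P(word | male contexts)]."""
    p_word_given_female = cooc_female / total_female
    p_word_given_male = cooc_male / total_male
    return math.log(p_word_given_female / p_word_given_male)


# Toy inputs (made up for illustration, not taken from the paper).
word = [1.0, 0.0]
female_context = [[1.0, 0.0]]
male_context = [[0.0, 1.0]]
emb_score = embedding_bias(word, female_context, male_context)

# The word co-occurs 10 times with female and 20 times with male context
# words, and both groups have 100 co-occurrences in total.
pmi_score = pmi_bias(10, 20, 100, 100)
```

In this sketch `pmi_bias` is computed only from co-occurrence counts, so the word's individual frequency never enters the formula directly. The embedding-based score, by contrast, inherits whatever frequency information the trained vectors encode.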


