On the interpretation and significance of bias metrics in texts: a PMI-based approach

by Francisco Valentini, et al.

In recent years, word embeddings have become a popular tool for measuring the presence of biases in texts. Although these measures have been shown to be effective in detecting a wide variety of biases, metrics based on word embeddings lack transparency, explainability and interpretability. In this study, we propose a PMI-based metric to quantify biases in texts. We show that this metric can be approximated by an odds ratio, which makes it possible to estimate the confidence interval and statistical significance of textual bias. We also show that this PMI-based measure can be expressed as a function of conditional probabilities, which gives it a simple interpretation in terms of word co-occurrences. Our approach performs comparably to GloVe-based and Skip-gram-based metrics in experiments on gender-occupation and gender-name associations. We discuss the advantages and disadvantages of methods based on first-order versus second-order co-occurrences, from the point of view of the interpretability of the metric and the sparseness of the data.
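As a rough illustration of the idea summarized in the abstract, the sketch below computes a PMI-based bias score as the log ratio of two conditional probabilities and compares it with a log odds ratio and a confidence interval. Everything here is an assumption made for illustration: the abstract does not give the exact formula, so the definition used (log of P(w|A) / P(w|B)), the function names, the count arguments, and the Wald-style interval are hypothetical choices. This is not the authors' implementation.

```python
import math


def pmi_bias(c_wa, c_a, c_wb, c_b):
    """Assumed definition: PMI(w, A) - PMI(w, B) = log(P(w|A) / P(w|B)).

    c_wa: co-occurrences of target word w with context group A (hypothetical count)
    c_a:  total occurrences of group A contexts
    c_wb, c_b: the same quantities for group B
    """
    return math.log((c_wa / c_a) / (c_wb / c_b))


def log_odds_ratio_ci(c_wa, c_a, c_wb, c_b, z=1.96):
    """Log odds ratio from the 2x2 co-occurrence table, with a Wald interval.

    The standard error sqrt(1/a + 1/b + 1/c + 1/d) is the usual large-sample
    approximation; z=1.96 gives an approximate 95% interval.
    """
    a, b = c_wa, c_a - c_wa  # w present / absent in A contexts
    c, d = c_wb, c_b - c_wb  # w present / absent in B contexts
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, log_or - z * se, log_or + z * se


# Made-up counts: w co-occurs 30 times in 1000 A contexts, 10 times in 1000 B contexts.
bias = pmi_bias(30, 1000, 10, 1000)
log_or, lower, upper = log_odds_ratio_ci(30, 1000, 10, 1000)
print(f"PMI bias: {bias:.3f}")
print(f"log odds ratio: {log_or:.3f}, approx. 95% CI: ({lower:.3f}, {upper:.3f})")
```

When the word is rare in both groups, the two quantities are close, which is why an odds ratio can stand in for the ratio of conditional probabilities. An interval that excludes zero suggests the association is unlikely to be due to chance alone under this approximation.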




