AraWEAT: Multidimensional Analysis of Biases in Arabic Word Embeddings

11/03/2020
by Anne Lauscher et al.

Recent work has shown that distributional word vector spaces often encode human biases like sexism or racism. In this work, we conduct an extensive analysis of biases in Arabic word embeddings by applying a range of recently introduced bias tests on a variety of embedding spaces induced from corpora in Arabic. We measure the presence of biases across several dimensions, namely: embedding models (Skip-Gram, CBOW, and FastText) and vector sizes, types of text (encyclopedic text, and news vs. user-generated content), dialects (Egyptian Arabic vs. Modern Standard Arabic), and time (diachronic analyses over corpora from different time periods). Our analysis yields several interesting findings, e.g., that implicit gender bias in embeddings trained on Arabic news corpora steadily increases over time (between 2007 and 2017). We make the Arabic bias specifications (AraWEAT) publicly available.
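As a rough illustration of the kind of bias test the abstract refers to, the sketch below computes a WEAT-style effect size, assuming the standard formulation of the Word Embedding Association Test (Caliskan et al., 2017) that the name AraWEAT alludes to. The function names and the two-dimensional toy vectors are hypothetical and exist only to make the example runnable; this is not the authors' implementation, and real tests would use vectors looked up from a trained Arabic embedding space.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """Mean similarity of w to attribute set A minus mean similarity to B."""
    sim_a = [cosine(w, a) for a in attrs_a]
    sim_b = [cosine(w, b) for b in attrs_b]
    return mean(sim_a) - mean(sim_b)


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """WEAT-style effect size: difference of the mean associations of the two
    target sets, divided by the standard deviation over all targets.

    Assumption: the sample standard deviation (n - 1) is used here;
    implementations differ on this detail.
    """
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical toy vectors, NOT taken from any real embedding space.
X = [[1.0, 0.0], [0.9, 0.1]]   # target set X (e.g., one group of terms)
Y = [[0.0, 1.0], [0.1, 0.9]]   # target set Y (e.g., a contrasting group)
A = [[1.0, 0.0]]               # attribute set A
B = [[0.0, 1.0]]               # attribute set B

effect = weat_effect_size(X, Y, A, B)
print(f"effect size: {effect:.3f}")
```

A positive value indicates that the X terms sit closer to the A attributes than the Y terms do; swapping X and Y flips the sign. Comparing such scores across embedding models, text types, dialects, or time slices is the kind of multidimensional comparison the abstract describes.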
