Gender Bias in Text: Labeled Datasets and Lexicons

01/21/2022
by Jad Doughman, et al.

Language has a profound impact on our thoughts, perceptions, and conceptions of gender roles. Gender-inclusive language is therefore a key tool for promoting social inclusion and achieving gender equality, and detecting and mitigating gender bias in text is instrumental in halting its propagation and limiting its societal impact. However, there is a lack of gender bias datasets and lexicons for automating the detection of gender bias with supervised and unsupervised machine learning (ML) and natural language processing (NLP) techniques. The main contribution of this work is to publicly provide labeled datasets and exhaustive lexicons, built by collecting, annotating, and augmenting relevant sentences, to facilitate the detection of gender bias in English text. Towards this end, we present an updated version of our previously proposed taxonomy by re-formalizing its structure, adding a new bias type, and mapping each bias subtype to an appropriate detection methodology. The released datasets and lexicons span multiple bias subtypes, including Generic He, Generic She, Explicit Marking of Sex, and Gendered Neologisms. We also used word embedding models to further augment the collected lexicons. The underlying motivation of our work is to enable the technical community to combat gender bias in text and halt its propagation using ML and NLP techniques.
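The abstract states that word embedding models were used to augment the collected lexicons, but it does not give the procedure. One common way to do this is nearest-neighbour expansion: for each seed word, add the vocabulary words whose vectors lie closest to it by cosine similarity. The sketch below illustrates that generic technique only and is not presented as the authors' method. The toy three-dimensional vectors, the example words, and the `k` and `threshold` parameters are all hypothetical; a real pipeline would load pretrained embeddings instead.

```python
import math

# Hypothetical toy embeddings; a real pipeline would load pretrained
# word vectors instead of these hand-made 3-d values.
TOY_EMBEDDINGS = {
    "chairman": [0.9, 0.1, 0.0],
    "spokesman": [0.85, 0.15, 0.05],
    "businessman": [0.8, 0.2, 0.1],
    "chairperson": [0.1, 0.9, 0.0],
    "table": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def augment_lexicon(seed, embeddings, k=2, threshold=0.9):
    """Return the seed lexicon plus, for each seed word, up to k of its
    nearest neighbours whose cosine similarity is at least `threshold`.
    Both parameters are hypothetical illustration choices."""
    expanded = set(seed)
    for word in seed:
        if word not in embeddings:
            continue  # seed words without a vector cannot be expanded
        candidates = [
            (cosine(embeddings[word], vec), other)
            for other, vec in embeddings.items()
            if other != word and other not in seed
        ]
        candidates.sort(reverse=True)  # most similar first
        for score, other in candidates[:k]:
            if score >= threshold:
                expanded.add(other)
    return expanded


if __name__ == "__main__":
    print(sorted(augment_lexicon({"chairman"}, TOY_EMBEDDINGS)))
```

With these toy vectors, the seed word "chairman" pulls in "spokesman" and "businessman", while "chairperson" and "table" fall below the threshold. The similarity cut-off is the key design choice: embedding neighbours are often related but not synonymous terms, so expanded entries would typically still need manual review before being added to a lexicon.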
