Mitigating Racial Biases in Toxic Language Detection with an Equity-Based Ensemble Framework

09/27/2021
by Matan Halevy, et al.

Recent research has demonstrated that popular toxic language datasets contain racial biases against users who write in African American English. While previous work has focused on a single fairness criterion, we propose using additional descriptive fairness metrics to better understand the source of these biases. We demonstrate that different benchmark classifiers, as well as two in-process bias-remediation techniques, propagate racial biases even in a larger corpus. We then propose a novel ensemble framework that uses a specialized classifier fine-tuned to the African American English dialect. We show that our proposed framework substantially reduces the racial biases that the model learns from these datasets. We demonstrate that the ensemble framework improves fairness metrics across all sample datasets with minimal impact on classification performance, and we provide empirical evidence of its ability to unlearn the annotation biases towards authors who use African American English. Please note that this work may contain examples of offensive words and phrases.
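The abstract does not say which descriptive fairness metrics are used or how the ensemble combines its classifiers. The Python below is only a minimal sketch under stated assumptions, illustrating the two ideas the abstract names. The first is comparing a per-dialect false positive rate: how often non-toxic text from each group is flagged as toxic. This is one common descriptive fairness metric, assumed here rather than taken from the paper. The second is routing likely African American English text to a specialized classifier. The names `DialectRoutedEnsemble`, `aae_score`, and `fpr_by_group`, along with the 0.5 routing threshold, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Hypothetical type: a classifier maps a text to a toxicity probability in [0, 1].
Classifier = Callable[[str], float]


@dataclass
class DialectRoutedEnsemble:
    """Illustrative ensemble that routes likely-AAE texts to a specialist.

    Assumption: a hard routing rule with a fixed threshold. This is NOT
    necessarily the combination rule used in the paper.
    """
    general: Classifier                 # classifier trained on the full toxicity corpus
    aae_specialist: Classifier          # classifier fine-tuned on AAE text (hypothetical)
    aae_score: Callable[[str], float]   # hypothetical dialect estimator, P(AAE | text)
    threshold: float = 0.5              # hypothetical routing threshold

    def predict_proba(self, text: str) -> float:
        # Send the text to the specialist when the dialect estimator is confident enough.
        if self.aae_score(text) >= self.threshold:
            return self.aae_specialist(text)
        return self.general(text)


def false_positive_rate(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Share of non-toxic examples (label 0) that were predicted toxic (1)."""
    negatives = [p for p, y in zip(preds, labels) if y == 0]
    return sum(negatives) / len(negatives) if negatives else 0.0


def fpr_by_group(
    preds: Sequence[int], labels: Sequence[int], groups: Sequence[str]
) -> Dict[str, float]:
    """False positive rate computed separately for each dialect group."""
    result: Dict[str, float] = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = false_positive_rate([preds[i] for i in idx], [labels[i] for i in idx])
    return result


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the paper.
    preds = [1, 0, 1, 0, 0, 1]
    labels = [0, 0, 1, 0, 0, 1]
    groups = ["aae", "aae", "aae", "sae", "sae", "sae"]
    print(fpr_by_group(preds, labels, groups))  # {'aae': 0.5, 'sae': 0.0}
```

On this toy data, non-toxic "aae" examples are flagged at a rate of 0.5 and non-toxic "sae" examples at 0.0. A gap of this kind is what a dialect-aware ensemble would aim to shrink without hurting overall accuracy.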
