Mitigating Biases in Toxic Language Detection through Invariant Rationalization

06/14/2021
by Yung-Sung Chuang, et al.

Automatic detection of toxic language plays an essential role in protecting social media users, especially minority groups, from verbal abuse. However, biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection. These biases make the learned models unfair and can even exacerbate the marginalization of people. Considering that current debiasing methods for general natural language understanding tasks cannot effectively mitigate the biases in toxicity detectors, we propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out spurious correlations between certain syntactic patterns (e.g., identity mentions, dialect) and toxicity labels. We empirically show that our method yields lower false positive rates for both lexical and dialectal attributes than previous debiasing methods.
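To make the InvRat setup concrete, below is a minimal, hypothetical PyTorch sketch of the three-player game that the framework is based on: a rationale generator that masks tokens, an environment-agnostic predictor that sees only the rationale, and an environment-aware predictor that additionally sees the bias attribute (e.g., dialect). The module sizes, the Gumbel-Softmax rationale sampler, and the names RationaleGenerator, Predictor, and invrat_losses are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of an InvRat-style three-player objective (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMB, HID, N_ENV, LAMBDA = 10000, 128, 128, 2, 1.0


class RationaleGenerator(nn.Module):
    """Scores each token and samples a (near-)binary mask as the rationale."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.scorer = nn.Linear(EMB, 2)  # per-token keep / drop logits

    def forward(self, tokens):
        logits = self.scorer(self.emb(tokens))                        # (B, T, 2)
        # Gumbel-Softmax gives a differentiable approximation of discrete masking.
        mask = F.gumbel_softmax(logits, tau=0.5, hard=True)[..., 0]   # (B, T)
        return mask


class Predictor(nn.Module):
    """Classifies toxicity from the rationale; optionally conditioned on the
    'environment' (e.g., a dialect or identity-mention attribute)."""
    def __init__(self, env_aware=False):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.env_emb = nn.Embedding(N_ENV, EMB) if env_aware else None
        self.clf = nn.Sequential(nn.Linear(EMB, HID), nn.ReLU(), nn.Linear(HID, 2))

    def forward(self, tokens, mask, env=None):
        x = self.emb(tokens) * mask.unsqueeze(-1)   # zero out non-rationale tokens
        pooled = x.sum(1) / mask.sum(1, keepdim=True).clamp(min=1.0)
        if self.env_emb is not None:
            pooled = pooled + self.env_emb(env)
        return self.clf(pooled)


def invrat_losses(gen, pred_inv, pred_env, tokens, labels, envs):
    """Environment-agnostic loss L_i, environment-aware loss L_e, and a generator
    loss that penalizes rationales for which knowing the environment still helps."""
    mask = gen(tokens)
    L_i = F.cross_entropy(pred_inv(tokens, mask), labels)
    L_e = F.cross_entropy(pred_env(tokens, mask, envs), labels)
    L_gen = L_i + LAMBDA * F.relu(L_i - L_e.detach())
    return L_gen, L_i, L_e
```

In training, the environment-aware predictor is updated to minimize L_e, while the generator and the environment-agnostic predictor minimize L_gen; the penalty term pushes the generator toward rationales for which the bias attribute carries no extra predictive power, which is what rules out the spurious correlations described above.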

