Power of Explanations: Towards automatic debiasing in hate speech detection

09/07/2022
by Yi Cai, et al.

Hate speech detection is a common downstream application of natural language processing (NLP) in the real world. Despite increasing accuracy, current data-driven approaches can easily learn biases from imbalanced data distributions that originate from humans, and deploying biased models can further reinforce existing social biases. Unlike handling tabular data, however, defining and mitigating biases in text classifiers, which deal with unstructured data, is more challenging. A popular solution for improving machine learning fairness in NLP is to conduct the debiasing process with a list of potentially discriminated-against terms provided by human annotators. Besides the risk of overlooking biased terms, exhaustively identifying bias with human annotators is unsustainable, since discrimination varies across datasets and may evolve over time. To this end, we propose an automatic misuse detector (MiD) that relies on an explanation method to detect potential bias. Built upon it, we design an end-to-end debiasing framework with the proposed staged correction for text classifiers, requiring no external resources.
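As a rough illustration of the misuse-detector idea (a minimal sketch, not the paper's actual MiD implementation), one can aggregate per-token attribution scores produced by any explanation method and flag terms that the classifier repeatedly over-relies on when it wrongly predicts hate speech. The function name, record format, and thresholds below are illustrative assumptions.

```python
# Hypothetical sketch: flag tokens with consistently high attribution toward
# the "hate" class on false-positive predictions. Assumes attributions have
# already been computed by some explanation method (gradient- or
# perturbation-based); this is not the authors' MiD code.

from collections import defaultdict

def flag_potential_bias(records, min_count=2, score_threshold=0.5):
    """records: iterable of dicts with keys
         'tokens'       - list of str
         'attributions' - per-token attribution toward the 'hate' class
         'gold'         - gold label, 'hate' or 'none'
         'pred'         - predicted label
       Returns tokens whose mean attribution on false positives is high."""
    stats = defaultdict(lambda: {"sum": 0.0, "count": 0})
    for r in records:
        if r["pred"] == "hate" and r["gold"] == "none":   # false positive
            for tok, score in zip(r["tokens"], r["attributions"]):
                stats[tok]["sum"] += score
                stats[tok]["count"] += 1
    flagged = {
        tok: s["sum"] / s["count"]
        for tok, s in stats.items()
        if s["count"] >= min_count and s["sum"] / s["count"] >= score_threshold
    }
    # Highest-scoring (most suspicious) tokens first.
    return dict(sorted(flagged.items(), key=lambda kv: -kv[1]))

if __name__ == "__main__":
    toy = [
        {"tokens": ["muslim", "people", "are", "kind"],
         "attributions": [0.8, 0.1, 0.0, -0.2],
         "gold": "none", "pred": "hate"},
        {"tokens": ["muslim", "festival", "today"],
         "attributions": [0.7, 0.05, 0.0],
         "gold": "none", "pred": "hate"},
    ]
    print(flag_potential_bias(toy))   # -> {'muslim': 0.75}
```

In the paper's framework, terms flagged this way would then feed a staged correction step during training rather than a manual word list; the aggregation rule above is only one plausible way to operationalize the detection stage.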

Related research

08/08/2023
Unmasking Nationality Bias: A Study of Human Perception of Nationalities in AI-Generated Articles
We investigate the potential for nationality biases in natural language ...

01/29/2021
Challenges in Automated Debiasing for Toxic Language Detection
Biased associations have been a challenge in the development of classifi...

06/13/2023
Survey on Sociodemographic Bias in Natural Language Processing
Deep neural networks often learn unintended biases during training, whic...

10/10/2020
FIND: Human-in-the-Loop Debugging Deep Text Classifiers
Since obtaining a perfect training dataset (i.e., a dataset which is con...

10/24/2020
Efficiently Mitigating Classification Bias via Transfer Learning
Prediction bias in machine learning models refers to unintended model be...

01/10/2022
Quantifying Gender Bias in Consumer Culture
Cultural items like songs have an important impact in creating and reinf...

01/15/2020
Stereotypical Bias Removal for Hate Speech Detection Task using Knowledge-based Generalizations
With the ever-increasing cases of hate spread on social media platforms,...
