Stereotypical Bias Removal for Hate Speech Detection Task using Knowledge-based Generalizations

01/15/2020
by Pinkesh Badjatiya, et al.

With the ever-increasing cases of hate spread on social media platforms, it is critical to design abuse detection mechanisms to proactively avoid and control such incidents. While methods for hate speech detection exist, they stereotype words and hence suffer from inherently biased training. Bias removal has traditionally been studied for structured datasets; we instead aim at bias mitigation from unstructured text data. In this paper, we make two important contributions. First, we systematically design methods to quantify the bias of any given model and propose algorithms for identifying the set of words which the model stereotypes. Second, we propose novel methods leveraging knowledge-based generalizations for bias-free learning. Knowledge-based generalizations provide an effective way to encode knowledge because the abstraction they introduce not only generalizes content but also facilitates retraction of information from the hate speech detection classifier, thereby reducing the imbalance. We experiment with multiple knowledge generalization policies and analyze their effect on overall performance and on bias mitigation. Our experiments with two real-world datasets, a Wikipedia Talk Pages dataset (WikiDetox) of size ~96k and a Twitter dataset of size ~24k, show that the use of knowledge-based generalizations results in better performance by forcing the classifier to learn from generalized content. Our methods utilize existing knowledge bases and can easily be extended to other tasks.
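The two contributions above admit a compact illustration. The sketch below is a hypothetical, minimal rendering of the ideas and not the paper's exact algorithms: toxicity_score, TOY_SCORES, and the probe template are made-up stand-ins for a trained classifier, and WordNet hypernym replacement is assumed here as one plausible generalization policy of the kind the abstract alludes to.

# Hypothetical sketch: (1) probe a trained classifier with single-word
# templates to flag words it stereotypes as hateful, and (2) replace such
# words with a WordNet hypernym so the model can be retrained on
# generalized content. All names below are illustrative stand-ins.

from nltk.corpus import wordnet as wn   # needs: nltk.download("wordnet")

# Toy scorer mimicking a biased classifier that over-weights identity terms.
TOY_SCORES = {"muslim": 0.92, "feminist": 0.88, "stupid": 0.95}

def toxicity_score(sentence):
    """Stand-in for P(hate | sentence) from a trained hate speech classifier."""
    return max((TOY_SCORES.get(w.lower(), 0.0) for w in sentence.split()), default=0.0)

def stereotyped_words(vocab, template="you are a {}", threshold=0.5):
    """Flag words that, on their own, push a neutral template past the decision threshold."""
    return {w for w in vocab if toxicity_score(template.format(w)) > threshold}

def generalize(token):
    """Map a token to its first WordNet hypernym lemma, e.g. 'muslim' -> 'religious person'."""
    for synset in wn.synsets(token):
        if synset.hypernyms():
            return synset.hypernyms()[0].lemma_names()[0].replace("_", " ")
    return token

def debias(sentence, flagged):
    """Rewrite training text so flagged (stereotyped) words occur only in generalized form."""
    return " ".join(generalize(t) if t.lower() in flagged else t for t in sentence.split())

flagged = stereotyped_words(["muslim", "teacher", "stupid"])
print(flagged)                                    # {'muslim', 'stupid'}
print(debias("the muslim teacher is stupid", flagged))

Replacing a stereotyped surface form with a more abstract concept is what lets the classifier retract word-specific evidence while still learning from the generalized content, which is the intuition behind the knowledge-based generalization policies described above.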

