Empirical Analysis of Multi-Task Learning for Reducing Model Bias in Toxic Comment Detection

09/21/2019
by Ameya Vaidya, et al.

With the recent rise of toxicity in online conversations on social media platforms, toxic comment detection with modern machine learning algorithms has become a central focus of many online applications. Researchers and companies have developed a variety of shallow and deep learning models to identify toxicity in online conversations, reviews, or comments, with mixed success. However, these existing approaches have learned to incorrectly associate non-toxic comments containing certain trigger words (e.g., gay, lesbian, black, muslim) with toxicity. In this paper, we evaluate dozens of state-of-the-art models with the specific focus of reducing model bias towards these commonly attacked identity groups. We propose a multi-task learning model with an attention layer that jointly learns to predict both the toxicity of a comment and the identities mentioned in it, in order to reduce this bias. We then compare our model against an array of shallow and deep learning models using metrics designed specifically to test for unintended model bias within these identity groups.
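The multi-task design described above can be sketched as a shared attention-pooled encoder feeding two sigmoid heads, one for toxicity and one for identity presence. The sketch below is illustrative only, assuming toy dimensions and random weights rather than the authors' actual architecture or training setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy dimensions (illustrative, not the paper's settings).
seq_len, emb_dim, n_identities = 6, 8, 4

# Shared encoder: token embeddings pooled by a learned attention vector.
tokens = rng.normal(size=(seq_len, emb_dim))     # stand-in for an embedded comment
attn_w = rng.normal(size=emb_dim)                # attention scoring vector
scores = softmax(tokens @ attn_w)                # one weight per token, sums to 1
context = scores @ tokens                        # attention-weighted pooling -> (emb_dim,)

# Two task heads share the same context vector.
w_tox = rng.normal(size=emb_dim)                 # toxicity head
w_id = rng.normal(size=(n_identities, emb_dim))  # identity-presence head

p_toxic = sigmoid(context @ w_tox)               # scalar toxicity probability
p_identity = sigmoid(w_id @ context)             # one probability per identity group

# Joint objective: toxicity BCE plus a weighted identity BCE (dummy targets).
y_tox, y_id = 1.0, np.array([0.0, 1.0, 0.0, 0.0])
bce = lambda y, p: -(y * np.log(p) + (1 - y) * np.log(1 - p))
loss = bce(y_tox, p_toxic) + 0.5 * np.mean(bce(y_id, p_identity))
```

Because both heads read the same pooled representation, gradients from the identity task shape the shared encoder, which is the mechanism by which joint training can discourage the encoder from treating identity terms alone as evidence of toxicity.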

Related research:

- Is Your Toxicity My Toxicity? Exploring the Impact of Rater Identity on Toxicity Annotation (05/01/2022)
- Reducing Unintended Identity Bias in Russian Hate Speech Detection (10/22/2020)
- Reading Between the Demographic Lines: Resolving Sources of Bias in Toxicity Classifiers (06/29/2020)
- ALL-IN-ONE: Multi-Task Learning BERT models for Evaluating Peer Assessments (10/08/2021)
- Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification (03/11/2019)
- Abusive Language Detection in Heterogeneous Contexts: Dataset Collection and the Role of Supervised Attention (05/24/2021)
- Neural Aesthetic Image Reviewer (02/28/2018)
