Abusive Language Detection in Heterogeneous Contexts: Dataset Collection and the Role of Supervised Attention

05/24/2021
by Hongyu Gong, et al.

Abusive language is a massive problem on online social platforms. Existing abusive language detection techniques are particularly ill-suited to comments that contain heterogeneous abusive language patterns, i.e., both abusive and non-abusive parts. This is due in part to the lack of datasets that explicitly annotate heterogeneity in abusive language. We tackle this challenge by providing an annotated dataset of abusive language in over 11,000 comments from YouTube. We account for heterogeneity in this dataset by separately annotating both the comment as a whole and the individual sentences that comprise each comment. We then propose an algorithm that uses a supervised attention mechanism to detect and categorize abusive content using multi-task learning. We empirically demonstrate the challenges of using traditional techniques on heterogeneous content and the comparative gains in performance of the proposed approach over state-of-the-art methods.
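To make the idea of supervised attention concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the model's equations, so the function name `supervised_attention_loss`, the weighting parameter `lam`, and the specific loss terms are all assumptions chosen for illustration. The sketch assumes that sentence-level annotations are used as a target distribution for the attention weights, and that this attention loss is added to a comment-level classification loss.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def supervised_attention_loss(attn_scores, sent_logits, sent_labels,
                              comment_label, lam=0.5):
    """Hypothetical joint loss for one comment (illustrative only).

    attn_scores   : unnormalized attention score per sentence
    sent_logits   : per-sentence abusiveness logit
    sent_labels   : 0/1 sentence-level annotations (1 = abusive)
    comment_label : 0/1 comment-level annotation
    lam           : assumed weight of the attention supervision term
    """
    eps = 1e-12
    attn = softmax(attn_scores)

    # Comment-level prediction: attention-weighted sum of sentence logits,
    # squashed to a probability with a sigmoid.
    comment_logit = sum(a * z for a, z in zip(attn, sent_logits))
    p = 1.0 / (1.0 + math.exp(-comment_logit))
    cls_loss = -(comment_label * math.log(p + eps)
                 + (1 - comment_label) * math.log(1.0 - p + eps))

    # Attention supervision: push attention mass toward sentences that
    # annotators marked as abusive. Skipped when no sentence is abusive.
    n_abusive = sum(sent_labels)
    if n_abusive > 0:
        target = [y / n_abusive for y in sent_labels]
        attn_loss = -sum(t * math.log(a + eps) for t, a in zip(target, attn))
    else:
        attn_loss = 0.0

    return cls_loss + lam * attn_loss, attn


# A three-sentence comment in which only the middle sentence is abusive.
sent_logits = [-2.0, 3.0, -1.0]
sent_labels = [0, 1, 0]
loss_focused, attn_focused = supervised_attention_loss(
    [0.0, 4.0, 0.0], sent_logits, sent_labels, comment_label=1)
loss_misplaced, attn_misplaced = supervised_attention_loss(
    [4.0, 0.0, 0.0], sent_logits, sent_labels, comment_label=1)
```

In this toy setup, attention concentrated on the annotated abusive sentence yields a lower loss than attention concentrated on a benign sentence, which is the behavior the sentence-level annotations are meant to encourage in heterogeneous comments.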

research
05/28/2020

Joint Modelling of Emotion and Abusive Language Detection

The rise of online communication platforms has been accompanied by some ...
research
08/10/2021

Hope Speech detection in under-resourced Kannada language

Numerous methods have been developed to monitor the spread of negativity...
research
09/27/2022

BanglaSarc: A Dataset for Sarcasm Detection

Being one of the most widely spoken language in the world, the use of Ba...
research
06/10/2021

Ruddit: Norms of Offensiveness for English Reddit Comments

On social media platforms, hateful and offensive language negatively imp...
research
09/21/2019

Empirical Analysis of Multi-Task Learning for Reducing Model Bias in Toxic Comment Detection

With the recent rise of toxicity in online conversations on social media...
research
01/24/2023

ViHOS: Hate Speech Spans Detection for Vietnamese

The rise in hateful and offensive language directed at other users is on...
research
04/03/2020

Directions in Abusive Language Training Data: Garbage In, Garbage Out

Data-driven analysis and detection of abusive online content covers many...
