"It's Not Just Hate": A Multi-Dimensional Perspective on Detecting Harmful Speech Online

10/28/2022
by Federico Bianchi, et al.

Well-annotated data is a prerequisite for good Natural Language Processing models. Too often, though, annotation decisions are governed by optimizing for time or annotator agreement. We make a case for nuanced efforts in an interdisciplinary setting for annotating offensive online speech. Detecting offensive content is rapidly becoming one of the most important real-world NLP tasks. However, most datasets use a single binary label, e.g., for hate or incivility, even though each concept is multi-faceted. This modeling choice severely limits not only nuanced insights but also performance. We show that a more fine-grained multi-label approach to predicting incivility and hateful or intolerant content addresses both conceptual and performance issues. We release a novel dataset of over 40,000 tweets about immigration from the US and UK, annotated with six labels for different aspects of incivility and intolerance. Not only does our dataset allow for a more nuanced understanding of harmful speech online, but models trained on it also outperform or match performance on benchmark datasets.
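To make the contrast between a single binary label and a multi-label scheme concrete, here is a minimal sketch in plain Python. It assumes six independent binary labels per tweet, but the label names (`aspect_0` … `aspect_5`) are hypothetical placeholders, since the abstract does not list them. The per-label sigmoid with binary cross-entropy is a common generic setup for multi-label classification, shown here for illustration only; it is not presented as the authors' model or training objective.

```python
from __future__ import annotations

import math
from typing import Sequence

# Hypothetical placeholder names: the abstract mentions six labels for
# aspects of incivility and intolerance but does not name them.
LABELS = [f"aspect_{i}" for i in range(6)]


def to_multi_hot(active: set[str]) -> list[int]:
    """Encode the set of labels assigned to one tweet as a 0/1 vector."""
    unknown = active - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in active else 0 for name in LABELS]


def collapse_to_binary(multi_hot: Sequence[int]) -> int:
    """What a single binary label keeps: 1 if any aspect is present."""
    return int(any(multi_hot))


def multilabel_bce(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy, one independent sigmoid output per label."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable BCE-with-logits:
        # max(z, 0) - z * y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(targets)


# Two tweets with different harmful aspects...
a = to_multi_hot({"aspect_0"})
b = to_multi_hot({"aspect_2", "aspect_5"})
# ...become indistinguishable once collapsed to one binary label.
print(a, b, collapse_to_binary(a), collapse_to_binary(b))
```

The design point is that the multi-hot vectors `a` and `b` remain distinct, so a model can be trained and evaluated on each aspect separately. After `collapse_to_binary`, both map to `1`, and that distinction is lost.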

06/12/2022

Mining Multi-Label Samples from Single Positive Labels

Conditional generative adversarial networks (cGANs) have shown superior ...

08/23/2022

K-MHaS: A Multi-label Hate Speech Detection Dataset in Korean Online News Comment

Online Hate speech detection has become important with the growth of dig...

10/12/2022

Annotating Norwegian Language Varieties on Twitter for Part-of-Speech

Norwegian Twitter data poses an interesting challenge for Natural Langua...

04/28/2023

HQP: A Human-Annotated Dataset for Detecting Online Propaganda

Online propaganda poses a severe threat to the integrity of societies. H...

06/24/2020

On Analyzing Annotation Consistency in Online Abusive Behavior Datasets

Online abusive behavior is an important issue that breaks the cohesivene...

01/25/2023

Consistency is Key: Disentangling Label Variation in Natural Language Processing with Intra-Annotator Agreement

We commonly use agreement measures to assess the utility of judgements m...

10/02/2017

HUMOR: A Crowd-Annotated Spanish Corpus for Humor Analysis

Computational Humor, as the name implies, studies humor from a computati...