On Analyzing Annotation Consistency in Online Abusive Behavior Datasets

by Md Rabiul Awal, et al.

Online abusive behavior is an important issue that breaks the cohesiveness of online social communities and even raises public safety concerns. Motivated by this growing problem, researchers have proposed, collected, and annotated datasets of online abusive content. These datasets play a critical role in facilitating research on online hate speech and abusive behavior. However, annotating such datasets is difficult: the true label of a given text is often contentious, because the semantic distinction between labels (e.g., abusive and hate) can be blurred and the judgment is often subjective. In this study, we propose an analytical framework for studying annotation consistency in online hate and abusive content datasets. We apply the framework to evaluate annotation consistency in three popular datasets that are widely used in studies of online hate speech and abusive behavior. We find that the existing datasets still contain a substantial amount of annotation inconsistency, particularly when the labels are semantically similar.




