Agreeing to Disagree: Annotating Offensive Language Datasets with Annotators' Disagreement

by Elisa Leonardelli, et al.

Since state-of-the-art approaches to offensive language detection rely on supervised learning, it is crucial to quickly adapt them to the continuously evolving scenario of social media. While several approaches have been proposed to tackle the problem from an algorithmic perspective, so as to reduce the need for annotated data, less attention has been paid to the quality of these data. Following a trend that has emerged recently, we focus on the level of agreement among annotators while selecting data to create offensive language datasets, a task involving a high level of subjectivity. Our study comprises the creation of three novel datasets of English tweets, each covering a different topic and with five crowd-sourced judgments per tweet. We also present an extensive set of experiments showing that selecting training and test data according to different levels of annotators' agreement has a strong effect on classifiers' performance and robustness. Our findings are further validated in cross-domain experiments and studied using a popular benchmark dataset. We show that such hard cases, where low agreement is present, are not necessarily due to poor-quality annotation, and we advocate for a higher presence of ambiguous cases in future datasets, particularly in test sets, to better account for the different points of view expressed online.
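
As an illustration of what selecting data by agreement level can look like, the following minimal sketch groups items with five binary judgments each into buckets by how many annotators disagree with the majority. The label strings, the bucket names (`full`, `mild`, `weak`), and the function names are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter

def agreement_level(judgments):
    """Return (majority_label, level) for a list of binary judgments.

    Assumes an odd number of judgments over two labels, so the
    majority label is always unique. Bucket names are hypothetical.
    """
    counts = Counter(judgments)
    label, top = counts.most_common(1)[0]
    dissenting = len(judgments) - top  # annotators who disagree with the majority
    if dissenting == 0:
        level = "full"   # e.g. 5 vs 0
    elif dissenting == 1:
        level = "mild"   # e.g. 4 vs 1
    else:
        level = "weak"   # e.g. 3 vs 2
    return label, level

def split_by_agreement(items):
    """Group (text, judgments) pairs into buckets keyed by agreement level."""
    buckets = {"full": [], "mild": [], "weak": []}
    for text, judgments in items:
        label, level = agreement_level(judgments)
        buckets[level].append((text, label))
    return buckets

# Toy data, invented for illustration only.
items = [
    ("tweet a", ["offensive"] * 5),
    ("tweet b", ["offensive"] * 4 + ["not_offensive"]),
    ("tweet c", ["not_offensive"] * 3 + ["offensive"] * 2),
]
buckets = split_by_agreement(items)
```

Training or test sets can then be assembled from one bucket or from a chosen mix of buckets, which is the kind of selection the experiments described above vary.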




