Directions in Abusive Language Training Data: Garbage In, Garbage Out

04/03/2020
by Bertie Vidgen, et al.

Data-driven analysis and detection of abusive online content covers many different tasks, phenomena, contexts, and methodologies. This paper systematically reviews how abusive language datasets are created and what they contain, alongside an open website that catalogues abusive language data. The resulting body of knowledge is synthesized into evidence-based recommendations for practitioners working with this complex and highly diverse data.
