Human-in-the-Loop for Data Collection: a Multi-Target Counter Narrative Dataset to Fight Online Hate Speech

07/19/2021
by Margherita Fanton, et al.

Undermining the impact of hateful content with informed and non-aggressive responses, called counter narratives, has emerged as a possible solution for fostering healthier online communities. Accordingly, some NLP studies have started addressing the task of counter narrative generation. Although such studies have made an effort to build hate speech / counter narrative (HS/CN) datasets for neural generation, they fall short in quality, in quantity, or in both. In this paper, we propose a novel human-in-the-loop data collection methodology in which a generative language model is refined iteratively: in each loop, the model is trained on the data accumulated in the previous loops and then generates new training samples, which experts review and/or post-edit. Our experiments comprised several loops, including dynamic variations. Results show that the methodology is scalable and facilitates diverse, novel, and cost-effective data collection. To our knowledge, the resulting dataset is the only expert-based multi-target HS/CN dataset available to the community.
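The iterative collection procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `run_loops`, `toy_generate`, and `toy_review` are hypothetical, the generator is a trivial stand-in for fine-tuning and sampling from a language model, and the reviewer is a stand-in for expert review and post-editing. The hate speech inputs are shown as placeholders only.

```python
from typing import Callable, List, Optional, Tuple

# An (HS, CN) pair: hate speech text and its counter narrative.
Pair = Tuple[str, str]


def run_loops(
    seed: List[Pair],
    generate: Callable[[List[Pair], int], List[Pair]],
    review: Callable[[Pair], Optional[Pair]],
    n_loops: int,
    samples_per_loop: int,
) -> List[Pair]:
    """Grow a dataset over several loops of generation followed by review."""
    data = list(seed)
    for _ in range(n_loops):
        # Assumption: `generate` stands for "train on the data so far, then sample".
        candidates = generate(data, samples_per_loop)
        for pair in candidates:
            # Assumption: `review` returns a (possibly edited) pair, or None to reject.
            reviewed = review(pair)
            if reviewed is not None and reviewed not in data:
                data.append(reviewed)
    return data


def toy_generate(data: List[Pair], n: int) -> List[Pair]:
    """Hypothetical stand-in for a generative model: rewrites existing pairs."""
    out = []
    for i in range(n):
        hs, cn = data[i % len(data)]
        out.append((hs, f"  {cn} (variant {len(data) + i})  "))
    return out


def toy_review(pair: Pair) -> Optional[Pair]:
    """Hypothetical stand-in for an expert: trims whitespace, rejects empty CNs."""
    hs, cn = pair
    cn = cn.strip()  # minimal "post-edit"
    return (hs, cn) if cn else None


if __name__ == "__main__":
    seed_pairs = [
        ("<HS example 1>", "Counter narrative one."),
        ("<HS example 2>", "Counter narrative two."),
    ]
    dataset = run_loops(seed_pairs, toy_generate, toy_review,
                        n_loops=2, samples_per_loop=2)
    for hs, cn in dataset:
        print(hs, "->", cn)
```

The key design point the sketch tries to capture is that only reviewed samples are added to the pool, so each later loop generates from data that has already passed human review.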

research
11/07/2022

Human-Machine Collaboration Approaches to Build a Dialogue Dataset for Hate Speech Countering

Fighting online hate speech is a challenge that is usually addressed usi...
research
04/15/2021

Does Putting a Linguist in the Loop Improve NLU Data Collection?

Many crowdsourced NLP datasets contain systematic gaps and biases that a...
research
10/08/2019

CONAN – COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech

Although there is an unprecedented effort to provide adequate responses ...
research
06/22/2021

Towards Knowledge-Grounded Counter Narrative Generation for Hate Speech

Tackling online hatred using informed textual responses - called counter...
research
12/31/2020

Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection

We present a first-of-its-kind large synthetic training dataset for onli...
research
04/05/2022

Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks

We introduce Dynatask: an open source system for setting up custom NLP t...
research
09/05/2023

Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization

Recent computational approaches for combating online hate speech involve...
