Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization

09/05/2023
by Helena Bonaldi, et al.

Recent computational approaches for combating online hate speech involve the automatic generation of counter narratives by adapting Pretrained Transformer-based Language Models (PLMs) with human-curated data. This process, however, can produce in-domain overfitting, resulting in models that generate acceptable narratives only for hatred similar to the training data, with little portability to other targets or to real-world toxic language. This paper introduces novel attention regularization methodologies to improve the generalization capabilities of PLMs for counter narrative generation. These methods discourage overfitting to training-specific terms, resulting in more diverse and richer narratives. We experiment with two attention-based regularization techniques on a benchmark English dataset. Regularized models produce better counter narratives than state-of-the-art approaches in most cases, in terms of both automatic metrics and human evaluation, especially when the hateful targets are not present in the training data. This work paves the way for better and more flexible counter-speech generation models, a task for which datasets are highly challenging to produce.

