Robust Hate Speech Detection in Social Media: A Cross-Dataset Empirical Evaluation

07/04/2023
by Dimosthenis Antypas, et al.

The automatic detection of hate speech online is an active research area in NLP. Most studies to date rely on social media datasets, and the resulting hate speech detection models are trained on those same datasets. However, data creation processes carry their own biases, and models inevitably learn these dataset-specific biases. In this paper, we perform a large-scale cross-dataset comparison in which we fine-tune language models on different hate speech detection datasets. This analysis shows that some datasets generalise better than others when used as training data. Crucially, our experiments show that combining hate speech detection datasets contributes to the development of robust hate speech detection models. This robustness holds even when controlling for data size and when compared with the best individual datasets.
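The cross-dataset protocol described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: a toy keyword model stands in for a fine-tuned language model, accuracy stands in for whatever metric the paper reports, and the dataset names, placeholder tokens (`slur_a`, `slur_b`), and function names are all hypothetical. It only shows the shape of the experiment: train on each dataset, score on every test set, then repeat with a combined training set whose size is capped.

```python
import random
from collections import Counter

def train_keyword_model(examples):
    """Toy stand-in for fine-tuning: keep tokens seen more often in label-1 texts."""
    pos, neg = Counter(), Counter()
    for text, label in examples:
        (pos if label == 1 else neg).update(set(text.lower().split()))
    return {tok for tok, n in pos.items() if n > neg[tok]}

def predict(model, text):
    """Predict 1 if any learned token occurs in the text, else 0."""
    return int(any(tok in model for tok in text.lower().split()))

def accuracy(model, examples):
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)

def cross_dataset_matrix(train_sets, test_sets):
    """Train on each training set and score the model on every test set."""
    matrix = {}
    for train_name, train_examples in train_sets.items():
        model = train_keyword_model(train_examples)
        matrix[train_name] = {
            test_name: accuracy(model, test_examples)
            for test_name, test_examples in test_sets.items()
        }
    return matrix

def combine_with_size_cap(train_sets, total_size, seed=0):
    """Merge datasets, sampling an equal share from each up to total_size."""
    rng = random.Random(seed)
    share = total_size // len(train_sets)
    combined = []
    for examples in train_sets.values():
        combined.extend(rng.sample(examples, min(share, len(examples))))
    return combined

# Hypothetical toy data: each dataset has its own vocabulary (its "bias").
train_sets = {
    "A": [("slur_a people", 1), ("nice people", 0),
          ("total slur_a", 1), ("total calm", 0)],
    "B": [("slur_b crowd", 1), ("friendly crowd", 0),
          ("such slur_b", 1), ("such fun", 0)],
}
test_sets = {
    "A": [("slur_a again", 1), ("nice again", 0)],
    "B": [("slur_b again", 1), ("fun again", 0)],
}

matrix = cross_dataset_matrix(train_sets, test_sets)
combined_model = train_keyword_model(combine_with_size_cap(train_sets, total_size=8))
combined_scores = {name: accuracy(combined_model, ex) for name, ex in test_sets.items()}
```

In this toy setting each single-dataset model scores well only on its own test set, while the combined model covers both. Lowering `total_size` mimics the size-controlled comparison, where the combined training set is no larger than an individual one.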

