Large-Scale Hate Speech Detection with Cross-Domain Transfer

03/02/2022
by Cagri Toraman, et al.

The performance of hate speech detection models relies on the datasets on which the models are trained. Existing datasets are mostly prepared with a limited number of instances or hate domains that define hate topics. This hinders large-scale analysis and transfer learning with respect to hate domains. In this study, we construct large-scale tweet datasets for hate speech detection in English and a low-resource language, Turkish, each consisting of 100k human-labeled tweets. Our datasets are designed to have an equal number of tweets distributed over five domains. The experimental results, supported by statistical tests, show that Transformer-based language models outperform conventional bag-of-words and neural models by at least 5% in English and 10% in Turkish for large-scale hate speech detection. The performance is also scalable to different training sizes, such that 98% of performance in English, and 97% in Turkish, are recovered when 20% of training instances are used. We further examine the generalization ability of cross-domain transfer among hate domains. We show that 96% of the performance of a target domain on average is recovered by other domains for English, and 92% for Turkish. Gender and religion are more successful at generalizing to other domains, while sports fail most.

