Deep Learning Models for Multilingual Hate Speech Detection

04/14/2020
by Sai Saket Aluru, et al.

Hate speech detection is a challenging problem, and most of the available datasets cover only one language: English. In this paper, we conduct a large-scale analysis of multilingual hate speech in 9 languages from 16 different sources. We observe that in low-resource settings, simple models such as LASER embeddings with logistic regression perform best, while in high-resource settings BERT-based models perform better. In zero-shot classification, languages such as Italian and Portuguese achieve good results. Our proposed framework could be used as an efficient solution for low-resource languages. These models could also act as good baselines for future multilingual hate speech detection tasks. We have made our code and experimental settings public for other researchers at https://github.com/punyajoy/DE-LIMIT.
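
To make the low-resource recipe concrete, the following is a minimal sketch of the "sentence embedding plus logistic regression" idea, not the authors' implementation. It assumes a hypothetical `embed` function built from hashed character trigrams as a stand-in for a LASER encoder, a hand-written logistic regression trained by stochastic gradient descent, and a made-up toy dataset (`TOY_DATA`). Unlike LASER, the stand-in embedding is not multilingual, so it does not reproduce the zero-shot behaviour reported in the abstract.

```python
import hashlib
import math

DIM = 64  # stand-in size; real LASER sentence embeddings are 1024-dimensional

# Made-up toy examples (1 = hateful, 0 = not hateful); not taken from the paper.
TOY_DATA = [
    ("we despise those people", 1),
    ("get out of our country", 1),
    ("lovely weather today", 0),
    ("thanks for your help", 0),
]


def embed(text, dim=DIM):
    """Hypothetical stand-in for a LASER encoder: hashed character trigrams, L2-normalised."""
    vec = [0.0] * dim
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        gram = padded[i:i + 3]
        bucket = int(hashlib.md5(gram.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, epochs=200, lr=0.5):
    """Fit weights and bias of a logistic regression with plain SGD."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = p - target  # gradient of the log-loss w.r.t. the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def predict_proba(w, b, text):
    """Probability that `text` belongs to the positive (hateful) class."""
    x = embed(text)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


if __name__ == "__main__":
    X = [embed(text) for text, _ in TOY_DATA]
    y = [label for _, label in TOY_DATA]
    w, b = train_logreg(X, y)
    print(predict_proba(w, b, "thanks for your help"))
```

In practice the `embed` stand-in would be replaced by a real multilingual sentence encoder, with the classifier left unchanged; keeping the classifier this small is what makes the approach attractive when labelled data is scarce.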


