Highly Generalizable Models for Multilingual Hate Speech Detection

01/27/2022
by Neha Deshpande, et al.

Hate speech detection has become an important research topic within the past decade, as more and more private corporations need to regulate user-generated content on platforms across the globe. In this paper, we introduce a study of multilingual hate speech classification. We compile a dataset of 11 languages and resolve the datasets' differing taxonomies by analyzing the combined data with binary labels: hate speech or not hate speech. Defining hate speech in a single way across different languages and datasets may erase cultural nuances in the definition; therefore, we use language-agnostic embeddings provided by LASER and MUSE to develop models that can apply a generalized definition of hate speech across datasets. Furthermore, we evaluate prior state-of-the-art methodologies for hate speech detection on our expanded dataset. We conduct three types of experiments for a binary hate speech classification task, Multilingual-Train Monolingual-Test, Monolingual-Train Monolingual-Test, and Language-Family-Train Monolingual-Test, to see whether performance for each language increases when the model learns from other languages' data.
