Multilingual Cross-domain Perspectives on Online Hate Speech

09/11/2018
by Tom De Smedt, et al.

In this report, we present a study of eight corpora of online hate speech and demonstrate the NLP techniques that we used to collect and analyze the jihadist, extremist, racist, and sexist content. Analysis of the multilingual corpora shows that the different contexts share certain characteristics in their hateful rhetoric. To expose the main features, we focused on text classification, text profiling, and keyword and collocation extraction, along with manual annotation and qualitative study.

