Human-in-the-Loop Hate Speech Classification in a Multilingual Context

12/05/2022
by   Ana Kotarcic, et al.
0

The shift of public debate to the digital sphere has been accompanied by a rise in online hate speech. While many promising approaches for hate speech classification have been proposed, studies often focus only on a single language, usually English, and do not address three key concerns: post-deployment performance, classifier maintenance and infrastructural limitations. In this paper, we introduce a new human-in-the-loop BERT-based hate speech classification pipeline and trace its development from initial data collection and annotation all the way to post-deployment. Our classifier, trained using data from our original corpus of over 422k examples, is specifically developed for the inherently multilingual setting of Switzerland and outperforms with its F1 score of 80.5 the currently best-performing BERT-based multilingual classifier by 5.8 F1 points in German and 3.6 F1 points in French. Our systematic evaluations over a 12-month period further highlight the vital importance of continuous, human-in-the-loop classifier maintenance to ensure robust hate speech classification post-deployment.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/13/2019

Offensive Language and Hate Speech Detection for Danish

The presence of offensive language on social media platforms and the imp...
research
01/08/2021

Leveraging Multilingual Transformers for Hate Speech Detection

Detecting and classifying instances of hate in social media text has bee...
research
05/01/2020

Hitachi at SemEval-2020 Task 12: Offensive Language Identification with Noisy Labels using Statistical Sampling and Post-Processing

In this paper, we present our participation in SemEval-2020 Task-12 Subt...
research
11/08/2021

Sexism Prediction in Spanish and English Tweets Using Monolingual and Multilingual BERT and Ensemble Models

The popularity of social media has created problems such as hate speech ...
research
05/30/2023

Prospective Validation of Motor-Based Intervention with Automated Mispronunciation Detection of Rhotics in Residual Speech Sound Disorders

Because lab accuracy of clinical speech technology systems may be overop...
research
07/22/2018

German Dialect Identification Using Classifier Ensembles

In this paper we present the GDI_classification entry to the second Germ...

Please sign up or login with your details

Forgot password? Click here to reset