To BAN or not to BAN: Bayesian Attention Networks for Reliable Hate Speech Detection

07/10/2020
by Kristian Miok, et al.

Hate speech is an important problem in the management of user-generated content. To remove offensive content or ban misbehaving users, content moderators need reliable hate speech detectors. Recently, deep neural networks based on the transformer architecture, such as the (multilingual) BERT model, have achieved superior performance in many natural language classification tasks, including hate speech detection. So far, these methods have not been able to quantify the reliability of their output. We propose a Bayesian method that uses Monte Carlo Dropout within the attention layers of transformer models to provide well-calibrated reliability estimates. We evaluate and visualize the proposed approach on hate speech detection problems in several languages. Our experiments show that the approach significantly improves the detection of hate speech predictions that cannot be trusted. It not only improves the classification performance of the state-of-the-art multilingual BERT model, but the computed reliability scores also significantly reduce the workload in the inspection of offending cases and in reannotation campaigns. The provided visualization helps to understand borderline outcomes.
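
To make the idea of Monte Carlo Dropout in an attention layer concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a single attention read-out over three tokens that yields a scalar logit for the "hate speech" class. Dropout stays active at prediction time, and the spread of the sampled probabilities serves as a rough reliability signal. All names (`attention_with_dropout`, `mc_dropout_predict`) and all numeric values are hypothetical and chosen for illustration only.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_dropout(scores, values, p_drop, rng):
    """Toy attention read-out with dropout applied to the attention weights.

    Assumption: each token contributes a scalar value; a real transformer
    uses vectors and many heads.
    """
    weights = softmax(scores)
    # Inverted dropout: zero each weight with probability p_drop and
    # rescale the surviving weights by 1 / (1 - p_drop).
    kept = [0.0 if rng.random() < p_drop else w / (1.0 - p_drop) for w in weights]
    return sum(w * v for w, v in zip(kept, values))


def mc_dropout_predict(scores, values, p_drop=0.1, n_samples=200, seed=0):
    """Run n_samples stochastic passes; return mean and variance of the probability."""
    rng = random.Random(seed)
    probs = []
    for _ in range(n_samples):
        logit = attention_with_dropout(scores, values, p_drop, rng)
        probs.append(1.0 / (1.0 + math.exp(-logit)))  # sigmoid
    mean = sum(probs) / n_samples
    var = sum((p - mean) ** 2 for p in probs) / n_samples
    return mean, var


if __name__ == "__main__":
    # Hypothetical attention scores and per-token values.
    mean, var = mc_dropout_predict([2.0, 0.5, -1.0], [1.5, -0.5, 0.2], p_drop=0.1)
    print(f"mean probability = {mean:.3f}, variance = {var:.5f}")
```

In this sketch a low variance across passes would mark a prediction as comparatively stable, while a high variance would flag it for manual inspection. How the paper turns such samples into calibrated reliability scores is not specified in the abstract.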

research
09/16/2019

Prediction Uncertainty Estimation for Hate Speech Classification

As a result of social network popularity, in recent years, hate speech p...
research
01/08/2021

Leveraging Multilingual Transformers for Hate Speech Detection

Detecting and classifying instances of hate in social media text has bee...
research
01/22/2021

HASOCOne@FIRE-HASOC2020: Using BERT and Multilingual BERT models for Hate Speech Detection

Hateful and Toxic content has become a significant concern in today's wo...
research
01/27/2022

Highly Generalizable Models for Multilingual Hate Speech Detection

Hate speech detection has become an important research topic within the ...
research
08/29/2019

Multilingual and Multi-Aspect Hate Speech Analysis

Current research on hate speech analysis is typically oriented towards m...
research
05/22/2023

Evaluating ChatGPT's Performance for Multilingual and Emoji-based Hate Speech Detection

Hate speech is a severe issue that affects many online platforms. So far...
research
09/19/2021

Unified and Multilingual Author Profiling for Detecting Haters

This paper presents a unified user profiling framework to identify hate ...
