AnnoBERT: Effectively Representing Multiple Annotators' Label Choices to Improve Hate Speech Detection

12/20/2022
by Wenjie Yin, et al.

Supervised approaches generally rely on majority-based labels. However, high agreement among annotators is hard to achieve in subjective tasks such as hate speech detection. Existing neural network models mainly treat labels as categorical variables and ignore the semantic information in diverse label texts. In this paper, we propose AnnoBERT, a first-of-its-kind architecture that integrates annotator characteristics and label text with a transformer-based model to detect hate speech. It learns a unique representation of each annotator's characteristics via Collaborative Topic Regression (CTR) and integrates label text to enrich textual representations. During training, the model associates annotators with their label choices given a piece of text; during evaluation, when label information is not available, the model uses the learnt association to predict the aggregated label given by the participating annotators. The proposed approach shows an advantage in detecting hate speech, especially for the minority class and for edge cases with annotator disagreement. The improvement in overall performance is largest when the dataset is more label-imbalanced. This suggests practical value for identifying real-world hate speech, since hate speech in the wild makes up an extremely small share of social media content compared with normal (non-hate) speech. Through ablation studies, we show the relative contributions of annotator embeddings and label text to model performance, and we test a range of alternative combinations of annotator embeddings and label text.
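
The abstract contrasts majority-based labels with training on each annotator's own label choice. The following is a minimal sketch, assuming a binary label set and hypothetical label texts ("normal", "hate speech"), of how a multi-annotated dataset could be expanded into one training example per annotator-label pair, and how an aggregated evaluation target and an agreement score could be computed. It does not implement CTR or the transformer. `LABEL_TEXT`, `Example`, `build_training_examples`, `majority_label` and `agreement` are hypothetical names, and majority voting with its tie-breaking rule is an assumed aggregation, not the paper's confirmed procedure.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical label texts (assumption); the paper's actual label wording may differ.
LABEL_TEXT = {0: "normal", 1: "hate speech"}


@dataclass
class Example:
    text: str          # the post being classified
    annotator_id: str  # placeholder for a learned annotator embedding (e.g. from CTR)
    label_text: str    # natural-language form of this annotator's label choice
    label: int         # integer class id


def build_training_examples(items):
    """Expand (text, {annotator_id: label}) items into one example per annotator."""
    examples = []
    for text, annotations in items:
        for annotator_id, label in annotations.items():
            examples.append(Example(text, annotator_id, LABEL_TEXT[label], label))
    return examples


def majority_label(annotations):
    """Aggregated label by majority vote; ties go to the lower class id (assumption)."""
    counts = Counter(annotations.values())
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


def agreement(annotations):
    """Share of annotators who chose the majority label; low values flag edge cases."""
    counts = Counter(annotations.values())
    return max(counts.values()) / len(annotations)


items = [
    ("example post A", {"a1": 1, "a2": 1, "a3": 0}),
    ("example post B", {"a1": 0, "a4": 0}),
]
train = build_training_examples(items)
print(len(train))                   # 5
print(majority_label(items[0][1]))  # 1
print(agreement(items[0][1]))       # 0.6666666666666666
```

In this sketch, keeping every annotator's label rather than collapsing to the majority before training means that a split such as the 2-of-3 vote on post A stays visible in the training data, while the majority label is still available as the evaluation target.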
