
AnnoBERT: Effectively Representing Multiple Annotators' Label Choices to Improve Hate Speech Detection

by Wenjie Yin, et al.

Supervised approaches generally rely on majority-based labels. However, it is hard to achieve high agreement among annotators in subjective tasks such as hate speech detection. Existing neural network models principally regard labels as categorical variables, while ignoring the semantic information in diverse label texts. In this paper, we propose AnnoBERT, a first-of-its-kind architecture that integrates annotator characteristics and label text with a transformer-based model to detect hate speech. It builds a unique representation of each annotator's characteristics via Collaborative Topic Regression (CTR) and integrates label text to enrich the textual representations. During training, the model associates annotators with their label choices given a piece of text; during evaluation, when label information is not available, the model predicts the aggregated label given by the participating annotators by utilising the learnt association. The proposed approach displays an advantage in detecting hate speech, especially in the minority class and in edge cases with annotator disagreement. The improvement in overall performance is largest when the dataset is more label-imbalanced, suggesting practical value in identifying real-world hate speech, since the volume of hate speech in the wild on social media is extremely small compared with normal (non-hate) speech. Through ablation studies, we show the relative contributions of annotator embeddings and label text to model performance, and we test a range of alternative annotator embeddings and label text combinations.
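The train/evaluate split described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes the learnt association is exposed as a scoring function over (text, annotator, label) triples and that the aggregated label is a simple majority vote over the participating annotators. The names `majority_label`, `predict_aggregated` and `toy_score`, and the thresholds inside `toy_score`, are hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority_label(labels):
    """Aggregate per-annotator labels by majority vote.

    Ties resolve to the label that appears first in `labels`.
    """
    return Counter(labels).most_common(1)[0][0]


def predict_aggregated(score, text, annotators, labels=("hate", "not hate")):
    """Predict the aggregated label when no gold labels are available.

    `score(text, annotator, label)` is a hypothetical stand-in for a learnt
    model that rates how well a label matches an annotator's likely choice.
    For each participating annotator we keep the best-scoring label, then
    aggregate those per-annotator choices by majority vote (an assumption).
    """
    choices = [
        max(labels, key=lambda label: score(text, annotator, label))
        for annotator in annotators
    ]
    return majority_label(choices)


def toy_score(text, annotator, label):
    """Hypothetical scorer: each annotator has a fixed sensitivity threshold."""
    thresholds = {"a1": 0.2, "a2": 0.5, "a3": 0.8}  # made-up values
    offensiveness = 0.6 if "idiot" in text.lower() else 0.1
    says_hate = offensiveness >= thresholds[annotator]
    return 1.0 if (label == "hate") == says_hate else 0.0


if __name__ == "__main__":
    annotators = ["a1", "a2", "a3"]
    print(predict_aggregated(toy_score, "You idiot", annotators))
    print(predict_aggregated(toy_score, "Nice day", annotators))
```

In this sketch the prediction depends on which annotators participate: the same text can receive a different aggregated label under a different annotator pool, which is the kind of disagreement case the abstract highlights.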



