How to Solve Few-Shot Abusive Content Detection Using the Data We Actually Have

05/23/2023
by Viktor Hangya, et al.

Due to the broad range of social media platforms and their user groups, the requirements of abusive language detection systems are varied and ever-changing. A large set of annotated corpora with different properties and label sets has already been created, for tasks such as hate or misogyny detection, but the form and targets of abusive speech are constantly changing. Since the annotation of new corpora is expensive, in this work we leverage datasets we already have, covering a wide range of tasks related to abusive language detection, in order to build models cheaply for a new target label set and/or language, using only a few training examples of the target domain. We propose a two-step approach: first, we train our model in a multitask fashion; we then carry out few-shot adaptation to the target requirements. Our experiments show that by leveraging already existing datasets and only a few shots of the target task, the performance of models can be improved not only monolingually but across languages as well. Our analysis also shows that our models acquire a general understanding of abusive language, since they improve the prediction of labels which are present only in the target dataset. We also analyze the trade-off between specializing the already existing datasets to a given target setup for best performance and its negative effects on model adaptability.
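The two-step recipe in the abstract (multitask training on existing datasets, then few-shot adaptation to a new label set) can be illustrated with a toy sketch. This is not the authors' implementation: the hashed bag-of-words features, the small shared `tanh` encoder, the binary per-task heads, the task names (`hate`, `offensive`, `target`), the example sentences, and all hyperparameters are assumptions chosen only to keep the example dependency-free.

```python
import math
import random
import zlib

DIM, HIDDEN = 64, 8  # hashed feature size and shared hidden size (arbitrary choices)


def featurize(text):
    """Hash whitespace tokens into a fixed-size bag-of-words vector."""
    x = [0.0] * DIM
    for tok in text.lower().split():
        x[zlib.crc32(tok.encode("utf-8")) % DIM] += 1.0
    return x


class MultiTaskModel:
    """Shared encoder plus one binary logistic head per task (toy stand-in)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        # Shared encoder weights, updated by every task.
        self.W = [[self.rng.uniform(-0.1, 0.1) for _ in range(DIM)]
                  for _ in range(HIDDEN)]
        self.heads = {}  # task name -> [weight vector, bias]

    def add_head(self, task):
        weights = [self.rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)]
        self.heads[task] = [weights, 0.0]

    def encode(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.W]

    def _prob(self, task, h):
        v, b = self.heads[task]
        z = sum(vj * hj for vj, hj in zip(v, h)) + b
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, task, text):
        return self._prob(task, self.encode(featurize(text)))

    def sgd_step(self, task, text, label, lr):
        """One SGD step on the logistic loss; updates the task head and the encoder."""
        x = featurize(text)
        h = self.encode(x)
        err = self._prob(task, h) - label  # d(loss)/d(logit)
        head = self.heads[task]
        v = head[0]
        for j in range(HIDDEN):
            grad_pre = err * v[j] * (1.0 - h[j] ** 2)  # uses v[j] before its update
            v[j] -= lr * err * h[j]
            for i in range(DIM):
                if x[i]:
                    self.W[j][i] -= lr * grad_pre * x[i]
        head[1] -= lr * err


def train(model, datasets, epochs, lr):
    """Interleave the given tasks so every one of them shapes the shared encoder."""
    for _ in range(epochs):
        for task, examples in datasets.items():
            for text, label in examples:
                model.sgd_step(task, text, label, lr)


# Step 1: multitask training on already existing (here: made-up) source datasets.
source = {
    "hate": [("i hate those people", 1.0), ("i love this city", 0.0)],
    "offensive": [("you are an idiot", 1.0), ("you are very kind", 0.0)],
}
model = MultiTaskModel(seed=0)
for name in source:
    model.add_head(name)
train(model, source, epochs=50, lr=0.1)

# Step 2: few-shot adaptation to a new target label with a fresh head.
few_shot = {"target": [("stupid idiot go away", 1.0), ("thanks for the help", 0.0)]}
model.add_head("target")
train(model, few_shot, epochs=20, lr=0.1)

p = model.predict("target", "you are a stupid idiot")
print(f"P(target label) = {p:.3f}")
```

The point of the structure is that the target head starts from an encoder already shaped by the source tasks, so the few target examples only need to fit a small head and nudge the shared weights. A realistic setup would replace the toy encoder with a pretrained multilingual language model and the binary heads with the label sets of the actual datasets.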


research
11/30/2022

RAFT: Rationale adaptor for few-shot abusive language detection

Abusive language is a concerning problem in online social media. Past re...
research
05/28/2017

Understanding Abuse: A Typology of Abusive Language Detection Subtasks

As the body of research on abusive language detection and analysis grows...
research
01/15/2022

Addressing the Challenges of Cross-Lingual Hate Speech Detection

The goal of hate speech detection is to filter negative online content a...
research
04/29/2022

ExaASC: A General Target-Based Stance Detection Corpus in Arabic Language

Target-based Stance Detection is the task of finding a stance toward a t...
research
01/11/2023

Few-shot Learning for Cross-Target Stance Detection by Aggregating Multimodal Embeddings

Despite the increasing popularity of the stance detection task, existing...
research
03/14/2021

AngryBERT: Joint Learning Target and Emotion for Hate Speech Detection

Automated hate speech detection in social media is a challenging task th...
research
07/27/2021

Unsupervised Domain Adaptation for Hate Speech Detection Using a Data Augmentation Approach

Online harassment in the form of hate speech has been on the rise in rec...
