Automated Detection of Cyberbullying Against Women and Immigrants and Cross-domain Adaptability

12/04/2020
by Thushari Atapattu, et al.

Cyberbullying is a prevalent and growing social problem, driven by the surge in social media usage. Minorities, women, and adolescents are among its common victims. Despite advances in NLP, automated cyberbullying detection remains challenging. This paper aims to advance it using state-of-the-art NLP techniques. We use a Twitter dataset from SemEval 2019 Task 5 (HatEval) on hate speech against women and immigrants. Our best-performing ensemble model, based on DistilBERT, achieves F1 scores of 0.73 and 0.74 on classifying hate speech (Task A) and on classifying aggressiveness and target (Task B), respectively. We adapt the ensemble model developed for Task A to classify offensive language in external datasets and achieve an F1 score of 0.7 on three benchmark datasets, a promising result for cross-domain adaptability. We also conduct a qualitative analysis of misclassified tweets and offer recommendations for future cyberbullying research.
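As a rough illustration of the two ideas the abstract relies on, the sketch below combines the binary predictions of several classifiers by majority vote and scores the result with F1. The abstract does not say how the ensemble combines its members or which F1 averaging is reported, so majority voting, the positive-class F1, the function names, and the toy labels are all assumptions made for this example, not the authors' method or data.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per example.

    `predictions` is a list of equal-length label lists, one per model.
    Majority voting is an assumed combination rule, not the paper's.
    """
    combined = []
    for labels in zip(*predictions):
        # most_common(1) returns the label with the highest count
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


def f1_score(gold, predicted, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): 1 = hateful, 0 = not hateful
gold = [1, 0, 1, 1, 0, 0]
model_a = [1, 0, 0, 1, 0, 1]
model_b = [1, 1, 1, 0, 0, 0]
model_c = [0, 0, 1, 1, 0, 1]

ensemble = majority_vote([model_a, model_b, model_c])
score = f1_score(gold, ensemble)
print(ensemble, round(score, 3))
```

With an odd number of binary classifiers a vote can never tie, which is one reason three-member ensembles are a common choice in sketches like this one.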

