Emojis as Anchors to Detect Arabic Offensive Language and Hate Speech

01/18/2022
by Hamdy Mubarak, et al.

We introduce a generic, language-independent method to collect a large percentage of offensive and hate tweets regardless of their topics or genres. We harness the extralinguistic information embedded in emojis to collect a large number of offensive tweets. We apply the proposed method to Arabic tweets and compare the results with English tweets, analyzing some cultural differences. We observe consistent usage of these emojis to express offensiveness throughout different timelines on Twitter. We manually annotate and publicly release the largest Arabic dataset for offensive language, fine-grained hate speech, vulgarity, and violence. Furthermore, we benchmark the dataset for detecting offensive language and hate speech using different transformer architectures and perform an in-depth linguistic analysis. To assess generalization capability, we evaluate our models on external datasets: a Twitter dataset collected using a completely different method, and a multi-platform dataset containing comments from Twitter, YouTube and Facebook. Competitive results on these datasets suggest that the data collected using our method captures universal characteristics of offensive language. Our findings also highlight the common words used in offensive communications, common targets of hate speech, and specific patterns in violent tweets, and they pinpoint common classification errors that stem from the need to understand context, to consider culture and background, and to recognize sarcasm, among others.
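
The collection idea in the abstract, using emojis as language-independent anchors, can be illustrated with a minimal sketch. This is not the authors' pipeline: the anchor set `ANCHOR_EMOJIS` and the helper names `contains_anchor` and `collect_candidates` are hypothetical, and the abstract does not list the emojis actually used. The sketch only shows that such a filter needs no knowledge of the tweet's language or topic.

```python
from typing import Iterable, List, Set

# Hypothetical anchor set, chosen for illustration only.
# The abstract does not say which emojis the authors used.
ANCHOR_EMOJIS: Set[str] = {
    "\U0001F595",  # middle finger
    "\U0001F620",  # angry face
    "\U0001F621",  # pouting face
}


def contains_anchor(text: str, anchors: Set[str] = ANCHOR_EMOJIS) -> bool:
    """Return True if any anchor emoji occurs anywhere in the text."""
    return any(emoji in text for emoji in anchors)


def collect_candidates(tweets: Iterable[str],
                       anchors: Set[str] = ANCHOR_EMOJIS) -> List[str]:
    """Keep only tweets containing at least one anchor emoji.

    The filter never inspects words, so it works the same way for
    Arabic, English, or any other language.
    """
    return [tweet for tweet in tweets if contains_anchor(tweet, anchors)]


if __name__ == "__main__":
    sample = ["hello", "go away \U0001F595", "nice day \U0001F600"]
    for tweet in collect_candidates(sample):
        print(tweet)
```

A filter like this yields candidates only. As the abstract notes, the collected tweets are still annotated manually before being used as labeled data.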


