OSACT4 Shared Task on Offensive Language Detection: Intensive Preprocessing-Based Approach

05/14/2020
by Fatemah Husain, et al.

The preprocessing phase is one of the key phases within the text classification pipeline. This study investigates the impact of the preprocessing phase on text classification, specifically on offensive language and hate speech classification for Arabic text. The Arabic used in social media is informal and written in Arabic dialects, which makes text classification very complex. Preprocessing helps with dimensionality reduction and with removing uninformative content. We apply intensive preprocessing techniques to the dataset before processing it further and feeding it into the classification model. The intensive preprocessing-based approach demonstrates a significant impact on the offensive language detection and hate speech detection shared tasks of the fourth workshop on Open-Source Arabic Corpora and Corpora Processing Tools (OSACT). Our team won third place in Sub-Task A (Offensive Language Detection) and first place in Sub-Task B (Hate Speech Detection), with F1 scores of 89% and 95%, respectively, providing state-of-the-art performance in terms of F1, accuracy, recall, and precision for Arabic hate speech detection.
