TEET! Tunisian Dataset for Toxic Speech Detection

10/11/2021
by Slim Gharbi, et al.

Unrestricted freedom of expression on social media has its costs, especially the spread of harmful and abusive content that may induce people to act accordingly. Detecting such content automatically has therefore become an urgent task, one that would make efforts to limit this toxic spread more efficient. Unlike other Arabic dialects, which are mostly based on Modern Standard Arabic (MSA), the Tunisian dialect combines several languages, including MSA, Tamazight, Italian and French. This linguistic richness, together with the lack of large annotated datasets, makes NLP problems for the dialect challenging. In this paper we introduce a new annotated dataset of approximately 10k comments. We provide an in-depth exploration of its vocabulary through feature engineering approaches, and we report the classification performance of machine learning classifiers such as Naive Bayes (NB) and SVM and of deep learning models such as ARBERT, MARBERT and XLM-R.

