Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021

07/14/2022
by Maaz Amjad, et al.

As the influence of social media platforms grows, so does the impact of their misuse. The importance of automatically detecting threatening and abusive language cannot be overstated. However, most existing studies and state-of-the-art methods target English, with limited work on low- and medium-resource languages. In this paper, we present two shared tasks on abusive and threatening language detection for Urdu, which has more than 170 million speakers worldwide. Both are posed as binary classification tasks in which participating systems must classify Urdu tweets into two classes: (i) Abusive and Non-Abusive for the first task, and (ii) Threatening and Non-Threatening for the second. We present two manually annotated datasets, one for each task, with tweets labelled accordingly. The abusive dataset contains 2400 annotated tweets in the training set and 1100 annotated tweets in the test set. The threatening dataset contains 6000 annotated tweets in the training set and 3950 annotated tweets in the test set. We also provide logistic regression and BERT-based baseline classifiers for both tasks. In total, 21 teams from six countries (India, Pakistan, China, Malaysia, United Arab Emirates, and Taiwan) registered for the shared task; 10 teams submitted runs for Subtask A (Abusive Language Detection), 9 teams submitted runs for Subtask B (Threatening Language Detection), and seven teams submitted technical reports. The best-performing system achieved an F1-score of 0.880 for Subtask A and 0.545 for Subtask B. For both subtasks, an m-BERT-based transformer model showed the best performance.
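
To make the reported metric concrete, the following is a minimal sketch of how an F1-score can be computed for a binary task such as Abusive vs. Non-Abusive. It assumes the score is taken over the positive class; the abstract does not state the averaging scheme, so this is an illustration and not necessarily the organizers' exact evaluation. The function name `f1_binary` and the toy labels are hypothetical.

```python
def f1_binary(gold, pred, positive=1):
    """F1 for the positive class (assumed scheme; not confirmed by the abstract)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels (made up for illustration): 1 = Abusive, 0 = Non-Abusive.
gold = [1, 0, 1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = f1_binary(gold, pred)
print(f"F1 = {score:.3f}")
```

Here three of the four abusive tweets are found and one non-abusive tweet is wrongly flagged, so precision and recall are both 0.75 and the F1-score is 0.75.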


