Overview of the HASOC Subtrack at FIRE 2022: Offensive Language Identification in Marathi

11/18/2022
by Tharindu Ranasinghe, et al.

The widespread presence of offensive content online has become a cause of great concern in recent years, motivating researchers to develop robust systems capable of identifying such content automatically. To evaluate these systems fairly, several international competitions have been organized, providing the community with important benchmark data and evaluation methods for various languages. The HASOC (Hate Speech and Offensive Content Identification) shared task, organized since 2019, is one of these initiatives. In its fourth iteration, HASOC 2022 included three subtracks for English, Hindi, and Marathi. In this paper, we report the results of the HASOC 2022 Marathi subtrack, which provided participants with a dataset of tweets manually annotated using the popular OLID taxonomy. The Marathi track featured three additional subtracks, each corresponding to one level of the taxonomy: Task A, offensive content identification (offensive vs. non-offensive); Task B, categorization of offensive types (targeted vs. untargeted); and Task C, offensive target identification (individual vs. group vs. others). Overall, 59 runs were submitted by 10 teams. The best systems obtained an F1 of 0.9745 for Subtrack 3A, an F1 of 0.9207 for Subtrack 3B, and an F1 of 0.9607 for Subtrack 3C. The best-performing systems used a mixture of traditional and deep learning approaches.
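Because the three subtasks follow the levels of the OLID taxonomy, each level only applies to posts that received a particular label at the level above: Task B concerns offensive posts only, and Task C concerns targeted offensive posts only. The sketch below illustrates that hierarchy. It assumes the label names used in the original OLID dataset (NOT/OFF, TIN/UNT, IND/GRP/OTH); the HASOC Marathi release may encode them differently, and the `OlidLabel` class and `task_view` helper are hypothetical names chosen for illustration, not part of any official task toolkit.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LevelA(Enum):
    """Task A: offensive vs. non-offensive (OLID label names assumed)."""
    NOT = "NOT"
    OFF = "OFF"


class LevelB(Enum):
    """Task B: targeted vs. untargeted; only defined for OFF posts."""
    TIN = "TIN"  # targeted insult or threat
    UNT = "UNT"  # untargeted


class LevelC(Enum):
    """Task C: target type; only defined for TIN posts."""
    IND = "IND"  # individual
    GRP = "GRP"  # group
    OTH = "OTH"  # other


@dataclass(frozen=True)
class OlidLabel:
    """Hypothetical container for one post's hierarchical annotation."""
    a: LevelA
    b: Optional[LevelB] = None
    c: Optional[LevelC] = None

    def __post_init__(self) -> None:
        # A lower level may only be filled in when its parent label allows it.
        if self.a is LevelA.NOT and (self.b is not None or self.c is not None):
            raise ValueError("non-offensive posts carry no Level B/C label")
        if self.b is not LevelB.TIN and self.c is not None:
            raise ValueError("only targeted posts carry a Level C label")


def task_view(labels: list[OlidLabel], task: str) -> list[tuple[int, str]]:
    """Return (post index, gold label) pairs for the posts in one subtask."""
    if task == "A":
        return [(i, lab.a.value) for i, lab in enumerate(labels)]
    if task == "B":
        return [(i, lab.b.value) for i, lab in enumerate(labels) if lab.b is not None]
    if task == "C":
        return [(i, lab.c.value) for i, lab in enumerate(labels) if lab.c is not None]
    raise ValueError(f"unknown task {task!r}")


if __name__ == "__main__":
    gold = [
        OlidLabel(LevelA.NOT),
        OlidLabel(LevelA.OFF, LevelB.UNT),
        OlidLabel(LevelA.OFF, LevelB.TIN, LevelC.GRP),
    ]
    for task in ("A", "B", "C"):
        print(task, task_view(gold, task))
```

With the three example posts, Task A sees all of them, Task B sees only the two offensive posts, and Task C sees only the single targeted post. This is why the subtasks have progressively smaller evaluation sets and why their scores are not directly comparable.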
