Fine-tuning of Pre-trained Transformers for Hate, Offensive, and Profane Content Detection in English and Marathi

10/25/2021
by Anna Glazkova, et al.

This paper describes neural models developed for the Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages Shared Task 2021. Our team, neuro-utmn-thales, participated in two tasks on binary and fine-grained classification of English tweets that contain hate, offensive, and profane content (English Subtasks A and B) and in one task on identifying problematic content in Marathi (Marathi Subtask A). For the English subtasks, we investigate how additional hate speech detection corpora affect the fine-tuning of transformer models. We also apply a one-vs-rest approach based on Twitter-RoBERTa to discriminate between hate, profane, and offensive posts. Our models ranked third in English Subtask A with an F1-score of 81.99%. For the Marathi task, we propose a system based on the Language-Agnostic BERT Sentence Embedding (LaBSE). This model achieved the second-best result in Marathi Subtask A, obtaining an F1-score of 88.08%.
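The one-vs-rest idea mentioned in the abstract can be illustrated with a minimal sketch: one binary scorer per class, with the final label taken from the scorer that is most confident. This is not the authors' implementation. The label names, the `keyword_scorer` helper, and the placeholder keywords are hypothetical stand-ins for fine-tuned Twitter-RoBERTa binary classifiers, and taking the highest score as the decision rule is an assumption about how the per-class outputs are combined.

```python
from typing import Callable, Dict, List

# A scorer maps a text to an estimated probability that it belongs to one class.
Scorer = Callable[[str], float]


def one_vs_rest_predict(text: str, scorers: Dict[str, Scorer]) -> str:
    """Return the label whose binary scorer gives the highest score."""
    scores = {label: scorer(text) for label, scorer in scorers.items()}
    return max(scores, key=scores.get)


def keyword_scorer(keywords: List[str]) -> Scorer:
    """Toy stand-in for a fine-tuned binary model (hypothetical helper).

    Scores a text by the fraction of its tokens found in `keywords`.
    """
    keyword_set = set(keywords)

    def score(text: str) -> float:
        tokens = text.lower().split()
        hits = sum(token in keyword_set for token in tokens)
        return hits / max(len(tokens), 1)

    return score


# Hypothetical labels and placeholder keywords, for illustration only.
scorers: Dict[str, Scorer] = {
    "hate": keyword_scorer(["kw_hate"]),
    "offensive": keyword_scorer(["kw_offensive"]),
    "profane": keyword_scorer(["kw_profane"]),
}

prediction = one_vs_rest_predict("some kw_profane text", scorers)
```

In a real system each scorer would be replaced by a separately fine-tuned binary classifier, and the returned value would be its positive-class probability for the input tweet.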


