Cross-lingual Inductive Transfer to Detect Offensive Language

07/07/2020
by Kartikey Pant, et al.

With the growing use and availability of social media, many instances of offensive language have been observed across multiple languages and domains, creating a need to detect offensive language in social media cross-lingually. For OffensEval 2020, the organizers released the multilingual Offensive Language Identification Dataset (mOLID), which contains tweets in five languages, for offensive language detection. In this work, we introduce a cross-lingual inductive approach to identifying offensive language in tweets using the contextual word embeddings of XLM-RoBERTa (XLM-R). Our model performs competitively on all five languages. It places fourth in the English task with an F1-score of 0.919 and eighth in the Turkish task with an F1-score of 0.781. Further experiments show that the model also performs competitively in a zero-shot learning setting, indicating that it can be extended to other languages.
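The abstract reports F1-scores. OffensEval 2020 ranked systems by macro-averaged F1 over the two offensive language identification classes, offensive (OFF) and not offensive (NOT). Assuming that is the metric behind the 0.919 and 0.781 figures, the minimal sketch below shows how it is computed. The function names `f1_for_label` and `macro_f1` are illustrative and are not taken from the paper's code.

```python
from typing import Sequence


def f1_for_label(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1-score for a single class, treating `label` as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # correctly flagged
    fp = sum(1 for g, p in pairs if g != label and p == label)  # wrongly flagged
    fn = sum(1 for g, p in pairs if g == label and p != label)  # missed
    if tp == 0:
        # Precision or recall is zero (or undefined); treat F1 as 0 in that case.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             labels: Sequence[str] = ("OFF", "NOT")) -> float:
    """Unweighted mean of the per-class F1-scores (macro-averaged F1)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(f1_for_label(gold, pred, lab) for lab in labels) / len(labels)


# Hypothetical toy labels, not data from mOLID.
gold = ["OFF", "NOT", "NOT", "OFF", "NOT"]
pred = ["OFF", "NOT", "OFF", "NOT", "NOT"]
print(round(macro_f1(gold, pred), 4))  # 0.5833: F1(OFF) = 0.5, F1(NOT) = 2/3
```

Macro-averaging weights both classes equally. A system that always predicts the majority class (NOT) therefore gets an OFF F1 of 0 and cannot exceed a macro F1 of 0.5.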
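The zero-shot claim refers to scoring a language the model was not fine-tuned on. The paper's exact protocol is not given here. One common way to set this up is a leave-one-language-out split over the five OffensEval 2020 languages (Arabic, Danish, English, Greek, Turkish), and the sketch below assumes that setup. It only builds the splits. Fine-tuning XLM-R on each training split and scoring the held-out language are left out. The function name `zero_shot_splits` and the toy data are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

# One (text, label) pair per tweet; labels use the OFF / NOT scheme.
Example = Tuple[str, str]


def zero_shot_splits(
    data: Dict[str, List[Example]],
) -> Iterator[Tuple[str, List[Example], List[Example]]]:
    """Yield (target_language, train_examples, test_examples) per language.

    Training data comes only from the other languages, so the target
    language contributes no training examples (a zero-shot setting).
    """
    for target in sorted(data):
        train = [ex for lang, exs in sorted(data.items()) if lang != target for ex in exs]
        test = list(data[target])
        yield target, train, test


# Hypothetical toy data keyed by ISO 639-1 language code.
toy = {
    "en": [("tweet_en_1", "OFF"), ("tweet_en_2", "NOT")],
    "da": [("tweet_da_1", "NOT")],
    "tr": [("tweet_tr_1", "OFF")],
}
for target, train, test in zero_shot_splits(toy):
    print(target, len(train), len(test))
# da 3 1
# en 2 2
# tr 3 1
```

Because XLM-R shares one encoder and vocabulary across languages, a classifier fine-tuned on the training split can be applied unchanged to the held-out language. This is what makes such an evaluation possible.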

Related research

10/02/2020
Cross-Lingual Transfer Learning for Complex Word Identification
Complex Word Identification (CWI) is a task centered on detecting hard-t...

10/21/2020
LT3 at SemEval-2020 Task 9: Cross-lingual Embeddings for Sentiment Analysis of Hinglish Social Media Text
This paper describes our contribution to the SemEval-2020 Task 9 on Sent...

10/11/2020
Multilingual Offensive Language Identification with Cross-lingual Embeddings
Offensive content is pervasive in social media and a reason for concern ...

01/09/2020
Offensive Language Detection: A Comparative Analysis
Offensive behaviour has become pervasive in the Internet community. Indi...

06/11/2023
EaSyGuide : ESG Issue Identification Framework leveraging Abilities of Generative Large Language Models
This paper presents our participation in the FinNLP-2023 shared task on ...

04/23/2020
Characterising User Content on a Multi-lingual Social Network
Social media has been on the vanguard of political information diffusion...

01/28/2021
Semi-automatic Generation of Multilingual Datasets for Stance Detection in Twitter
Popular social media networks provide the perfect environment to study t...