Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification using Pre-trained Language Models

10/07/2020
by   Shuohuan Wang, et al.

This paper describes Galileo's performance in SemEval-2020 Task 12 on detecting and categorizing offensive language in social media. For offensive language identification, we proposed a multi-lingual method using the pre-trained language models ERNIE and XLM-R. For offensive language categorization, we proposed a knowledge distillation method trained on soft labels generated by several supervised models. Our team participated in all three sub-tasks. In Sub-task A - Offensive Language Identification, we ranked first in average F1 score across all languages, and we were the only team to rank among the top three in every language. We also took first place in Sub-task B - Automatic Categorization of Offense Types and Sub-task C - Offense Target Identification.
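The abstract mentions distillation on soft labels produced by an ensemble of supervised models. As a rough illustration only (the paper gives the actual setup; the temperature value, the averaging scheme, and all function names below are assumptions, not the authors' method), ensemble soft-label distillation can be sketched as:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution at a given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_labels(teacher_logits, temperature=2.0):
    """Average the temperature-softened distributions of several teacher models
    to form the soft labels the student is trained on."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    n = len(probs)
    return [sum(p[i] for p in probs) / n for i in range(len(probs[0]))]

def distillation_loss(student_logits, soft_labels, temperature=2.0):
    """Cross-entropy between the teachers' soft labels and the student's
    temperature-softened distribution (the standard distillation objective)."""
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(soft_labels, student_probs))

# Two hypothetical teacher models scoring a 3-class example
# (e.g. the three target classes of Sub-task C).
teachers = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5]]
soft = ensemble_soft_labels(teachers, temperature=2.0)
loss = distillation_loss([1.8, 0.7, -0.8], soft, temperature=2.0)
```

The student minimizes this loss instead of (or alongside) the hard-label cross-entropy, so it learns the ensemble's full output distribution rather than only the argmax class.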


