UPB at SemEval-2020 Task 12: Multilingual Offensive Language Detection on Social Media by Fine-tuning a Variety of BERT-based Models

10/26/2020
by Mircea-Adrian Tanase, et al.

Offensive language detection is one of the most challenging problems in natural language processing, made pressing by the rising presence of this phenomenon in online social media. This paper describes our Transformer-based solutions for identifying offensive language on Twitter in five languages (i.e., English, Arabic, Danish, Greek, and Turkish), which were employed in Subtask A of the OffensEval 2020 shared task. Several neural architectures (i.e., BERT, mBERT, RoBERTa, XLM-RoBERTa, and ALBERT), pre-trained on either single-language or multilingual corpora, were fine-tuned and compared using multiple combinations of datasets. Finally, the highest-scoring models were used for our submissions in the competition, which ranked our team 21st of 85, 28th of 53, 19th of 39, 16th of 37, and 10th of 46 for English, Arabic, Danish, Greek, and Turkish, respectively.
