UPB at SemEval-2020 Task 12: Multilingual Offensive Language Detection on Social Media by Fine-tuning a Variety of BERT-based Models

by Mircea-Adrian Tanase, et al.

Offensive language detection is one of the most challenging problems in natural language processing, a need made pressing by the rising presence of this phenomenon in online social media. This paper describes our Transformer-based solutions for identifying offensive language on Twitter in five languages (i.e., English, Arabic, Danish, Greek, and Turkish), which were employed in Subtask A of the OffensEval 2020 shared task. Several neural architectures (i.e., BERT, mBERT, RoBERTa, XLM-RoBERTa, and ALBERT), pre-trained on either single-language or multilingual corpora, were fine-tuned and compared using multiple combinations of datasets. Finally, the highest-scoring models were used for our submissions in the competition, which ranked our team 21st of 85, 28th of 53, 19th of 39, 16th of 37, and 10th of 46 for English, Arabic, Danish, Greek, and Turkish, respectively.
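
The selection step mentioned above (comparing fine-tuned models and submitting the highest-scoring one) can be illustrated with a minimal sketch. It assumes macro-averaged F1 on a held-out set as the score, which the abstract does not state; the labels `OFF`/`NOT`, the candidate names, and all predictions are made-up placeholders, not results from the paper.

```python
from typing import Dict, List


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of per-class F1 over the labels that occur in gold."""
    scores = []
    for label in sorted(set(gold)):
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def pick_best(gold: List[str], predictions: Dict[str, List[str]]) -> str:
    """Return the name of the candidate with the highest macro-F1."""
    return max(predictions, key=lambda name: macro_f1(gold, predictions[name]))


# Hypothetical held-out labels and predictions from two fine-tuned candidates.
gold = ["OFF", "NOT", "NOT", "OFF", "NOT"]
predictions = {
    "candidate_a": ["OFF", "NOT", "OFF", "OFF", "NOT"],
    "candidate_b": ["NOT", "NOT", "NOT", "OFF", "NOT"],
}

best = pick_best(gold, predictions)
print(best, round(macro_f1(gold, predictions[best]), 4))
```

In practice the same comparison would be repeated per language, since the abstract reports a separate submission for each of the five languages.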





LIIR at SemEval-2020 Task 12: A Cross-Lingual Augmentation Approach for Multilingual Offensive Language Identification

This paper presents our system entitled `LIIR' for SemEval-2020 Task 12 ...

ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic

Masked language models (MLM) have become an integral part of many natura...

Accenture at CheckThat! 2020: If you say so: Post-hoc fact-checking of claims using transformer-based models

We introduce the strategies used by the Accenture Team for the CLEF2020 ...

KEIS@JUST at SemEval-2020 Task 12: Identifying Multilingual Offensive Tweets Using Weighted Ensemble and Fine-Tuned BERT

This research presents our team KEIS@JUST participation at SemEval-2020 ...

amsqr at SemEval-2020 Task 12: Offensive language detection using neural networks and anti-adversarial features

This paper describes a method and system to solve the problem of detecti...

Semi-automatic Generation of Multilingual Datasets for Stance Detection in Twitter

Popular social media networks provide the perfect environment to study t...

ur-iw-hnt at GermEval 2021: An Ensembling Strategy with Multiple BERT Models

This paper describes our approach (ur-iw-hnt) for the Shared Task of Ger...