UPB at SemEval-2020 Task 12: Multilingual Offensive Language Detection on Social Media by Fine-tuning a Variety of BERT-based Models

by   Mircea-Adrian Tanase, et al.

Offensive language detection is one of the most challenging problems in natural language processing, driven by the rising prevalence of this phenomenon on online social media. This paper describes our Transformer-based solutions for identifying offensive language on Twitter in five languages (i.e., English, Arabic, Danish, Greek, and Turkish), which were employed in Subtask A of the OffensEval 2020 shared task. Several neural architectures (i.e., BERT, mBERT, RoBERTa, XLM-RoBERTa, and ALBERT), pre-trained on both monolingual and multilingual corpora, were fine-tuned and compared across multiple combinations of datasets. Finally, the highest-scoring models were used for our submissions to the competition, which ranked our team 21st of 85, 28th of 53, 19th of 39, 16th of 37, and 10th of 46 for English, Arabic, Danish, Greek, and Turkish, respectively.
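The fine-tuning setup described above, i.e., a pre-trained Transformer encoder with a classification head trained for binary offensive/not-offensive prediction, can be sketched as follows. This is a minimal illustration only: the paper fine-tunes pre-trained BERT/XLM-RoBERTa checkpoints, whereas here a small randomly initialized `TransformerEncoder` stands in for the pre-trained encoder so the snippet runs without downloading weights, and all sizes and names (`OffensiveClassifier`, vocabulary size, batch shapes) are hypothetical.

```python
import torch
import torch.nn as nn

class OffensiveClassifier(nn.Module):
    """Sketch of a Transformer classifier for offensive language detection.

    Hypothetical stand-in: in the actual system the encoder would be a
    pre-trained BERT-style model; here a small randomly initialized
    encoder is used so the example is self-contained.
    """
    def __init__(self, vocab_size=1000, d_model=64, n_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Classification head over the first-token representation,
        # analogous to BERT's [CLS] pooling.
        self.head = nn.Linear(d_model, n_labels)

    def forward(self, token_ids):
        hidden = self.encoder(self.embed(token_ids))
        return self.head(hidden[:, 0])

model = OffensiveClassifier()
token_ids = torch.randint(0, 1000, (8, 16))  # batch of 8 tokenized tweets
labels = torch.randint(0, 2, (8,))           # 0 = not offensive, 1 = offensive

logits = model(token_ids)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()  # one fine-tuning step would follow via an optimizer
```

Swapping the stand-in encoder for a pre-trained checkpoint (e.g., `xlm-roberta-base` via the Hugging Face `transformers` library) and training on the shared-task data would reproduce the general recipe the abstract describes.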
