UPB at SemEval-2020 Task 12: Multilingual Offensive Language Detection on Social Media by Fine-tuning a Variety of BERT-based Models

10/26/2020
by   Mircea-Adrian Tanase, et al.
0

Offensive language detection is one of the most challenging problem in the natural language processing field, being imposed by the rising presence of this phenomenon in online social media. This paper describes our Transformer-based solutions for identifying offensive language on Twitter in five languages (i.e., English, Arabic, Danish, Greek, and Turkish), which was employed in Subtask A of the Offenseval 2020 shared task. Several neural architectures (i.e., BERT, mBERT, Roberta, XLM-Roberta, and ALBERT), pre-trained using both single-language and multilingual corpora, were fine-tuned and compared using multiple combinations of datasets. Finally, the highest-scoring models were used for our submissions in the competition, which ranked our team 21st of 85, 28th of 53, 19th of 39, 16th of 37, and 10th of 46 for English, Arabic, Danish, Greek, and Turkish, respectively.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

05/07/2020

LIIR at SemEval-2020 Task 12: A Cross-Lingual Augmentation Approach for Multilingual Offensive Language Identification

This paper presents our system entitled `LIIR' for SemEval-2020 Task 12 ...
12/27/2020

ARBERT MARBERT: Deep Bidirectional Transformers for Arabic

Masked language models (MLM) have become an integral part of many natura...
09/05/2020

Accenture at CheckThat! 2020: If you say so: Post-hoc fact-checking of claims using transformer-based models

We introduce the strategies used by the Accenture Team for the CLEF2020 ...
05/15/2020

KEIS@JUST at SemEval-2020 Task 12: Identifying Multilingual Offensive Tweets Using Weighted Ensemble and Fine-Tuned BERT

This research presents our team KEIS@JUST participation at SemEval-2020 ...
10/10/2021

amsqr at SemEval-2020 Task 12: Offensive language detection using neural networks and anti-adversarial features

This paper describes a method and system to solve the problem of detecti...
01/28/2021

Semi-automatic Generation of Multilingual Datasets for Stance Detection in Twitter

Popular social media networks provide the perfect environment to study t...
10/05/2021

ur-iw-hnt at GermEval 2021: An Ensembling Strategy with Multiple BERT Models

This paper describes our approach (ur-iw-hnt) for the Shared Task of Ger...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.