Multi-Dialect Arabic BERT for Country-Level Dialect Identification

07/10/2020
by Bashar Talafha, et al.

Arabic dialect identification is a complex problem because of a number of inherent properties of the language itself. In this paper, we present the experiments conducted and the models developed by our competing team, Mawdoo3 AI, on the way to our winning solution to subtask 1 of the Nuanced Arabic Dialect Identification (NADI) shared task. The dialect identification subtask provides 21,000 country-level labeled tweets covering all 21 Arab countries. The competition organizers also provide an unlabeled corpus of 10M tweets from the same domain for optional use. Our winning solution is an ensemble of different training iterations of our pre-trained BERT model, which achieved a micro-averaged F1-score of 26.78 on this subtask. We publicly release the pre-trained language model component of our winning solution under the name Multi-dialect-Arabic-BERT for interested researchers.
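
The abstract does not say how the training iterations are combined into an ensemble. As a minimal sketch, assuming each iteration (checkpoint) outputs one probability per class for every tweet, a common choice is to average those probabilities across checkpoints and take the highest-scoring class. The function name `ensemble_predict`, the three toy checkpoints, and the three-class setup are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def ensemble_predict(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average class probabilities across checkpoints; return the argmax label per example.

    prob_sets[m][i][c] is the probability that checkpoint m assigns to class c
    for example i. Probability averaging is an assumption made for illustration.
    """
    n_models = len(prob_sets)
    n_examples = len(prob_sets[0])
    predictions = []
    for i in range(n_examples):
        n_classes = len(prob_sets[0][i])
        # Mean probability of each class over all checkpoints.
        avg = [
            sum(model[i][c] for model in prob_sets) / n_models
            for c in range(n_classes)
        ]
        # Index of the class with the highest mean probability.
        predictions.append(max(range(n_classes), key=avg.__getitem__))
    return predictions


# Toy outputs for two tweets and three classes (made-up numbers).
checkpoint_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
checkpoint_b = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]
checkpoint_c = [[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]

labels = ensemble_predict([checkpoint_a, checkpoint_b, checkpoint_c])
print(labels)
```

In this toy case the first checkpoint alone would pick class 0 for the first tweet, while the averaged ensemble picks class 1, which illustrates how combining iterations can change individual predictions.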
