Dialect Identification in Nuanced Arabic Tweets Using Farasa Segmentation and AraBERT

02/19/2021
by Anshul Wadhawan, et al.

This paper presents our approach to the EACL WANLP-2021 Shared Task 1: Nuanced Arabic Dialect Identification (NADI). The task is to develop a system that identifies the geographical location (country or province) from which an Arabic tweet, written in Modern Standard Arabic (MSA) or Dialectal Arabic (DA), originates. We solve the task in two parts. The first part pre-processes the provided dataset by cleaning, adding, and segmenting various parts of the text. The second part runs experiments with different versions of two Transformer-based models, AraBERT and AraELECTRA. Our final approach achieved macro F1-scores of 0.216, 0.235, 0.054, and 0.043 in the four subtasks, and we ranked second in the MSA identification subtasks and fourth in the DA identification subtasks.
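
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro F1-score can be computed: the F1 of each class is calculated separately and the results are averaged without weighting by class size. The function name `macro_f1`, the example country labels, and the choice to average over every label seen in either the gold or the predicted list are assumptions made for illustration; this is not the shared task's official scorer, which may handle absent labels differently.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over labels seen in gold or pred.

    Assumption: a label with no true positives scores 0.0.
    """
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was the true label but was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy labels, not data from the shared task.
gold = ["Egypt", "Egypt", "Iraq", "Oman"]
pred = ["Egypt", "Iraq", "Iraq", "Egypt"]
print(round(macro_f1(gold, pred), 3))
```

Because every class contributes equally to the average, a class the system never predicts correctly pulls the score down as much as a frequent one, which helps explain why scores on fine-grained subtasks with many labels tend to be low.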

research
03/02/2021

AraBERT and Farasa Segmentation Based Approach For Sarcasm and Sentiment Detection in Arabic Tweets

This paper presents our strategy to tackle the EACL WANLP-2021 Shared Ta...
research
03/01/2021

Adapting MARBERT for Improved Arabic Dialect Identification: Submission to the NADI 2021 Shared Task

In this paper, we tackle the Nuanced Arabic Dialect Identification (NADI...
research
05/13/2018

UnibucKernel Reloaded: First Place in Arabic Dialect Identification for the Second Year in a Row

We present a machine learning approach that ranked on the first place in...
research
06/23/2021

BERT-based Multi-Task Model for Country and Province Level Modern Standard Arabic and Dialectal Arabic Identification

Dialect and standard language identification are crucial tasks for many ...
research
05/13/2020

Arabic Dialect Identification in the Wild

We present QADI, an automatically collected dataset of tweets belonging ...
research
01/19/2022

Interpreting Arabic Transformer Models

Arabic is a Semitic language which is widely spoken with many dialects. ...
research
01/29/2019

An Arabic Dependency Treebank in the Travel Domain

In this paper we present a dependency treebank of travel domain sentence...