Automatic Arabic Dialect Identification Systems for Written Texts: A Survey

09/26/2020
by Maha J. Althobaiti, et al.

Arabic dialect identification is a natural language processing task that aims to automatically predict the Arabic dialect of a given text. It is the first step in various natural language processing applications such as machine translation, multilingual text-to-speech synthesis, and cross-language text generation. Consequently, interest in the problem has increased over the last decade. In this paper, we present a comprehensive survey of Arabic dialect identification research in written texts. We first define the problem and its challenges. The survey then critically and extensively discusses many aspects of the Arabic dialect identification task. We review the traditional machine learning methods, deep learning architectures, and complex learning approaches to Arabic dialect identification. We also detail the features and feature representation techniques used to train the proposed systems. Moreover, we illustrate the taxonomy of Arabic dialects studied in the literature, the various levels of text processing at which Arabic dialect identification is conducted (e.g., token, sentence, and document level), and the available annotated resources, including evaluation benchmark corpora. Open challenges and issues are discussed at the end of the survey.

research
03/07/2019

Arabic natural language processing: An overview

Arabic is recognised as the 4th most used language of the Internet. Arab...
research
04/22/2018

Automatic Language Identification in Texts: A Survey

Language identification (LI) is the problem of determining the natural l...
research
04/25/2019

Arabic Text Diacritization Using Deep Neural Networks

Diacritization of Arabic text is both an interesting and a challenging p...
research
06/17/2021

A Deep Belief Network Classification Approach for Automatic Diacritization of Arabic Text

Deep learning has emerged as a new area of machine learning research. It...
research
07/10/2012

Arabic CALL system based on pedagogically indexed text

This article introduces the benefits of using computer as a tool for for...
research
08/15/2023

A User-Centered Evaluation of Spanish Text Simplification

We present an evaluation of text simplification (TS) in Spanish for a pr...
research
02/25/2017

Critical Survey of the Freely Available Arabic Corpora

The availability of corpora is a major factor in building natural langua...
