OSN-MDAD: Machine Translation Dataset for Arabic Multi-Dialectal Conversations on Online Social Media

09/21/2023
by   Fatimah Alzamzami, et al.
0

While resources for English language are fairly sufficient to understand content on social media, similar resources in Arabic are still immature. The main reason that the resources in Arabic are insufficient is that Arabic has many dialects in addition to the standard version (MSA). Arabs do not use MSA in their daily communications; rather, they use dialectal versions. Unfortunately, social users transfer this phenomenon into their use of social media platforms, which in turn has raised an urgent need for building suitable AI models for language-dependent applications. Existing machine translation (MT) systems designed for MSA fail to work well with Arabic dialects. In light of this, it is necessary to adapt to the informal nature of communication on social networks by developing MT systems that can effectively handle the various dialects of Arabic. Unlike for MSA that shows advanced progress in MT systems, little effort has been exerted to utilize Arabic dialects for MT systems. While few attempts have been made to build translation datasets for dialectal Arabic, they are domain dependent and are not OSN cultural-language friendly. In this work, we attempt to alleviate these limitations by proposing an online social network-based multidialect Arabic dataset that is crafted by contextually translating English tweets into four Arabic dialects: Gulf, Yemeni, Iraqi, and Levantine. To perform the translation, we followed our proposed guideline framework for content translation, which could be universally applicable for translation between foreign languages and local dialects. We validated the authenticity of our proposed dataset by developing neural MT models for four Arabic dialects. Our results have shown a superior performance of our NMT models trained using our dataset. We believe that our dataset can reliably serve as an Arabic multidialectal translation dataset for informal MT tasks.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/06/2018

Système de traduction automatique statistique Anglais-Arabe

Machine translation (MT) is the process of translating text written in a...
research
11/08/2019

Neural Arabic Text Diacritization: State of the Art Results and a Novel Approach for Machine Translation

In this work, we present several deep learning models for the automatic ...
research
03/17/2022

Towards Responsible Natural Language Annotation for the Varieties of Arabic

When building NLP models, there is a tendency to aim for broader coverag...
research
10/21/2022

A Semi-supervised Approach for a Better Translation of Sentiment in Dialectical Arabic UGT

In the online world, Machine Translation (MT) systems are extensively us...
research
10/03/2016

An Arabic-Hebrew parallel corpus of TED talks

We describe an Arabic-Hebrew parallel corpus of TED talks built upon WIT...
research
11/20/2017

#Halal Culture on Instagram

Halal is a notion that applies to both objects and actions, and means pe...
research
12/01/2020

Extracting Synonyms from Bilingual Dictionaries

We present our progress in developing a novel algorithm to extract synon...

Please sign up or login with your details

Forgot password? Click here to reset