Automatic Standardization of Arabic Dialects for Machine Translation

01/09/2023
by Abidrabbo Alnassan, et al.

Based on an annotated multimedia corpus, the television series Marāyā 2013, we examine the question of "automatic standardization" of Arabic dialects for machine translation, distinguishing between rule-based and statistical machine translation. Machine translation from Arabic usually takes standard (or modern) Arabic as the source language and produces quite satisfactory translations, thanks to the availability of the translation memories needed to train the models. The case is different for Arabic dialects, where the output is of much lower quality. In our research we apply machine translation methods to a dialect/standard (or modern) Arabic pair in order to automatically produce a standard Arabic text from a dialect input, a process we call "automatic standardization". We opt for statistical models because rule-based "automatic standardization" is harder, given the lack of "diglossic" dictionaries on the one hand and the difficulty of creating linguistic rules for each dialect on the other. This research could then lead to combining "automatic standardization" software with machine translation software: the output of the first is fed as input to the second to obtain a quality machine translation. The approach may also have educational applications, such as tools that help users understand different Arabic dialects by transforming dialectal texts into standard Arabic.
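
The two-stage pipeline described in the abstract (standardize first, then translate) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `TOY_LEXICON`, `standardize`, and `translate_via_standard` are hypothetical, and the word-by-word lookup is only a stand-in for the trained statistical model the authors propose.

```python
from typing import Callable

# Hypothetical toy lexicon (Levantine dialect -> standard Arabic).
# Illustration only; a real system would learn such mappings from parallel data.
TOY_LEXICON = {
    "شو": "ماذا",   # "what"
    "بدي": "أريد",  # "I want"
}


def standardize(dialect_text: str) -> str:
    """Stand-in for a statistical standardization model.

    Replaces each known dialect token with a standard Arabic equivalent
    and leaves unknown tokens unchanged.
    """
    tokens = dialect_text.split()
    return " ".join(TOY_LEXICON.get(token, token) for token in tokens)


def translate_via_standard(dialect_text: str,
                           translate: Callable[[str], str]) -> str:
    """Feed the standardizer's output into a standard-Arabic MT system.

    `translate` is any callable that accepts standard Arabic text;
    it is passed in so that no particular MT library is assumed.
    """
    return translate(standardize(dialect_text))
```

Passing the translator as a parameter keeps the sketch independent of any specific machine translation system, which matches the abstract's idea of chaining two separate pieces of software.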

research
09/07/2015

Exploiting Out-of-Domain Data Sources for Dialectal Arabic Statistical Machine Translation

Statistical machine translation for dialectal Arabic is characterized by...
research
06/08/2016

First Result on Arabic Neural Machine Translation

Neural machine translation has become a major alternative to widely used...
research
07/14/2019

Simple Automatic Post-editing for Arabic-Japanese Machine Translation

A common bottleneck for developing machine translation (MT) systems for ...
research
05/27/2022

TURJUMAN: A Public Toolkit for Neural Arabic Machine Translation

We present TURJUMAN, a neural toolkit for translating from 20 languages ...
research
08/06/2023

TARJAMAT: Evaluation of Bard and ChatGPT on Machine Translation of Ten Arabic Varieties

Large language models (LLMs) finetuned to follow human instructions have...
research
11/16/2019

Contribution au Niveau de l'Approche Indirecte à Base de Transfert dans la Traduction Automatique

In this thesis, we address several important issues concerning the morph...
research
11/08/2019

Neural Arabic Text Diacritization: State of the Art Results and a Novel Approach for Machine Translation

In this work, we present several deep learning models for the automatic ...
