Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation

05/28/2021
by El Moatez Billah Nagoudi, et al.

Recent progress in neural machine translation (NMT) has made it possible to translate successfully between monolingual language pairs for which large parallel data exist, with pre-trained models improving performance even further. Although prior work addresses translation in code-mixed settings (where one side of the pair includes text from two or more languages), it remains unclear what recent successes in NMT and language modeling mean for translating code-mixed text. We investigate one such context, namely MT from code-mixed Modern Standard Arabic and Egyptian Arabic (MSAEA) into English. We develop models under different conditions, employing both (i) standard end-to-end sequence-to-sequence (S2S) Transformers trained from scratch and (ii) pre-trained S2S language models (LMs). With S2S models trained from scratch, we achieve reasonable performance using only MSA-EN parallel data. We also find that LMs fine-tuned on data from various Arabic dialects help on the MSAEA-EN task. This work was carried out in the context of the Shared Task on Machine Translation in Code-Switching. Our best model achieves 25.72 BLEU, placing us first in the official shared task evaluation for MSAEA-EN.
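For readers unfamiliar with the BLEU metric quoted above, the following is a minimal sketch of corpus-level BLEU with a single reference per sentence. It assumes whitespace tokenization, n-grams up to order 4, and no smoothing; the function names `ngrams` and `corpus_bleu` are hypothetical and chosen for illustration. The shared task's official scoring script and tokenization are not described here, so scores from this sketch are not comparable to the reported 25.72.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per hypothesis.

    Assumptions: whitespace tokenization, uniform n-gram weights,
    no smoothing (any order with zero matches gives a score of 0).
    """
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # hypothesis n-gram counts per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

A call such as `corpus_bleu(system_outputs, reference_translations)` takes two equal-length lists of strings. Published results are normally computed with a standardized tool and a fixed tokenization, so this sketch is meant only to show what the metric measures.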


research
02/19/2022

CALCS 2021 Shared Task: Machine Translation for Code-Switched Data

To date, efforts in the code-switching literature have focused for the m...
research
05/18/2021

Exploring Text-to-Text Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing

We describe models focused at the understudied problem of translating be...
research
07/04/2017

Shakespearizing Modern Language Using Copy-Enriched Sequence-to-Sequence Models

Variations in writing styles are commonly used to adapt the content to a...
research
02/06/2023

Context-Gloss Augmentation for Improving Arabic Target Sense Verification

Arabic language lacks semantic datasets and sense inventories. The most ...
research
04/19/2022

PICT@DravidianLangTech-ACL2022: Neural Machine Translation On Dravidian Languages

This paper presents a summary of the findings that we obtained based on ...
research
08/11/2022

Domain-Specific Text Generation for Machine Translation

Preservation of domain knowledge from the source to target is crucial in...
research
08/20/2020

Lite Training Strategies for Portuguese-English and English-Portuguese Translation

Despite the widespread adoption of deep learning for machine translation...
