Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation

05/28/2021
by El Moatez Billah Nagoudi, et al.

Recent progress in neural machine translation (NMT) has made it possible to translate successfully between monolingual language pairs for which large parallel data exist, with pre-trained models improving performance even further. Although there is work on translation in code-mixed settings (where one side of the pair contains text from two or more languages), it is still unclear what recent success in NMT and language modeling means for translating code-mixed text. We investigate one such context, namely MT from code-mixed Modern Standard Arabic and Egyptian Arabic (MSAEA) into English. We develop models under different conditions, employing both (i) standard end-to-end sequence-to-sequence (S2S) Transformers trained from scratch and (ii) pre-trained S2S language models (LMs). We obtain reasonable performance using only MSA-EN parallel data with S2S models trained from scratch, and we find that LMs fine-tuned on data from various Arabic dialects further help the MSAEA-EN task. Our work is in the context of the Shared Task on Machine Translation in Code-Switching, where our best model achieves 25.72 BLEU, placing first on the official evaluation for MSAEA-EN.
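For condition (ii), the sketch below illustrates fine-tuning a pre-trained multilingual S2S LM on Arabic-English parallel data with the Hugging Face Transformers library. It is a minimal, assumed setup rather than the authors' released code: the mBART-50 checkpoint, file names, JSON field names, and hyperparameters are all illustrative choices.

```python
# Minimal sketch (assumed setup, not the authors' released code) of fine-tuning a
# pre-trained multilingual S2S LM on MSAEA-EN parallel data with Hugging Face
# Transformers (>= 4.22 for the text_target argument) and Datasets.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

CHECKPOINT = "facebook/mbart-large-50-many-to-many-mmt"  # illustrative choice of S2S LM

# mBART-50 uses language codes; Arabic on the source side, English on the target side.
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT, src_lang="ar_AR", tgt_lang="en_XX")
model = AutoModelForSeq2SeqLM.from_pretrained(CHECKPOINT)

# Hypothetical JSON-lines files with "ar" (code-mixed MSA/EA) and "en" fields.
raw = load_dataset(
    "json",
    data_files={"train": "msaea_en_train.jsonl", "validation": "msaea_en_dev.jsonl"},
)

def preprocess(batch):
    # Tokenize source and target together; labels are built from text_target.
    return tokenizer(batch["ar"], text_target=batch["en"], max_length=128, truncation=True)

tokenized = raw.map(preprocess, batched=True, remove_columns=raw["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="msaea-en-mbart",   # hypothetical output directory
    learning_rate=3e-5,            # illustrative hyperparameters
    per_device_train_batch_size=16,
    num_train_epochs=3,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

At inference time, translations would be produced with model.generate and forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"], the standard mBART-50 translation recipe; condition (i), training from scratch, would instead use a vanilla encoder-decoder Transformer without the pre-trained checkpoint.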
