Domain Curricula for Code-Switched MT at MixMT 2022

10/31/2022
by   Lekan Raheem, et al.

In multilingual colloquial settings, speakers habitually compose text or speech containing tokens or phrases from different languages, a phenomenon popularly known as code-switching or code-mixing (CMX). We present our approach and results for the Code-mixed Machine Translation (MixMT) shared task at WMT 2022. The task consists of two subtasks: monolingual to code-mixed machine translation (Subtask-1) and code-mixed to monolingual machine translation (Subtask-2). Most non-synthetic code-mixed data come from social media, but gathering a significant amount of such data is laborious, and it exhibits more writing variation than other domains; for both subtasks, we therefore experimented with data schedules that incorporate out-of-domain data. We jointly learn multiple domains of text by pretraining and fine-tuning, combined with a sentence alignment objective. We found that switching between domains improved performance on the domains seen earliest during training but degraded performance on the remaining domains. A continuous training run with strategically dispensed data from different domains significantly outperformed fine-tuning.
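To make the idea of "strategically dispensed data of different domains" concrete, the sketch below shows one way a domain curriculum can be expressed: rather than fine-tuning on one domain after another, batches from several domains are interleaved throughout a single continuous training run. This is a minimal illustration only; the domain names, mixing weights, and sampler design are assumptions for exposition, not the schedule used in the paper.

```python
import random
from typing import Dict, Iterator, List

def domain_curriculum(
    domains: Dict[str, List[str]],
    weights: Dict[str, float],
    num_batches: int,
    batch_size: int = 32,
    seed: int = 0,
) -> Iterator[List[str]]:
    """Yield training batches drawn from multiple domains according to mixing weights,
    so that every domain stays present throughout the run instead of being visited
    once and then forgotten (illustrative sketch, not the paper's exact schedule)."""
    rng = random.Random(seed)
    names = list(domains)
    probs = [weights[n] for n in names]
    for _ in range(num_batches):
        # Pick a domain for this batch in proportion to its mixing weight.
        name = rng.choices(names, weights=probs, k=1)[0]
        # Sample a batch of sentences from that domain's corpus.
        yield rng.sample(domains[name], k=min(batch_size, len(domains[name])))

# Hypothetical usage: keep scarce in-domain (social-media) data present throughout
# training while drawing most batches from a larger out-of-domain corpus.
corpora = {
    "social_media": ["u goin 2 the party kal?", "kya scene hai bro"],
    "news": ["The parliament passed the bill today.", "Markets closed higher on Friday."],
}
for batch in domain_curriculum(corpora, {"social_media": 0.3, "news": 0.7},
                               num_batches=10, batch_size=2):
    pass  # feed each mixed batch to the MT training loop
```

In this framing, the "switching between domains" baseline corresponds to training on each corpus sequentially, while the continuous run corresponds to sampling from all of them with fixed or scheduled weights.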

