Domain Curricula for Code-Switched MT at MixMT 2022

10/31/2022
by   Lekan Raheem, et al.

In multilingual colloquial settings, speakers habitually compose text or speech containing tokens or phrases from different languages, a phenomenon popularly known as code-switching or code-mixing (CMX). We present our approach and results for the Code-mixed Machine Translation (MixMT) shared task at WMT 2022. The task consists of two subtasks: monolingual to code-mixed machine translation (Subtask-1) and code-mixed to monolingual machine translation (Subtask-2). Most non-synthetic code-mixed data come from social media, but gathering a significant amount of such data is laborious, and it exhibits more writing variation than other domains; for both subtasks, we therefore experimented with data schedules that incorporate out-of-domain data. We jointly learn multiple domains of text by pretraining and fine-tuning, combined with a sentence alignment objective. We found that switching between domains improved performance on the domains seen earliest during training but degraded performance on the remaining domains. A continuous training run with strategically dispensed data from different domains significantly outperformed fine-tuning.
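To make the idea of "strategically dispensed data of different domains" concrete, the sketch below shows one way a domain curriculum can be expressed: rather than fine-tuning on one domain after another, batches from several domains are interleaved throughout a single continuous training run. This is a minimal illustration only; the domain names, mixing weights, and sampler design are assumptions for exposition, not the schedule used in the paper.

```python
import random
from typing import Dict, Iterator, List

def domain_curriculum(
    domains: Dict[str, List[str]],
    weights: Dict[str, float],
    num_batches: int,
    batch_size: int = 32,
    seed: int = 0,
) -> Iterator[List[str]]:
    """Yield training batches drawn from multiple domains according to mixing weights,
    so that every domain stays present throughout the run instead of being visited
    once and then forgotten (illustrative sketch, not the paper's exact schedule)."""
    rng = random.Random(seed)
    names = list(domains)
    probs = [weights[n] for n in names]
    for _ in range(num_batches):
        # Pick a domain for this batch in proportion to its mixing weight.
        name = rng.choices(names, weights=probs, k=1)[0]
        # Sample a batch of sentences from that domain's corpus.
        yield rng.sample(domains[name], k=min(batch_size, len(domains[name])))

# Hypothetical usage: keep scarce in-domain (social-media) data present throughout
# training while drawing most batches from a larger out-of-domain corpus.
corpora = {
    "social_media": ["u goin 2 the party kal?", "kya scene hai bro"],
    "news": ["The parliament passed the bill today.", "Markets closed higher on Friday."],
}
for batch in domain_curriculum(corpora, {"social_media": 0.3, "news": 0.7},
                               num_batches=10, batch_size=2):
    pass  # feed each mixed batch to the MT training loop
```

In this framing, the "switching between domains" baseline corresponds to training on each corpus sequentially, while the continuous run corresponds to sampling from all of them with fixed or scheduled weights.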

