
The University of Edinburgh's Submission to the WMT22 Code-Mixing Shared Task (MixMT)

by Faheem Kirefu, et al.

The University of Edinburgh participated in the WMT22 shared task on code-mixed translation (MixMT), which consists of two subtasks: i) generating code-mixed Hindi/English (Hinglish) text from parallel Hindi and English sentences, and ii) machine translation from Hinglish to English. As both subtasks are low-resource, we focused our efforts on careful data generation and curation, especially backtranslation from monolingual resources. For subtask 1, we explored the effects of constrained decoding over English and transliterated subwords in order to produce Hinglish. For subtask 2, we investigated different pretraining techniques, namely comparing simple initialisation from existing machine translation models against aligned augmentation. For both subtasks, we found that our baseline systems worked best, and our submissions were among the top-performing systems overall.
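As a rough illustration of the subtask-1 idea of constrained decoding (not the authors' implementation), the sketch below restricts a toy greedy decoder to a whitelist of English and romanised (transliterated) Hindi subwords. All token names, scores, and the `toy_scores` stand-in are hypothetical; a real system would apply the same mask to an NMT decoder's logits at every step.

```python
# Hypothetical whitelist: English subwords plus transliterated Hindi subwords.
ALLOWED = {"main", "office", "ja", "raha", "hoon", "</s>"}

def toy_scores(prefix):
    """Hypothetical next-token distributions, indexed by prefix length.

    Stands in for a real model's softmax output at each decoding step.
    """
    steps = [
        {"main": 0.9, "bureau": 0.1},
        {"bureau": 0.6, "office": 0.4},  # top token "bureau" is not whitelisted
        {"ja": 0.9, "<unk>": 0.1},
        {"raha": 0.9, "<unk>": 0.1},
        {"hoon": 0.9, "<unk>": 0.1},
        {"</s>": 1.0},
    ]
    return steps[min(len(prefix), len(steps) - 1)]

def constrained_greedy_decode(max_len=10):
    out = []
    for _ in range(max_len):
        scores = toy_scores(out)
        # Mask: keep only subwords from the allowed English/transliterated set.
        allowed = {t: s for t, s in scores.items() if t in ALLOWED}
        if not allowed:
            break
        tok = max(allowed, key=allowed.get)
        if tok == "</s>":
            break
        out.append(tok)
    return out

print(" ".join(constrained_greedy_decode()))  # → main office ja raha hoon
```

At the second step the unconstrained argmax would be the out-of-whitelist token "bureau"; the mask forces the decoder to fall back to "office", yielding a Hinglish-style mix of romanised Hindi and English.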




Gui at MixMT 2022: English-Hinglish: An MT approach for translation of code mixed data

Code-mixed machine translation has become an important task in multiling...

University of Cape Town's WMT22 System: Multilingual Machine Translation for Southern African Languages

The paper describes the University of Cape Town's submission to the cons...

Exploring Text-to-Text Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing

We describe models focused at the understudied problem of translating be...

From Machine Translation to Code-Switching: Generating High-Quality Code-Switched Text

Generating code-switched text is a problem of growing interest, especial...

Quality Evaluation of the Low-Resource Synthetically Generated Code-Mixed Hinglish Text

In this shared task, we seek the participating teams to investigate the ...

Joining Hands: Exploiting Monolingual Treebanks for Parsing of Code-mixing Data

In this paper, we propose efficient and less resource-intensive strategi...

Domain Curricula for Code-Switched MT at MixMT 2022

In multilingual colloquial settings, it is a habitual occurrence to comp...