The University of Edinburgh's Submission to the WMT22 Code-Mixing Shared Task (MixMT)

10/20/2022
by Faheem Kirefu, et al.

The University of Edinburgh participated in the WMT22 shared task on code-mixed translation. The task consists of two subtasks: i) generating code-mixed Hindi/English (Hinglish) text from parallel Hindi and English sentences, and ii) machine translation from Hinglish to English. As both subtasks are considered low-resource, we focused our efforts on careful data generation and curation, especially the use of backtranslation from monolingual resources. For subtask 1, we explored the effects of constrained decoding on English and transliterated subwords in order to produce Hinglish. For subtask 2, we investigated different pretraining techniques, comparing simple initialisation from existing machine translation models with aligned augmentation. For both subtasks, we found that our baseline systems worked best. Our systems for both subtasks were among the overall top-performing submissions.


research
10/21/2022

Gui at MixMT 2022 : English-Hinglish: An MT approach for translation of code mixed data

Code-mixed machine translation has become an important task in multiling...
research
10/21/2022

University of Cape Town's WMT22 System: Multilingual Machine Translation for Southern African Languages

The paper describes the University of Cape Town's submission to the cons...
research
05/18/2021

Exploring Text-to-Text Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing

We describe models focused at the understudied problem of translating be...
research
07/14/2021

From Machine Translation to Code-Switching: Generating High-Quality Code-Switched Text

Generating code-switched text is a problem of growing interest, especial...
research
06/16/2023

Sheffield's Submission to the AmericasNLP Shared Task on Machine Translation into Indigenous Languages

In this paper we describe the University of Sheffield's submission to th...
research
02/19/2022

CALCS 2021 Shared Task: Machine Translation for Code-Switched Data

To date, efforts in the code-switching literature have focused for the m...
research
08/04/2021

Quality Evaluation of the Low-Resource Synthetically Generated Code-Mixed Hinglish Text

In this shared task, we seek the participating teams to investigate the ...
