On Romanization for Model Transfer Between Scripts in Neural Machine Translation

09/30/2020
by Chantal Amrhein, et al.

Transfer learning is a popular strategy to improve the quality of low-resource machine translation. For an optimal transfer of the embedding layer, the child and parent model should share a substantial part of the vocabulary. This is not the case when transferring to languages with a different script. We explore the benefit of romanization in this scenario. Our results show that romanization entails information loss and is thus not always superior to simpler vocabulary transfer methods, but can improve the transfer between related languages with different scripts. We compare two romanization tools and find that they exhibit different degrees of information loss, which affects translation quality. Finally, we extend romanization to the target side, showing that this can be a successful strategy when coupled with a simple deromanization model.
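The information loss mentioned in the abstract arises because romanization is generally not injective: distinct source-script strings can collapse to the same Latin string, so no deromanization rule can recover the original with certainty. A minimal sketch of this effect, using a hypothetical, simplified Cyrillic-to-Latin table (this is an illustration, not the behavior of the specific romanization tools compared in the paper):

```python
# Hypothetical character-level romanization table (illustrative only).
# Note that "е" and "э" both map to "e", so the mapping is lossy.
ROMAN = {
    "д": "d", "а": "a", "е": "e", "э": "e",
    "ш": "sh", "с": "s", "х": "h",
}

def romanize(word: str) -> str:
    """Map each character through the table; unknown characters pass through."""
    return "".join(ROMAN.get(ch, ch) for ch in word)

# Two distinct source words collapse to the same Latin string,
# so romanization cannot be inverted deterministically -- which is
# why target-side romanization needs a learned deromanization model.
print(romanize("де"))  # -> "de"
print(romanize("дэ"))  # -> "de"
```

Because such collisions are unavoidable with character-level mappings, a deromanization step must model context to choose among the possible source-script forms, which is why the paper pairs target-side romanization with a dedicated deromanization model.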

