Unsupervised Transfer Learning in Multilingual Neural Machine Translation with Cross-Lingual Word Embeddings

03/11/2021
by   Carlos Mullov, et al.
0

In this work we look into adding a new language to a multilingual NMT system in an unsupervised fashion. Under the utilization of pre-trained cross-lingual word embeddings we seek to exploit a language independent multilingual sentence representation to easily generalize to a new language. While using cross-lingual embeddings for word lookup we decode from a yet entirely unseen source language in a process we call blind decoding. Blindly decoding from Portuguese using a basesystem containing several Romance languages we achieve scores of 36.4 BLEU for Portuguese-English and 12.8 BLEU for Russian-English. In an attempt to train the mapping from the encoder sentence representation to a new target language we use our model as an autoencoder. Merely training to translate from Portuguese to Portuguese while freezing the encoder we achieve 26 BLEU on English-Portuguese, and up to 28 BLEU when adding artificial noise to the input. Lastly we explore a more practical adaptation approach through non-iterative backtranslation, exploiting our model's ability to produce high quality translations through blind decoding. This yields us up to 34.6 BLEU on English-Portuguese, attaining near parity with a model adapted on real bilingual data.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/03/2020

Cross-lingual Word Embeddings beyond Zero-shot Machine Translation

We explore the transferability of a multilingual neural machine translat...
research
10/20/2020

What makes multilingual BERT multilingual?

Recently, multilingual BERT works remarkably well on cross-lingual trans...
research
06/16/2020

Cross-lingual Retrieval for Iterative Self-Supervised Training

Recent studies have demonstrated the cross-lingual alignment ability of ...
research
05/14/2019

Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies

Transfer learning or multilingual model is essential for low-resource ne...
research
06/09/2021

Crosslingual Embeddings are Essential in UNMT for Distant Languages: An English to IndoAryan Case Study

Recent advances in Unsupervised Neural Machine Translation (UNMT) have m...
research
02/03/2021

Bootstrapping Multilingual AMR with Contextual Word Alignments

We develop high performance multilingualAbstract Meaning Representation ...
research
09/09/2021

Subword Mapping and Anchoring across Languages

State-of-the-art multilingual systems rely on shared vocabularies that s...

Please sign up or login with your details

Forgot password? Click here to reset