Multilingual Neural Machine Translation with Language Clustering

08/25/2019
by Xu Tan, et al.

Multilingual neural machine translation (NMT), which translates multiple languages using a single model, is of great practical importance because it simplifies the training process, reduces online maintenance costs, and enhances low-resource and zero-shot translation. Given that there are thousands of languages in the world and some of them are very different, it is extremely burdensome to handle them all in a single model or to use a separate model for each language pair. Therefore, given a fixed resource budget, e.g., the number of models, how to determine which languages should be supported by one model is critical to multilingual NMT, which, unfortunately, has been ignored by previous work. In this work, we develop a framework that clusters languages into different groups and trains one multilingual model for each cluster. We study two methods for language clustering: (1) using prior knowledge, where we cluster languages according to language family, and (2) using language embedding, in which we represent each language by an embedding vector and cluster the languages in the embedding space. In particular, we obtain the embedding vectors of all the languages by training a universal neural machine translation model. Our experiments on 23 languages show that the first clustering method is simple and easy to understand but leads to suboptimal translation accuracy, while the second method captures the relationships among languages well and improves the translation accuracy for almost all the languages over baseline methods.
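
The second clustering method can be illustrated with a minimal sketch. It assumes each language already has an embedding vector and groups the languages by average-linkage agglomerative clustering over cosine distance. The abstract does not name the clustering algorithm or distance measure, so both are assumptions chosen for illustration; the function names, language codes, and toy vectors below are hypothetical and are not taken from the paper.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def cluster_languages(embeddings, num_clusters):
    """Merge the two closest clusters until num_clusters remain.

    Average linkage over cosine distance is an assumption made for this
    sketch, not a method confirmed by the abstract.
    """
    # Start with one singleton cluster per language.
    clusters = [[lang] for lang in sorted(embeddings)]

    def linkage(a, b):
        # Mean pairwise distance between members of two clusters.
        dists = [cosine_distance(embeddings[x], embeddings[y])
                 for x in a for y in b]
        return sum(dists) / len(dists)

    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Made-up 3-dimensional vectors standing in for learned language embeddings.
toy_embeddings = {
    "de": [0.9, 0.1, 0.0],
    "nl": [0.8, 0.2, 0.1],
    "es": [0.1, 0.9, 0.0],
    "pt": [0.2, 0.8, 0.1],
    "ru": [0.0, 0.1, 0.9],
    "uk": [0.1, 0.0, 0.8],
}

# A budget of three models means three clusters.
groups = cluster_languages(toy_embeddings, num_clusters=3)
print(groups)
```

In the framework described above, each resulting group would then be served by its own multilingual model, with `num_clusters` playing the role of the fixed model budget.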

