DeepAI
Log In Sign Up

Hierarchical Transformer for Multilingual Machine Translation

03/05/2021
by   Albina Khusainova, et al.
10

The choice of parameter sharing strategy in multilingual machine translation models determines how optimally parameter space is used and hence, directly influences ultimate translation quality. Inspired by linguistic trees that show the degree of relatedness between different languages, the new general approach to parameter sharing in multilingual machine translation was suggested recently. The main idea is to use these expert language hierarchies as a basis for multilingual architecture: the closer two languages are, the more parameters they share. In this work, we test this idea using the Transformer architecture and show that despite the success in previous work there are problems inherent to training such hierarchical models. We demonstrate that in case of carefully chosen training strategy the hierarchical architecture can outperform bilingual models and multilingual models with full parameter sharing.

READ FULL TEXT

page 1

page 2

page 3

page 4

09/01/2018

Parameter Sharing Methods for Multilingual Self-Attentional Translation Models

In multilingual neural machine translation, it has been shown that shari...
05/12/2020

A Framework for Hierarchical Multilingual Machine Translation

Multilingual machine translation has recently been in vogue given its po...
12/31/2020

XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders

Multilingual machine translation enables a single model to translate bet...
06/08/2018

Multilingual Neural Machine Translation with Task-Specific Attention

Multilingual machine translation addresses the task of translating betwe...
01/01/2021

Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers

The advent of the Transformer can arguably be described as a driving for...
05/22/2022

Multilingual Machine Translation with Hyper-Adapters

Multilingual machine translation suffers from negative interference acro...
12/27/2021

Parameter Differentiation based Multilingual Neural Machine Translation

Multilingual neural machine translation (MNMT) aims to translate multipl...