Parameter Sharing Methods for Multilingual Self-Attentional Translation Models

09/01/2018
by Devendra Singh Sachan et al.

In multilingual neural machine translation, it has been shown that sharing a single translation model between multiple languages can achieve competitive performance, sometimes even yielding gains over bilingually trained models. However, these improvements are not uniform: multilingual parameter sharing often decreases accuracy because the translation model cannot accommodate different languages within its limited parameter space. In this work, we examine parameter sharing techniques that strike a happy medium between full sharing and individual training, focusing specifically on the self-attentional Transformer model. We find that full parameter sharing increases BLEU scores mainly when the target languages come from a similar language family. However, even when the target languages come from different families, where full parameter sharing leads to a noticeable drop in BLEU scores, our proposed methods for partially sharing parameters can yield substantial improvements in translation accuracy.
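The core idea of partial sharing, keeping some Transformer parameters common to all languages while others remain language-specific, can be illustrated with a short sketch. The following minimal PyTorch example is not the authors' implementation; all module and argument names are assumptions. It shares a decoder layer's self-attention sub-layer across target languages while giving each language its own feed-forward sub-layer, one of several sharing granularities such an approach could use.

```python
# A minimal sketch (assumed design, not the paper's exact configuration) of
# partial parameter sharing in a one-to-many multilingual Transformer layer:
# self-attention is shared across all target languages, while each language
# keeps a private feed-forward block.
import torch
import torch.nn as nn


class PartiallySharedDecoderLayer(nn.Module):
    def __init__(self, d_model: int, n_heads: int, d_ff: int, languages: list):
        super().__init__()
        # Shared across languages: self-attention and its layer norm.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_norm = nn.LayerNorm(d_model)
        # Language-specific: one feed-forward block per target language.
        self.ffn = nn.ModuleDict({
            lang: nn.Sequential(
                nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
            )
            for lang in languages
        })
        self.ffn_norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
        # Shared self-attention with a residual connection.
        attn_out, _ = self.self_attn(x, x, x, need_weights=False)
        x = self.attn_norm(x + attn_out)
        # Route through the feed-forward block of the current target language.
        x = self.ffn_norm(x + self.ffn[lang](x))
        return x


if __name__ == "__main__":
    layer = PartiallySharedDecoderLayer(d_model=64, n_heads=4, d_ff=256,
                                        languages=["de", "ja"])
    tokens = torch.randn(2, 10, 64)  # (batch, sequence, d_model)
    print(layer(tokens, lang="de").shape)  # torch.Size([2, 10, 64])
```

In a typical one-to-many training setup, mixed-language minibatches would update the shared parameters on every step, while each language-specific block is updated only on batches with that target language.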
