Deep Architectures for Neural Machine Translation

It has been shown that increasing model depth improves the quality of neural machine translation. However, different architectural variants to increase model depth have been proposed, and so far, there has been no thorough comparative study. In this work, we describe and evaluate several existing approaches to introduce depth in neural machine translation. Additionally, we explore novel architectural variants, including deep transition RNNs, and we vary how attention is used in the deep decoder. We introduce a novel "BiDeep" RNN architecture that combines deep transition RNNs and stacked RNNs. Our evaluation is carried out on the English to German WMT news translation dataset, using a single-GPU machine for both training and inference. We find that several of our proposed architectures improve upon existing approaches in terms of speed and translation quality. We obtain best improvements with a BiDeep RNN of combined depth 8, obtaining an average improvement of 1.5 BLEU over a strong shallow baseline. We release our code for ease of adoption.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/19/2018

DTMT: A Novel Deep Transition Architecture for Neural Machine Translation

Past years have witnessed rapid developments in Neural Machine Translati...
research
10/06/2020

Efficient Inference For Neural Machine Translation

Large Transformer models have achieved state-of-the-art results in neura...
research
10/29/2018

Parallel Attention Mechanisms in Neural Machine Translation

Recent papers in neural machine translation have proposed the strict use...
research
05/10/2018

Deep Neural Machine Translation with Weakly-Recurrent Units

Recurrent neural networks (RNNs) have represented for years the state of...
research
08/22/2018

Training Deeper Neural Machine Translation Models with Transparent Attention

While current state-of-the-art NMT models, such as RNN seq2seq and Trans...
research
08/30/2018

Multi-Source Syntactic Neural Machine Translation

We introduce a novel multi-source technique for incorporating source syn...
research
10/16/2020

Training Flexible Depth Model by Multi-Task Learning for Neural Machine Translation

The standard neural machine translation model can only decode with the s...

Please sign up or login with your details

Forgot password? Click here to reset