Very Deep Transformers for Neural Machine Translation

08/18/2020
by Xiaodong Liu, et al.

We explore the application of very deep Transformer models for Neural Machine Translation (NMT). Using a simple yet effective initialization technique that stabilizes training, we show that it is feasible to build standard Transformer-based models with up to 60 encoder layers and 12 decoder layers. These deep models outperform their baseline 6-layer counterparts by as much as 2.5 BLEU, and achieve new state-of-the-art benchmark results on WMT14 English-French (43.8 BLEU) and WMT14 English-German (30.1 BLEU). The code and trained models will be publicly available at: https://github.com/namisan/exdeep-nmt.
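As an illustration of the scale described above, the sketch below instantiates a standard Transformer with 60 encoder layers and 12 decoder layers using PyTorch's nn.Transformer. This is not the authors' released code (that lives at the GitHub link above) and it omits their initialization technique; only the layer counts come from the abstract, while d_model, nhead, and the other hyperparameters are illustrative assumptions.

```python
# Minimal sketch: a very deep encoder/decoder Transformer.
# Layer counts (60 encoder, 12 decoder) follow the abstract;
# all other hyperparameters are assumed for illustration only.
import torch
import torch.nn as nn

model = nn.Transformer(
    d_model=512,            # assumed model dimension
    nhead=8,                # assumed number of attention heads
    num_encoder_layers=60,  # deep encoder, as in the abstract
    num_decoder_layers=12,  # deep decoder, as in the abstract
    dim_feedforward=2048,   # assumed feed-forward size
    dropout=0.1,
)

# Dummy source/target batches with shape (sequence_length, batch_size, d_model).
src = torch.rand(10, 2, 512)
tgt = torch.rand(9, 2, 512)
out = model(src, tgt)
print(out.shape)  # torch.Size([9, 2, 512])
```

Note that simply stacking this many layers typically destabilizes standard training; the paper's contribution is the initialization scheme that makes such depths trainable.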

Related research:

- Shallow-to-Deep Training for Neural Machine Translation (10/08/2020): Deep encoders have been proven to be effective in improving neural machi...
- Neural Machine Translation with Joint Representation (02/16/2020): Though early successes of Statistical Machine Translation (SMT) systems ...
- Learning to Reuse Translations: Guiding Neural Machine Translation with Examples (11/25/2019): In this paper, we study the problem of enabling neural machine translati...
- Multi-layer Representation Fusion for Neural Machine Translation (02/16/2020): Neural machine translation systems require a number of stacked layers fo...
- DTMT: A Novel Deep Transition Architecture for Neural Machine Translation (12/19/2018): Past years have witnessed rapid developments in Neural Machine Translati...
- Training Deeper Neural Machine Translation Models with Transparent Attention (08/22/2018): While current state-of-the-art NMT models, such as RNN seq2seq and Trans...
- Training Neural Machine Translation (NMT) Models using Tensor Train Decomposition on TensorFlow (T3F) (11/05/2019): We implement a Tensor Train layer in the TensorFlow Neural Machine Trans...
