Learning Deep Transformer Models for Machine Translation

06/05/2019
by   Qiang Wang, et al.
0

Transformer is the state-of-the-art model in recent machine translation evaluations. Two strands of research are promising to improve models of this kind: the first uses wide networks (a.k.a. Transformer-Big) and has been the de facto standard for the development of the Transformer system, and the other uses deeper language representation but faces the difficulty arising from learning deep networks. Here, we continue the line of research on the latter. We claim that a truly deep Transformer model can surpass the Transformer-Big counterpart by 1) proper use of layer normalization and 2) a novel way of passing the combination of previous layers to the next. On WMT'16 English- German, NIST OpenMT'12 Chinese-English and larger WMT'18 Chinese-English tasks, our deep system (30/25-layer encoder) outperforms the shallow Transformer-Big/Base baseline (6-layer encoder) by 0.4-2.4 BLEU points. As another bonus, the deep model is 1.6X smaller in size and 3X faster in training than Transformer-Big.

READ FULL TEXT

page 1

page 2

page 3

page 4

08/23/2021

Recurrent multiple shared layers in Depth for Neural Machine Translation

Learning deeper models is usually a simple and effective approach to imp...
11/10/2019

Two-Headed Monster And Crossed Co-Attention Networks

This paper presents some preliminary investigations of a new co-attentio...
01/03/2020

Learning Accurate Integer Transformer Machine-Translation Models

We describe a method for training accurate Transformer machine-translati...
04/05/2019

Modeling Recurrence for Transformer

Recently, the Transformer model that is based solely on attention mechan...
09/23/2020

Multi-Pass Transformer for Machine Translation

In contrast with previous approaches where information flows only toward...
10/21/2020

Multi-Unit Transformers for Neural Machine Translation

Transformer models achieve remarkable success in Neural Machine Translat...
01/30/2019

The Evolved Transformer

Recent works have highlighted the strengths of the Transformer architect...