On the Sub-Layer Functionalities of Transformer Decoder

10/06/2020
by Yilin Yang, et al.

There have been significant efforts to interpret the encoder of Transformer-based encoder-decoder architectures for neural machine translation (NMT); the decoder, by contrast, remains largely unexamined despite its critical role. During translation, the decoder must predict output tokens by considering both the source-language text from the encoder and the target-language prefix produced in previous steps. In this work, we study how Transformer-based decoders leverage information from the source and target languages, developing a universal probe task to assess how information is propagated through each module of each decoder layer. We perform extensive experiments on three major translation datasets (WMT En-De, En-Fr, and En-Zh). Our analysis provides insight into when and where decoders leverage different sources. Based on these insights, we demonstrate that the residual feed-forward module in each Transformer decoder layer can be dropped with minimal loss of performance, which significantly reduces computation and the number of parameters and consequently speeds up both training and inference.
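
To give a rough sense of the parameter reduction described in the abstract, the following sketch counts the weights in one standard Transformer decoder layer with and without its feed-forward module. The sizes used (d_model = 512, d_ff = 2048) are the commonly cited Transformer "base" settings and are an assumption here, as are the function name `decoder_layer_params` and the choice to drop the layer normalization attached to the feed-forward module; none of these are taken from the paper, and the result is not one of its reported numbers.

```python
def decoder_layer_params(d_model: int, d_ff: int, with_ffn: bool = True) -> int:
    """Count parameters in one Transformer decoder layer (illustrative only).

    Assumes the standard layout: self-attention, encoder-decoder attention,
    and a position-wise feed-forward module, each followed by layer norm.
    """
    # Each attention module has four projections (Q, K, V, output),
    # each a d_model x d_model weight matrix plus a bias vector.
    attention = 4 * (d_model * d_model + d_model)

    # Feed-forward module: d_model -> d_ff -> d_model, with biases.
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)

    # Layer norm: one gain and one bias vector of size d_model.
    layer_norm = 2 * d_model

    total = 2 * attention + 2 * layer_norm  # self-attn + cross-attn
    if with_ffn:
        # Assumption: the layer norm tied to the FFN is removed with it.
        total += ffn + layer_norm
    return total


# Assumed "base" sizes; not taken from the paper.
full = decoder_layer_params(512, 2048, with_ffn=True)
slim = decoder_layer_params(512, 2048, with_ffn=False)
saved_fraction = (full - slim) / full
print(f"with FFN: {full}, without FFN: {slim}, saved: {saved_fraction:.1%}")
```

Under these assumed sizes, the feed-forward module accounts for roughly half of a decoder layer's parameters. The count excludes embeddings, the output projection, and the encoder, so the saving for a whole model would be proportionally smaller.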


08/17/2019

Hard but Robust, Easy but Sensitive: How Encoder and Decoder Perform in Neural Machine Translation

Neural machine translation (NMT) typically adopts the encoder-decoder fr...
03/21/2020

Analyzing Word Translation of Transformer Layers

The Transformer translation model is popular for its effective paralleli...
03/05/2021

IOT: Instance-wise Layer Reordering for Transformer Structures

With sequentially stacked self-attention, (optional) encoder-decoder att...
09/16/2021

Scaling Laws for Neural Machine Translation

We present an empirical study of scaling properties of encoder-decoder T...
02/04/2022

Data Scaling Laws in NMT: The Effect of Noise and Architecture

In this work, we study the effect of varying the architecture and traini...
05/11/2022

Arbitrary Shape Text Detection via Boundary Transformer

Arbitrary shape text detection is a challenging task due to its complexi...
01/03/2021

An Efficient Transformer Decoder with Compressed Sub-layers

The large attention-based encoder-decoder network (Transformer) has beco...