Accelerating Transformer Decoding via a Hybrid of Self-attention and Recurrent Neural Network

09/05/2019
by Chengyi Wang, et al.

Due to its highly parallelizable architecture, the Transformer is faster to train than RNN-based models and is widely used in machine translation. At inference time, however, computing each output word requires the hidden states of all previously generated words, which limits parallelization and makes decoding much slower than in RNN-based models. In this paper, we systematically analyze the time cost of the individual components of both the Transformer and an RNN-based model. Based on this analysis, we propose a hybrid network of self-attention and RNN structures, in which the highly parallelizable self-attention serves as the encoder and the simpler RNN structure serves as the decoder. Our hybrid network decodes four times faster than the Transformer and, with the help of knowledge distillation, achieves translation quality comparable to the original Transformer.
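To make the encoder/decoder split concrete, the sketch below pairs a self-attention encoder with a single-layer recurrent decoder that attends over the encoder states and generates tokens greedily, one step at a time. This is a minimal PyTorch illustration of the general idea described in the abstract, not the paper's implementation; the GRU decoder, dot-product attention, layer sizes, and the greedy_decode helper are assumptions made for illustration only.

```python
# Illustrative sketch (not the paper's exact architecture): a parallelizable
# self-attention encoder paired with a simpler GRU decoder that attends over
# the encoder states and decodes one token per step.
import torch
import torch.nn as nn


class HybridSeq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, d_model=512, nhead=8,
                 num_encoder_layers=6):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, d_model)
        self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Highly parallelizable self-attention encoder.
        self.encoder = nn.TransformerEncoder(enc_layer, num_encoder_layers)
        # Recurrent decoder: input is [target embedding; attention context].
        self.decoder_rnn = nn.GRU(d_model * 2, d_model, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def encode(self, src_tokens):
        return self.encoder(self.src_embed(src_tokens))    # (B, S, d_model)

    def decode_step(self, prev_token, hidden, memory):
        # prev_token: (B, 1); hidden: (1, B, d_model); memory: (B, S, d_model)
        emb = self.tgt_embed(prev_token)                    # (B, 1, d_model)
        # Dot-product attention of the decoder state over encoder outputs.
        scores = torch.bmm(hidden.transpose(0, 1), memory.transpose(1, 2))
        context = torch.bmm(torch.softmax(scores, dim=-1), memory)
        rnn_in = torch.cat([emb, context], dim=-1)          # (B, 1, 2*d_model)
        output, hidden = self.decoder_rnn(rnn_in, hidden)
        return self.out(output), hidden                     # logits, new state

    @torch.no_grad()
    def greedy_decode(self, src_tokens, bos_id, eos_id, max_len=50):
        memory = self.encode(src_tokens)
        batch = src_tokens.size(0)
        hidden = memory.new_zeros(1, batch, memory.size(-1))
        tokens = torch.full((batch, 1), bos_id, dtype=torch.long,
                            device=src_tokens.device)
        outputs = []
        for _ in range(max_len):
            logits, hidden = self.decode_step(tokens, hidden, memory)
            tokens = logits.argmax(dim=-1)                  # (B, 1)
            outputs.append(tokens)
            if (tokens == eos_id).all():
                break
        return torch.cat(outputs, dim=1)


# Toy usage with random token ids, just to show the shapes flow through.
model = HybridSeq2Seq(src_vocab=32000, tgt_vocab=32000)
src = torch.randint(0, 32000, (2, 10))
print(model.greedy_decode(src, bos_id=1, eos_id=2).shape)
```

The decoding speedup claimed in the abstract comes from the decoder side of this split: the recurrent step above only carries a fixed-size hidden state forward, whereas a Transformer decoder must attend over all previously generated positions at every step.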


Related research

Accelerating Neural Transformer via an Average Attention Network (05/02/2018)
With parallelizable attention networks, the neural Transformer is very f...

A Deep Investigation of RNN and Self-attention for the Cyrillic-Traditional Mongolian Bidirectional Conversion (09/24/2022)
Cyrillic and Traditional Mongolian are the two main members of the Mongo...

Close to Human Quality TTS with Transformer (09/19/2018)
Although end-to-end neural text-to-speech (TTS) methods (such as Tacotro...

Assessing the Ability of Self-Attention Networks to Learn Word Order (06/03/2019)
Self-attention networks (SAN) have attracted a lot of interests due to t...

Hybrid Self-Attention NEAT: A novel evolutionary approach to improve the NEAT algorithm (12/07/2021)
This article presents a "Hybrid Self-Attention NEAT" method to improve t...

Attention-based Conditioning Methods for External Knowledge Integration (06/09/2019)
In this paper, we present a novel approach for incorporating external kn...

SHAQ: Single Headed Attention with Quasi-Recurrence (08/18/2021)
Natural Language Processing research has recently been dominated by larg...
