Exploring Attention Map Reuse for Efficient Transformer Neural Networks

01/29/2023
by Kyuhong Shim, et al.

Transformer-based deep neural networks have achieved great success in various sequence applications due to their powerful ability to model long-range dependencies. The key module of the Transformer is self-attention (SA), which extracts features from the entire sequence regardless of the distance between positions. Although SA helps the Transformer perform particularly well on long-range tasks, its computation and memory complexity grow quadratically with the input sequence length. Recently, attention map reuse, which groups multiple SA layers to share one attention map, has been proposed and has achieved significant speedup for speech recognition models. In this paper, we provide a comprehensive study of attention map reuse, focusing on its ability to accelerate inference. We compare the method with other SA compression techniques and conduct a breakdown analysis of its advantages for long sequences. We demonstrate the effectiveness of attention map reuse by measuring latency on both CPU and GPU platforms.
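As a rough illustration of the idea, the sketch below shows a self-attention layer that either computes its own attention map or reuses the map produced by the first layer of its group, so the quadratic softmax(QK^T/sqrt(d)) step is paid only once per group. This is a minimal PyTorch-style sketch of the general technique; the module, method, and argument names are illustrative and not the authors' implementation.

```python
import torch
import torch.nn as nn


class ReusableSelfAttention(nn.Module):
    """Self-attention that can compute a fresh attention map or reuse one
    passed in from an earlier layer of its group (illustrative sketch)."""

    def __init__(self, dim, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x, reuse_attn=None):
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        # Reshape to (B, heads, T, head_dim).
        def split(t):
            return t.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)

        if reuse_attn is None:
            # Quadratic O(T^2) step: computed only by the first layer of a group.
            attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
            attn = attn.softmax(dim=-1)
        else:
            # Reuse the shared attention map; in a real implementation the
            # q/k projections could also be skipped on this branch.
            attn = reuse_attn

        out = (attn @ v).transpose(1, 2).reshape(B, T, D)
        return self.out(out), attn


# Usage sketch: three layers form one group; only the first computes the map.
layers = nn.ModuleList([ReusableSelfAttention(256, 4) for _ in range(3)])
x = torch.randn(2, 100, 256)
x, shared_attn = layers[0](x)
for layer in layers[1:]:
    x, _ = layer(x, reuse_attn=shared_attn)
```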


Related research

10/01/2022  A Comparison of Transformer, Convolutional, and Recurrent Neural Networks on Phoneme Recognition
Phoneme recognition is a very important part of speech recognition that ...

10/28/2019  Transformer-Transducer: End-to-End Speech Recognition with Self-Attention
We explore options to use Transformer networks in neural transducer for ...

10/14/2020  Memformer: The Memory-Augmented Transformer
Transformer models have obtained remarkable accomplishments in various N...

02/25/2019  Star-Transformer
Although the fully-connected attention-based model Transformer has achie...

10/14/2022  CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
Transformer has achieved remarkable success in language, image, and spee...

07/06/2022  Astroconformer: Inferring Surface Gravity of Stars from Stellar Light Curves with Transformer
We introduce Astroconformer, a Transformer-based model to analyze stella...

03/15/2022  Efficient Long Sequence Encoding via Synchronization
Pre-trained Transformer models have achieved successes in a wide range o...
