EL-Attention: Memory Efficient Lossless Attention for Generation

05/11/2021
by Yu Yan, et al.

Transformer models with multi-head attention require caching intermediate results for efficient inference in generation tasks. However, the cache brings new memory-related costs and prevents the use of larger batch sizes for faster decoding. We propose a memory-efficient lossless attention (called EL-attention) to address this issue. It avoids the heavy operations of building multi-head keys and values and requires no cache. EL-attention constructs an ensemble of attention results by expanding the query while keeping the key and value shared. It produces the same result as multi-head attention, with less GPU memory and faster inference speed. We conduct extensive experiments on Transformer, BART, and GPT-2 for summarization and question generation tasks. The results show that EL-attention speeds up existing models by 1.6x to 5.3x with no loss in accuracy.
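
To make the query-expansion idea concrete, here is a minimal PyTorch sketch of how per-head key and value projections can be folded into the query and the attention output by associativity, so that only the raw hidden states are kept as the shared key and value. This illustrates the technique as described in the abstract, not the authors' implementation; the function and parameter names (el_attention, w_q, w_k, w_v) and the tensor shapes are hypothetical choices.

```python
import torch

def el_attention(query, hidden, w_q, w_k, w_v):
    """Sketch of attention with expanded queries and shared key/value.

    query:  [batch, 1, d_model]        hidden state of the current decoding step
    hidden: [batch, src_len, d_model]  shared key/value source; never projected
                                       into per-head keys and values
    w_q, w_k, w_v: [heads, d_model, head_dim] per-head projection weights
    """
    heads, _, head_dim = w_q.shape
    outputs = []
    for h in range(heads):
        # Fold the key projection into the query instead of building K:
        # (q W_q)(H W_k)^T == (q W_q W_k^T) H^T
        q_exp = query @ w_q[h] @ w_k[h].T              # [batch, 1, d_model]
        scores = q_exp @ hidden.transpose(1, 2) / head_dim ** 0.5
        probs = torch.softmax(scores, dim=-1)          # [batch, 1, src_len]
        # Fold the value projection into the output instead of building V:
        # softmax(.) (H W_v) == (softmax(.) H) W_v
        outputs.append(probs @ hidden @ w_v[h])        # [batch, 1, head_dim]
    return torch.cat(outputs, dim=-1)                  # [batch, 1, heads*head_dim]
```

Because the per-head keys and values are never materialized, incremental decoding only has to keep a single copy of the hidden states per layer rather than separate per-head key and value caches, which is where the GPU-memory saving claimed in the abstract comes from.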
