Linearizing Transformer with Key-Value Memory Bank

03/23/2022
by Yizhe Zhang, et al.

The Transformer architecture has brought great success to a wide range of natural language processing tasks. Nevertheless, the computational overhead of the vanilla transformer scales quadratically with sequence length. Many efforts have been made to develop more efficient transformer variants. One line of work (e.g., Linformer) projects the input sequence into a low-rank space, achieving linear time complexity. However, Linformer is not well suited to text generation tasks because the sequence length must be pre-specified. We propose MemSizer, an approach that also projects the source sequence into a lower-dimensional representation but can handle inputs of dynamic length, taking a different perspective on the attention mechanism. MemSizer not only achieves the same linear time complexity but also supports efficient recurrent-style autoregressive generation, which yields constant memory complexity and reduced computation at inference. We demonstrate that MemSizer provides an improved tradeoff between efficiency and accuracy over the vanilla transformer and other linear variants on language modeling and machine translation tasks, revealing a viable direction towards further inference-efficiency improvements.
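The abstract describes the key ideas only at a high level. The sketch below illustrates, in generic PyTorch, how attention over a fixed-size key-value memory bank can run in time linear in the sequence length and support recurrent-style decoding with constant memory. The module name, the learned slot keys, and the prefix-sum update rule are illustrative assumptions chosen for exposition; this is not the paper's exact MemSizer formulation.

```python
# Illustrative sketch (not the authors' exact method): attention over a
# fixed-size key-value memory bank. An input of length n is summarized into
# k << n slots, so cost grows as O(n * k) instead of O(n^2), and the slot
# summary can be updated token by token for constant-memory decoding.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MemoryBankAttention(nn.Module):
    def __init__(self, d_model: int = 512, num_slots: int = 64):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # Learned slot keys are independent of sequence length, unlike
        # Linformer's length-specific projections (hypothetical design choice).
        self.slot_keys = nn.Parameter(torch.randn(num_slots, d_model) * d_model ** -0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Parallel (training-time, non-causal) form. x: (batch, n, d_model)."""
        q, v = self.q_proj(x), self.v_proj(x)
        # Soft-assign each token's value to the k memory slots: (batch, k, n)
        assign = F.softmax(x @ self.slot_keys.t(), dim=1).transpose(1, 2)
        memory = assign @ v                                    # (batch, k, d)
        # Queries attend over k slots rather than n tokens: linear in n.
        attn = F.softmax(q @ self.slot_keys.t() * q.size(-1) ** -0.5, dim=-1)
        return self.out_proj(attn @ memory)                    # (batch, n, d)

    @torch.no_grad()
    def init_state(self, batch: int, device=None):
        k, d = self.slot_keys.shape
        return (torch.zeros(batch, k, d, device=device),   # running numerator
                torch.zeros(batch, k, 1, device=device))   # running normalizer

    @torch.no_grad()
    def step(self, x_t: torch.Tensor, state):
        """Recurrent (decoding-time) form. x_t: (batch, d_model), one new token."""
        num, den = state
        # Unnormalized slot assignment of the new token (a stabilized version
        # would track a running max before exponentiating).
        w = torch.exp(x_t @ self.slot_keys.t()).unsqueeze(-1)  # (batch, k, 1)
        num = num + w * self.v_proj(x_t).unsqueeze(1)          # (batch, k, d)
        den = den + w
        memory = num / den.clamp_min(1e-9)                     # (batch, k, d)
        q = self.q_proj(x_t).unsqueeze(1)                      # (batch, 1, d)
        attn = F.softmax(q @ self.slot_keys.t() * q.size(-1) ** -0.5, dim=-1)
        out = self.out_proj(attn @ memory).squeeze(1)          # (batch, d)
        return out, (num, den)
```

In this sketch the decoding state is only a (batch, k, d_model) numerator plus a (batch, k, 1) normalizer, independent of how many tokens have been generated, which is what gives the constant memory footprint and reduced per-step computation that the abstract refers to.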

Related research

03/24/2021
Finetuning Pretrained Transformers into RNNs
Transformers have outperformed recurrent neural networks (RNNs) in natur...

03/03/2021
Random Feature Attention
Transformers are state-of-the-art models for a variety of sequence model...

10/14/2020
Memformer: The Memory-Augmented Transformer
Transformer models have obtained remarkable accomplishments in various N...

10/06/2021
ABC: Attention with Bounded-memory Control
Transformer architectures have achieved state-of-the-art results on a va...

10/05/2022
Waveformer: Linear-Time Attention with Forward and Backward Wavelet Transform
We propose Waveformer that learns attention mechanism in the wavelet coe...

12/21/2020
Sub-Linear Memory: How to Make Performers SLiM
The Transformer architecture has revolutionized deep learning on sequent...

02/17/2020
Controlling Computation versus Quality for Neural Sequence Models
Most neural networks utilize the same amount of compute for every exampl...
