Fast Transformer Decoding: One Write-Head is All You Need

11/06/2019
by Noam Shazeer

Multi-head attention layers, as used in the Transformer neural sequence model, are a powerful alternative to RNNs for moving information across and between sequences. While training these layers is generally fast and simple, due to parallelizability across the length of the sequence, incremental inference (where such parallelization is impossible) is often slow, due to the memory-bandwidth cost of repeatedly loading the large "keys" and "values" tensors. We propose a variant called multi-query attention, where the keys and values are shared across all of the different attention "heads", greatly reducing the size of these tensors and hence the memory-bandwidth requirements of incremental decoding. We verify experimentally that the resulting models can indeed be much faster to decode, while incurring only minor quality degradation relative to the baseline.
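To make the idea concrete, here is a minimal NumPy sketch of one incremental decoding step with multi-query attention. It is not the paper's implementation (the paper uses einsum-style TensorFlow pseudocode); the names (Wq, Wk, Wv, Wo, K_cache, V_cache) and shapes are illustrative assumptions. The point it shows is that the cached keys and values have shape (t, d_head) with no head dimension, so each step reloads tensors that are h times smaller than in multi-head attention.

```python
import numpy as np

def multi_query_attention_step(x, K_cache, V_cache, Wq, Wk, Wv, Wo):
    """One incremental decoding step of multi-query attention (sketch).

    x:        (d_model,)            hidden state of the current token
    K_cache:  (t, d_head)           shared keys from previous positions
    V_cache:  (t, d_head)           shared values from previous positions
    Wq:       (h, d_model, d_head)  per-head query projections
    Wk, Wv:   (d_model, d_head)     single shared key/value projections
    Wo:       (h, d_head, d_model)  per-head output projections
    """
    h, _, d_head = Wq.shape

    # Queries remain per-head; the new key and value are shared by all heads.
    q = np.einsum('d,hdk->hk', x, Wq)          # (h, d_head)
    k_new = x @ Wk                              # (d_head,)
    v_new = x @ Wv                              # (d_head,)

    # Append to the shared caches: size (t+1, d_head), independent of h.
    K_cache = np.vstack([K_cache, k_new])
    V_cache = np.vstack([V_cache, v_new])

    # Every head attends over the same small key/value tensors.
    logits = np.einsum('hk,tk->ht', q, K_cache) / np.sqrt(d_head)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    context = np.einsum('ht,tk->hk', weights, V_cache)  # (h, d_head)

    # Project the per-head contexts back to the model dimension and sum.
    y = np.einsum('hk,hkd->d', context, Wo)
    return y, K_cache, V_cache
```

In standard multi-head attention the caches would instead be (h, t, d_head), so the memory traffic per decoded token grows with the number of heads; sharing a single key/value head removes that factor, which is where the decoding speedup in the abstract comes from.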


