Blockwise Parallel Decoding for Deep Autoregressive Models

11/07/2018
by Mitchell Stern, et al.

Deep autoregressive sequence-to-sequence models have demonstrated impressive performance across a wide variety of tasks in recent years. While common architecture classes such as recurrent, convolutional, and self-attention networks make different trade-offs between the amount of computation needed per layer and the length of the critical path at training time, generation still remains an inherently sequential process. To overcome this limitation, we propose a novel blockwise parallel decoding scheme in which we make predictions for multiple time steps in parallel, then back off to the longest prefix validated by a scoring model. This allows for substantial theoretical improvements in generation speed when applied to architectures that can process output sequences in parallel. We verify our approach empirically through a series of experiments using state-of-the-art self-attention models for machine translation and image super-resolution, achieving iteration reductions of up to 2x over a baseline greedy decoder with no loss in quality, or up to 7x in exchange for a slight decrease in performance. In terms of wall-clock time, our fastest models exhibit real-time speedups of up to 4x over standard greedy decoding.
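The predict-verify-accept loop described in the abstract can be summarized in a short sketch. The Python below is a minimal illustration, not the paper's released implementation: `propose(prefix, k)` and `verify(prefix, block)` are hypothetical stand-ins for, respectively, the k auxiliary proposal heads that guess the next k tokens in parallel and the base scoring model that returns its greedy choice at each of those positions in a single pass.

```python
def blockwise_parallel_decode(propose, verify, prefix, k, eos, max_len):
    """Greedy blockwise parallel decoding sketch.

    propose(prefix, k) -> list of k proposed next tokens (hypothetical helper).
    verify(prefix, block) -> the base model's greedy token at each block position,
        conditioned on the preceding proposed tokens (hypothetical helper).
    """
    output = list(prefix)
    while len(output) < max_len and (not output or output[-1] != eos):
        block = propose(output, k)        # predict: k future tokens in parallel
        checked = verify(output, block)   # verify: base model's greedy choices
        for proposed, greedy in zip(block, checked):
            output.append(greedy)         # the verified token is always safe to keep
            if proposed != greedy or greedy == eos or len(output) >= max_len:
                break                     # accept longest validated prefix (+1 correction)
    return output
```

Under this sketch, each iteration accepts every proposed token that matches the scoring model's greedy prediction, plus the model's own prediction at the first mismatch, so one iteration can emit up to k+1 tokens instead of one; when the two model passes can be fused for a self-attention decoder, this is what yields the reported reduction in decoding iterations.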
