Accelerating Large Language Model Decoding with Speculative Sampling

02/02/2023
by Charlie Chen, et al.

We present speculative sampling, an algorithm for accelerating transformer decoding by enabling the generation of multiple tokens from each transformer call. Our algorithm relies on the observation that the latency of parallel scoring of short continuations, generated by a faster but less powerful draft model, is comparable to that of sampling a single token from the larger target model. This is combined with a novel modified rejection sampling scheme which preserves the distribution of the target model within hardware numerics. We benchmark speculative sampling with Chinchilla, a 70 billion parameter language model, achieving a 2-2.5x decoding speedup in a distributed setup, without compromising the sample quality or making modifications to the model itself.
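The accept/reject loop behind the modified rejection sampling scheme can be sketched on a toy vocabulary. The abstract does not spell out the rule, so this sketch assumes the commonly described form: a drafted token x is accepted with probability min(1, q(x)/p(x)), where p is the draft distribution and q the target distribution, and on rejection a replacement is drawn from the renormalized residual max(0, q - p). The function name `speculative_step` and its arguments are hypothetical, and plain probability lists stand in for real model outputs.

```python
import random


def speculative_step(draft_probs, target_probs, draft_tokens, rng):
    """One speculative sampling step over a toy vocabulary (illustrative only).

    draft_probs[i]  -- draft distribution p_i that draft_tokens[i] was sampled from
    target_probs[i] -- target distribution q_i at the same position; it holds one
                       extra entry, used when every drafted token is accepted
    Returns the tokens emitted by this step (between 1 and K + 1 of them).
    """
    emitted = []
    for i, tok in enumerate(draft_tokens):
        p, q = draft_probs[i], target_probs[i]
        # Accept the drafted token with probability min(1, q(tok) / p(tok)).
        if rng.random() < min(1.0, q[tok] / p[tok]):
            emitted.append(tok)
            continue
        # Rejected: resample from the residual max(0, q - p); rng.choices
        # normalizes the weights. Later drafted tokens are discarded.
        residual = [max(0.0, qi - pi) for qi, pi in zip(q, p)]
        emitted.append(rng.choices(range(len(q)), weights=residual)[0])
        return emitted
    # Every draft was accepted: draw one extra token from the target model.
    last = target_probs[len(draft_tokens)]
    emitted.append(rng.choices(range(len(last)), weights=last)[0])
    return emitted
```

In this sketch a single step emits at least one token and at most one more than the number of drafted tokens, which is where the speedup would come from when the draft model agrees with the target often. In a real system the target distributions for all drafted positions would come from one parallel scoring call to the target model.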

research
08/07/2023

RecycleGPT: An Autoregressive Language Model with Recyclable Module

Existing large language models have to run K times to generate a sequenc...
research
03/09/2023

Planning with Large Language Models for Code Generation

Existing large language model-based code generation pipelines typically ...
research
08/08/2023

Accelerating LLM Inference with Staged Speculative Decoding

Recent advances with large language models (LLM) illustrate their divers...
research
01/04/2020

Transformer-based language modeling and decoding for conversational speech recognition

We propose a way to use a transformer-based language model in conversati...
research
09/14/2023

Masked Generative Modeling with Enhanced Sampling Scheme

This paper presents a novel sampling scheme for masked non-autoregressiv...
research
08/18/2023

Exploring Sampling Techniques for Generating Melodies with a Transformer Language Model

Research in natural language processing has demonstrated that the qualit...
research
09/06/2023

Improving Code Generation by Dynamic Temperature Sampling

Recently, Large Language Models (LLMs) have shown impressive results in ...
