Accelerating LLM Inference with Staged Speculative Decoding

08/08/2023
by Benjamin Spector, et al.

Recent advances in large language models (LLMs) illustrate their diverse capabilities. We propose a novel algorithm, staged speculative decoding, to accelerate LLM inference in small-batch, on-device scenarios. We address the low arithmetic intensity of small-batch inference by improving upon previous work in speculative decoding. First, we restructure the speculative batch as a tree, which reduces generation costs and increases the expected number of tokens accepted per batch. Second, we add a second stage of speculative decoding. Taken together, these changes reduce single-batch decoding latency by 3.16x with a 762M-parameter GPT-2-L model while exactly preserving output quality.
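To make the baseline the paper improves on concrete, below is a minimal, framework-free sketch of the standard draft-then-verify loop behind speculative decoding. The toy models, helper names (toy_model, speculate_and_verify), and parameters are illustrative assumptions for exposition only, not the paper's implementation; the staged variant described above additionally arranges many draft continuations into a tree and adds a second stage of speculation on top of this loop.

import random

VOCAB = list(range(50))  # toy vocabulary of integer token ids


def toy_model(seed):
    """Return a deterministic toy 'LLM': context -> next-token distribution."""
    def probs(context):
        rng = random.Random(hash((seed, tuple(context))))
        weights = [rng.random() for _ in VOCAB]
        total = sum(weights)
        return [w / total for w in weights]
    return probs


def sample(dist, rng):
    """Sample a token id from a probability distribution over VOCAB."""
    return rng.choices(VOCAB, weights=dist, k=1)[0]


def speculate_and_verify(target, draft, context, k, rng):
    """Draft k tokens with the cheap model, then accept a prefix of them using
    the standard speculative-sampling acceptance rule against the target model.
    Returns the newly generated tokens (at least one per call)."""
    # 1) Draft phase: k cheap sequential steps with the small model.
    drafted, draft_dists, ctx = [], [], list(context)
    for _ in range(k):
        d = draft(ctx)
        tok = sample(d, rng)
        drafted.append(tok)
        draft_dists.append(d)
        ctx.append(tok)

    # 2) Verify phase: score each drafted position with the target model
    #    (a single batched forward pass in a real implementation).
    accepted = []
    for d, tok in zip(draft_dists, drafted):
        t = target(context + accepted)
        # Accept with probability min(1, p_target / p_draft); otherwise
        # resample from the residual distribution and stop this round.
        if rng.random() < min(1.0, t[tok] / max(d[tok], 1e-12)):
            accepted.append(tok)
        else:
            residual = [max(t[v] - d[v], 0.0) for v in VOCAB]
            z = sum(residual)
            fallback = t if z == 0.0 else [r / z for r in residual]
            accepted.append(sample(fallback, rng))
            break
    # (The usual "bonus" target-sampled token after a fully accepted draft is
    # omitted here for brevity.)
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    target_model, draft_model = toy_model(1), toy_model(2)
    tokens = [0]  # prompt
    while len(tokens) < 16:
        tokens.extend(speculate_and_verify(target_model, draft_model, tokens, k=4, rng=rng))
    print(tokens)

Because the verify step scores all drafted positions against the target model, several tokens can be accepted per expensive forward pass, which is what raises the arithmetic intensity of small-batch decoding.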
