Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding

07/28/2023
by Xuefei Ning, et al.

This work aims to decrease the end-to-end generation latency of large language models (LLMs). One of the major causes of high generation latency is the sequential decoding approach adopted by almost all state-of-the-art LLMs. In this work, motivated by the thinking and writing process of humans, we propose "Skeleton-of-Thought" (SoT), which guides LLMs to first generate the skeleton of the answer, and then conducts parallel API calls or batched decoding to complete the contents of each skeleton point in parallel. Not only does SoT provide considerable speed-up (up to 2.39x across 11 different LLMs), but it can also potentially improve the answer quality on several question categories in terms of diversity and relevance. SoT is an initial attempt at data-centric optimization for efficiency, and reveals the potential of pushing LLMs to think more like a human to improve answer quality.
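
The two-stage flow described in the abstract (skeleton first, then parallel expansion of each point) can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the helper names `parse_skeleton` and `skeleton_of_thought`, and the `fake_llm` stand-in are all assumptions made for illustration, and a thread pool is used here only as one plausible way to issue the parallel API calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def parse_skeleton(skeleton_text: str) -> List[str]:
    """Split a numbered skeleton such as '1. Foo\\n2. Bar' into its points."""
    points = []
    for line in skeleton_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop a leading "1." style index if one is present.
        head, sep, tail = line.partition(".")
        points.append(tail.strip() if sep and head.isdigit() else line)
    return points


def skeleton_of_thought(
    question: str, llm: Callable[[str], str], max_workers: int = 8
) -> str:
    """Stage 1: ask for a skeleton. Stage 2: expand every point concurrently."""
    # Hypothetical prompt wording; the paper's actual prompts are not shown here.
    skeleton = llm(f"Give a short numbered skeleton for answering: {question}")
    points = parse_skeleton(skeleton)
    if not points:
        return ""

    def expand(point: str) -> str:
        return llm(f"Question: {question}\nExpand this point briefly: {point}")

    # Each expansion is an independent call, so the calls can run in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        expansions = list(pool.map(expand, points))  # map preserves input order

    return "\n".join(
        f"{i}. {p}: {e}" for i, (p, e) in enumerate(zip(points, expansions), 1)
    )


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, used only to make the sketch runnable."""
    if prompt.startswith("Give a short"):
        return "1. Plan\n2. Practice\n3. Review"
    point = prompt.rsplit(": ", 1)[-1]
    return f"details about {point.lower()}"


if __name__ == "__main__":
    print(skeleton_of_thought("How do I learn a new skill?", fake_llm))
```

In this sketch the latency benefit would come from the expansion calls overlapping in time; with a real API client substituted for `fake_llm`, total wall-clock time is roughly one skeleton call plus the slowest expansion call rather than the sum of all of them.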

research
02/02/2023

Multimodal Chain-of-Thought Reasoning in Language Models

Large language models (LLMs) have shown impressive performance on comple...
research
08/24/2023

Exploring the Integration Strategies of Retriever and Large Language Models

The integration of retrieved passages and large language models (LLMs), ...
research
05/30/2023

KEYword based Sampling (KEYS) for Large Language Models

Question answering (Q/A) can be formulated as a generative task (Mitra, ...
research
08/21/2018

A Skeleton-Based Model for Promoting Coherence Among Sentences in Narrative Story Generation

Narrative story generation is a challenging problem because it demands t...
research
05/12/2022

Sampling with Attribute-Related Information for Controlling Language Models

The dominant approaches for controlling language models are based on fin...
research
08/08/2023

Accelerating LLM Inference with Staged Speculative Decoding

Recent advances with large language models (LLM) illustrate their divers...
research
04/28/2015

Can Machines Truly Think

Can machines truly think? This question and its answer have many implica...
