Distilling Multi-Step Reasoning Capabilities of Large Language Models into Smaller Models via Semantic Decompositions

12/01/2022
by Kumar Shridhar, et al.

Step-by-step reasoning approaches like chain-of-thought (CoT) have proved to be very effective techniques for inducing reasoning capabilities in large language models. However, the success of the CoT approach depends primarily on model size, and billion-parameter-scale models are often needed to get CoT to work. In this paper, we propose a knowledge distillation approach that leverages the step-by-step CoT reasoning capabilities of larger models and distills these reasoning abilities into smaller models. Our approach, Decompositional Distillation, learns a semantic decomposition of the original problem into a sequence of subproblems and uses it to train two models: a) a problem decomposer that learns to decompose the complex reasoning problem into a sequence of simpler subproblems, and b) a problem solver that uses the intermediate subproblems to solve the overall problem. On a multi-step math word problem dataset (GSM8K), we boost the performance of GPT-2 variants by up to 35% when distilled with our approach. We show that, using our approach, it is possible to train a GPT-2-large model (775M) that can outperform a 10X larger GPT-3 (6B) model trained using CoT reasoning. Finally, we also demonstrate that our approach of problem decomposition can be used as an alternative to CoT prompting, boosting GPT-3 performance by 40% compared to CoT prompts.
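The decompose-then-solve pipeline described above lends itself to a compact sketch. The following is a minimal, hypothetical illustration of the inference loop, not the authors' released code: `decomposer_generate` and `solver_generate` are placeholder callables standing in for the two fine-tuned models (e.g., GPT-2 variants behind any text-generation call), and the Q/A prompt format is an assumption for illustration.

```python
# A minimal sketch of the two-model Decompositional Distillation pipeline.
# `decomposer_generate` and `solver_generate` are hypothetical stand-ins
# for inference calls to the distilled decomposer and solver models.
from typing import Callable, List


def solve_by_decomposition(
    problem: str,
    decomposer_generate: Callable[[str], str],  # problem -> newline-separated subquestions
    solver_generate: Callable[[str], str],      # running context -> answer to last subquestion
) -> str:
    """Decompose a multi-step problem into subproblems, then solve them in order."""
    # Stage 1: the problem decomposer proposes a sequence of simpler subquestions.
    subquestions: List[str] = [
        q.strip() for q in decomposer_generate(problem).splitlines() if q.strip()
    ]

    # Stage 2: the problem solver answers each subquestion, conditioning on the
    # original problem and all previously answered subquestions.
    context = problem
    answer = ""
    for question in subquestions:
        answer = solver_generate(f"{context}\nQ: {question}\nA:")
        context = f"{context}\nQ: {question}\nA: {answer}"

    # The answer to the final subquestion is taken as the overall answer.
    return answer


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end without trained models.
    demo_decomposer = lambda p: "How many items remain?\nWhat is the final total?"
    demo_solver = lambda prompt: "42"
    print(solve_by_decomposition("A toy word problem.", demo_decomposer, demo_solver))
```

Keeping the decomposer and solver as separate models means the solver only ever has to answer one simple subquestion at a time, which is what lets a much smaller model stand in for the large CoT reasoner.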

Related Research

06/02/2023
Learning Multi-Step Reasoning by Solving Arithmetic Tasks
Mathematical reasoning is regarded as a necessary ability for Language M...

08/09/2023
Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA
Large Language Models (LLMs) have shown outstanding performance across w...

01/30/2023
Specializing Smaller Language Models towards Multi-Step Reasoning
The surprising ability of Large Language Models (LLMs) to perform well o...

10/05/2022
Decomposed Prompting: A Modular Approach for Solving Complex Tasks
Few-shot prompting is a surprisingly powerful way to use Large Language ...

08/16/2023
Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought
Recent advancements in large-scale models, such as GPT-4, have showcased...

05/03/2023
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Deploying large language models (LLMs) is challenging because they are m...

05/23/2023
Let's Think Frame by Frame: Evaluating Video Chain of Thought with Video Infilling and Prediction
Despite constituting 65% of internet traffic, video content is underrepresented in generative AI research. Mean...
