Specializing Smaller Language Models towards Multi-Step Reasoning

01/30/2023
by Yao Fu, et al.

The surprising ability of Large Language Models (LLMs) to perform well on complex reasoning with only few-shot chain-of-thought prompts is believed to emerge only in very large-scale models (100+ billion parameters). We show that such abilities can, in fact, be distilled down from GPT-3.5 (≥ 175B) to T5 variants (≤ 11B). We propose model specialization, which concentrates a model's ability on a target task. The hypothesis is that large models (commonly viewed as those larger than 100B) have strong modeling power but spread it across a broad spectrum of tasks, while small models (commonly viewed as those smaller than 10B) have limited capacity; if we concentrate that capacity on a specific target task, the model can achieve decent performance improvements. We use multi-step math reasoning as our testbed because it is a very typical emergent ability. We show two important aspects of model abilities: (1) there exists a very complex balance/tradeoff between language models' multi-dimensional abilities; (2) by paying the price of decreased generic ability, we can clearly lift the scaling curve of models smaller than 10B on specialized multi-step math reasoning. We further give comprehensive discussions of important design choices for better generalization, including the tuning data format, the starting model checkpoint, and a new model selection method. We hope our practice and discoveries can serve as an important step towards specialized smaller models in the new research paradigm set by LLMs.

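For concreteness, below is a minimal sketch of the specialization recipe, assuming chain-of-thought rationales have already been sampled from a large teacher model; the checkpoint name, hyperparameters, and toy example are illustrative placeholders, not the paper's exact setup.

```python
# Minimal sketch of model specialization: fine-tune a small seq2seq student
# on chain-of-thought rationales distilled from a large teacher model.
# The checkpoint, learning rate, and toy data below are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# Hypothetical distilled pairs: each target is the teacher's full step-by-step
# rationale ending in the final answer, so the student is trained to reproduce
# the reasoning chain rather than only the answer token.
train_pairs = [
    ("Question: Tom has 3 apples and buys 2 more. How many does he have now?",
     "Tom starts with 3 apples. Buying 2 more gives 3 + 2 = 5. The answer is 5."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()
for question, rationale in train_pairs:
    inputs = tokenizer(question, return_tensors="pt")
    labels = tokenizer(rationale, return_tensors="pt").input_ids
    loss = model(**inputs, labels=labels).loss  # cross-entropy on the rationale
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The same loop covers the in-context (few-shot) data format discussed in the paper: prepend exemplar question–rationale pairs to the input string and leave the target unchanged.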

Related research

06/02/2023 · Learning Multi-Step Reasoning by Solving Arithmetic Tasks
Mathematical reasoning is regarded as a necessary ability for Language M...

12/01/2022 · Distilling Multi-Step Reasoning Capabilities of Large Language Models into Smaller Models via Semantic Decompositions
Step-by-step reasoning approaches like chain-of-thought (CoT) have prove...

05/30/2023 · The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code
Causal reasoning, the ability to identify cause-and-effect relationship,...

04/28/2023 · Are Emergent Abilities of Large Language Models a Mirage?
Recent work claims that large language models display emergent abilities...

10/20/2022 · Transcending Scaling Laws with 0.1% Extra Compute
Scaling language models improves performance but comes with significant ...

04/30/2023 · Beyond Classification: Financial Reasoning in State-of-the-Art Language Models
Large Language Models (LLMs), consisting of 100 billion or more paramete...

09/11/2023 · Textbooks Are All You Need II: phi-1.5 technical report
We continue the investigation into the power of smaller Transformer-base...
