Dissecting Chain-of-Thought: A Study on Compositional In-Context Learning of MLPs

05/30/2023
by Yingcong Li, et al.

Chain-of-thought (CoT) is a method that enables language models to handle complex reasoning tasks by decomposing them into simpler steps. Despite its success, the underlying mechanics of CoT are not yet fully understood. To shed light on this, our study investigates the impact of CoT on the ability of transformers to in-context learn a simple-to-study yet general family of compositional functions: multi-layer perceptrons (MLPs). In this setting, we reveal that the success of CoT can be attributed to breaking down in-context learning of a compositional function into two distinct phases: filtering the prompt down to the data relevant to each step of the composition, and in-context learning that single-step function. Through both experimental and theoretical evidence, we demonstrate how CoT significantly reduces the sample complexity of in-context learning (ICL) and facilitates the learning of complex functions that non-CoT methods struggle with. Furthermore, we illustrate how transformers can transition from vanilla in-context learning to mastering a compositional function with CoT by incorporating a single additional layer that performs the necessary filtering via the attention mechanism. Beyond these test-time benefits, we highlight how CoT accelerates pretraining by learning shortcuts to represent complex functions, and how filtering plays an important role during pretraining as well. These findings collectively provide insights into the mechanics of CoT, inviting further investigation of its role in complex reasoning tasks.
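To make the two-phase picture concrete, below is a minimal numpy sketch, not the paper's transformer construction. It assumes ridge regression as a stand-in for the single-step in-context learner and assumes the CoT trace exposes the pre-activation W1 x of each in-context example; the dimensions and the 2-layer MLP target are illustrative choices.

```python
import numpy as np

# Toy sketch of "filter, then learn one step at a time" for a 2-layer MLP.
# Assumptions (not from the paper): ridge regression plays the role of the
# single-step ICL learner, and the CoT step records the pre-activation W1 @ x.
rng = np.random.default_rng(0)
d, k, n = 8, 16, 64          # input dim, hidden dim, in-context examples
relu = lambda z: np.maximum(z, 0.0)

W1 = rng.normal(size=(k, d))  # ground-truth MLP: y = W2 @ relu(W1 @ x)
W2 = rng.normal(size=(1, k))

X = rng.normal(size=(n, d))
S = X @ W1.T                  # intermediate CoT step (pre-activation)
Y = relu(S) @ W2.T            # final answers

def ridge(A, B, lam=1e-6):
    """Closed-form least squares A -> B; stands in for one ICL step."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ B)

# Vanilla ICL: fit the whole composition from (x, y) pairs in one shot;
# a single linear map cannot capture the relu in the middle.
W_e2e = ridge(X, Y)

# CoT-style ICL: filter the prompt into per-step pairs, then learn each
# single-step map separately -- (x, s) for layer 1, (relu(s), y) for layer 2.
W1_hat = ridge(X, S)
W2_hat = ridge(relu(S), Y)

Xt = rng.normal(size=(1024, d))
Yt = relu(Xt @ W1.T) @ W2.T
print("end-to-end MSE:", float(np.mean((Xt @ W_e2e - Yt) ** 2)))
print("stepwise  MSE:", float(np.mean((relu(Xt @ W1_hat) @ W2_hat - Yt) ** 2)))
```

Because each filtered sub-problem is linear, the stepwise fit recovers both layers nearly exactly while the end-to-end linear fit incurs large error, mirroring the sample-complexity gap the abstract describes. In the paper's setting the filtering itself is implemented by an attention layer rather than done by hand as here.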

Related research

03/14/2023 · A Theory of Emergent In-Context Learning as Implicit Structure Induction
Scaling large language models (LLMs) leads to an emergent capacity to le...

04/06/2023 · When do you need Chain-of-Thought Prompting for ChatGPT?
Chain-of-Thought (CoT) prompting can effectively elicit complex multi-st...

10/23/2022 · Learning to Perform Complex Tasks through Compositional Fine-Tuning of Language Models
How to usefully encode compositional task structure has long been a core...

06/08/2023 · In-Context Learning through the Bayesian Prism
In-context learning is one of the surprising and useful features of larg...

03/14/2023 · The Learnability of In-Context Learning
In-context learning is a surprising and important phenomenon that emerge...

06/09/2022 · Unveiling Transformers with LEGO: a synthetic reasoning task
We propose a synthetic task, LEGO (Learning Equality and Group Operation...

01/04/2023 · Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes
Language models (LMs) can perform complex reasoning either end-to-end, w...
