Auto-Regressive Next-Token Predictors are Universal Learners

09/13/2023
by Eran Malach, et al.

Large language models display remarkable capabilities in logical and mathematical reasoning, allowing them to solve complex tasks. Interestingly, these abilities emerge in networks trained on the simple task of next-token prediction. In this work, we present a theoretical framework for studying auto-regressive next-token predictors. We demonstrate that even simple models, such as linear next-token predictors, trained on Chain-of-Thought (CoT) data, can approximate any function efficiently computed by a Turing machine. We introduce a new complexity measure, length complexity, which counts the number of intermediate tokens in a CoT sequence required to approximate a target function, and we analyze the interplay between length complexity and other notions of complexity. Finally, we show experimentally that simple next-token predictors, such as linear networks and shallow Multi-Layer Perceptrons (MLPs), display non-trivial performance on text generation and arithmetic tasks. Our results demonstrate that the power of language models can be attributed, to a great extent, to the auto-regressive next-token training scheme, and not necessarily to a particular choice of architecture.
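To make the abstract's claim concrete, here is a minimal toy sketch (not the paper's construction) of the idea that auto-regression plus intermediate CoT tokens can make each prediction step trivial. The task, vocabulary, and countdown format below are invented for illustration: producing the full countdown from n requires iteration, but every individual next-token step is a pure lookup (next = previous − 1, or an end token after 0), which even a linear softmax predictor learns.

```python
import numpy as np

# Hypothetical toy task: given a starting digit n, emit "n n-1 ... 1 0 <eos>".
# The intermediate countdown tokens play the role of a chain of thought: each
# next-token step depends only on the previous token, so a *linear* model
# (softmax regression on a one-hot context of size 1) suffices.

VOCAB = [str(d) for d in range(10)] + ["<eos>"]
V = len(VOCAB)
tok = {t: i for i, t in enumerate(VOCAB)}

def sequences():
    # Training data: full countdown sequences for every starting digit.
    for n in range(10):
        yield [str(k) for k in range(n, -1, -1)] + ["<eos>"]

def one_hot(i):
    x = np.zeros(V)
    x[i] = 1.0
    return x

# Collect (previous token -> next token) training pairs.
X, y = [], []
for seq in sequences():
    ids = [tok[t] for t in seq]
    for prev, nxt in zip(ids, ids[1:]):
        X.append(one_hot(prev))
        y.append(nxt)
X, y = np.array(X), np.array(y)

# Linear softmax regression, full-batch gradient descent on cross-entropy.
W = np.zeros((V, V))
for _ in range(500):
    logits = X @ W.T
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = p.copy()
    grad[np.arange(len(y)), y] -= 1.0
    W -= 0.5 * (grad.T @ X) / len(y)

def generate(start, max_len=12):
    # Auto-regressive decoding: feed each predicted token back as context.
    out, cur = [start], tok[start]
    for _ in range(max_len):
        cur = int(np.argmax(W @ one_hot(cur)))
        out.append(VOCAB[cur])
        if VOCAB[cur] == "<eos>":
            break
    return out

print(generate("9"))  # the countdown from 9, ending with '<eos>'
```

The point of the sketch is the decomposition: the map from n to its full countdown is not linear, but auto-regressive generation only ever asks the model for one easy step at a time, mirroring (in miniature) how CoT data reduces hard targets to sequences of simple predictions.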


Related research:
- 12/21/2022: Language models are better than humans at next-token prediction
- 07/07/2023: Teaching Arithmetic to Small Transformers
- 05/18/2023: ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs
- 06/09/2017: Learning to Embed Words in Context for Syntactic Tasks
- 05/17/2023: Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- 08/27/2023: Towards Unified Token Learning for Vision-Language Tracking
- 10/14/2020: The EOS Decision and Length Extrapolation
