A Theory for Emergence of Complex Skills in Language Models

07/29/2023
by   Sanjeev Arora, et al.
0

A major driver of AI products today is the fact that new skills emerge in language models when their parameter set and training corpora are scaled up. This phenomenon is poorly understood, and a mechanistic explanation via mathematical analysis of gradient-based training seems difficult. The current paper takes a different approach, analysing emergence using the famous (and empirical) Scaling Laws of LLMs and a simple statistical framework. Contributions include: (a) A statistical framework that relates cross-entropy loss of LLMs to competence on the basic skills that underlie language tasks. (b) Mathematical analysis showing that the Scaling Laws imply a strong form of inductive bias that allows the pre-trained model to learn very efficiently. We informally call this slingshot generalization since naively viewed it appears to give competence levels at skills that violate usual generalization theory. (c) A key example of slingshot generalization, that competence at executing tasks involving k-tuples of skills emerges essentially at the same scaling and same rate as competence on the elementary skills themselves.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/16/2022

ALERT: Adapting Language Models to Reasoning Tasks

Current large language models can perform reasonably well on complex tas...
research
08/01/2023

Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models

We consider the problem of eliciting compositional generalization capabi...
research
10/28/2020

Scaling Laws for Autoregressive Generative Modeling

We identify empirical scaling laws for the cross-entropy loss in four do...
research
07/26/2023

Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models

The quality of training data impacts the performance of pre-trained larg...
research
02/02/2022

Unified Scaling Laws for Routed Language Models

The performance of a language model has been shown to be effectively mod...
research
01/10/2023

Scaling Laws for Generative Mixed-Modal Language Models

Generative language models define distributions over sequences of tokens...
research
05/25/2012

Foreword: A Computable Universe, Understanding Computation and Exploring Nature As Computation

I am most honoured to have the privilege to present the Foreword to this...

Please sign up or login with your details

Forgot password? Click here to reset