In-Context Learning of Large Language Models Explained as Kernel Regression

05/22/2023
by   Chi Han, et al.

Large language models (LLMs) have initiated a paradigm shift in transfer learning. In contrast to the classic pretraining-then-finetuning procedure, using LLMs for a downstream prediction task only requires providing a few demonstrations, known as in-context examples, without adding new model parameters or updating existing ones. This in-context learning (ICL) capability of LLMs is intriguing, and it is not yet fully understood how pretrained LLMs acquire it. In this paper, we investigate why a transformer-based language model can accomplish in-context learning after pre-training on a general language corpus, proposing the hypothesis that LLMs can simulate kernel regression when faced with in-context examples. More concretely, we first prove that Bayesian inference on in-context prompts can be asymptotically understood as kernel regression ŷ = ∑_i y_i K(x, x_i) / ∑_i K(x, x_i) as the number of in-context demonstrations grows. We then empirically investigate the in-context behaviors of language models and find that during ICL, the attention maps and hidden features in LLMs match the behaviors of kernel regression. Finally, our theory provides insights into multiple phenomena observed in the ICL field: why retrieving demonstrations similar to the test sample helps, why ICL performance is sensitive to output formats, and why ICL accuracy benefits from selecting in-distribution and representative samples. We will make our code available to the research community following publication.
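
To make the kernel regression form above concrete, below is a minimal NumPy sketch of the estimator ŷ = ∑_i y_i K(x, x_i) / ∑_i K(x, x_i). The Gaussian kernel, the bandwidth value, and the toy demonstration vectors are illustrative assumptions, not the paper's construction, which operates on a pretrained LLM's in-context demonstrations.

import numpy as np

def gaussian_kernel(x, x_i, bandwidth=1.0):
    # Similarity K(x, x_i); the Gaussian form is an assumption made for illustration.
    return np.exp(-np.sum((x - x_i) ** 2) / (2 * bandwidth ** 2))

def kernel_regression_predict(x, demo_xs, demo_ys, bandwidth=1.0):
    # Computes ŷ = ∑_i y_i K(x, x_i) / ∑_i K(x, x_i), the form stated in the abstract.
    weights = np.array([gaussian_kernel(x, x_i, bandwidth) for x_i in demo_xs])
    return weights @ np.asarray(demo_ys) / weights.sum()

# Toy usage with hypothetical demonstration features and labels.
demo_xs = np.array([[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]])
demo_ys = np.array([0.0, 1.0, 1.0])
print(kernel_regression_predict(np.array([0.8, 0.2]), demo_xs, demo_ys))
# Prints roughly 0.79: the query point sits near the two demonstrations labeled 1,
# so their labels dominate the weighted average.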

Related research

11/03/2021 · An Explanation of In-context Learning as Implicit Bayesian Inference
Large pretrained language models such as GPT-3 have the surprising abili...

05/24/2023 · How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench
We investigate the predictability of large language model (LLM) capabili...

01/27/2023 · Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning
In recent years, pre-trained large language models have demonstrated rem...

03/14/2023 · The Learnability of In-Context Learning
In-context learning is a surprising and important phenomenon that emerge...

12/18/2022 · Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale
Language models have been shown to perform better with an increase in sc...

12/19/2022 · Training Trajectories of Language Models Across Scales
Scaling up language models has led to unprecedented performance gains, b...

05/12/2021 · Improving Code Autocompletion with Transfer Learning
Software language models have achieved promising results predicting code...
