Understanding In-Context Learning via Supportive Pretraining Data

06/26/2023
by Xiaochuang Han, et al.

In-context learning (ICL) improves language models' performance on a variety of NLP tasks by simply demonstrating a handful of examples at inference time. It is not well understood why ICL ability emerges, as the model has never been specifically trained on such demonstrations. Unlike prior work that explores implicit mechanisms behind ICL, we study ICL by investigating the pretraining data. Specifically, we first adapt an iterative, gradient-based approach to find a small subset of pretraining data that supports ICL. We observe that continued pretraining on this small subset significantly improves the model's ICL ability, by up to 18%. We then compare the supportive subset contrastively with random subsets of pretraining data and discover: (1) The supportive pretraining data for ICL do not have a higher domain relevance to downstream tasks. (2) The supportive pretraining data have a higher mass of rarely occurring, long-tail tokens. (3) The supportive pretraining data are challenging examples where the information gain from long-range context is below average, indicating that learning to incorporate difficult long-range context encourages ICL. Our work takes a first step towards understanding ICL by analyzing instance-level pretraining data. Our insights have the potential to enhance the ICL ability of language models by actively guiding the construction of pretraining data in the future.
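As a rough illustration of the kind of gradient-based data selection the abstract refers to, the sketch below scores candidate pretraining texts by the cosine similarity between their language-modeling gradients and the gradient of an ICL-style few-shot prompt loss, then keeps the top-scoring texts as a candidate supportive subset. This is a minimal sketch, not the authors' exact procedure (their approach is iterative and interleaves selection with continued pretraining at much larger scale); the model name ("gpt2"), the toy prompt, and the candidate texts are placeholders.

```python
# Hedged sketch of gradient-based selection of "supportive" pretraining data.
# Assumptions: a small causal LM ("gpt2" as a stand-in), a toy ICL prompt,
# and a handful of hypothetical pretraining candidates.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer


def flat_grad(loss, params):
    """Flatten the gradient of `loss` w.r.t. `params` into a single vector."""
    grads = torch.autograd.grad(loss, params, allow_unused=True)
    return torch.cat([
        g.reshape(-1) if g is not None else torch.zeros_like(p).reshape(-1)
        for g, p in zip(grads, params)
    ])


def lm_loss(model, tokenizer, text, device):
    """Ordinary next-token language-modeling loss on a piece of text."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(device)
    return model(**enc, labels=enc["input_ids"]).loss


device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)  # placeholder model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
params = [p for p in model.parameters() if p.requires_grad]

# A toy ICL "task": few-shot demonstrations scored with the LM objective.
icl_prompt = (
    "Review: great movie. Sentiment: positive\n"
    "Review: boring plot. Sentiment: negative\n"
    "Review: I loved it. Sentiment: positive"
)
task_grad = flat_grad(lm_loss(model, tokenizer, icl_prompt, device), params)

# Hypothetical pretraining candidates to be ranked.
pretraining_candidates = [
    "The stock market closed higher on Tuesday after a volatile session.",
    "def quicksort(arr): return arr if len(arr) < 2 else sorted(arr)",
    "Recipe: whisk the eggs, then gently fold in the flour and sugar.",
]

scores = []
for text in pretraining_candidates:
    g = flat_grad(lm_loss(model, tokenizer, text, device), params)
    scores.append(F.cosine_similarity(g, task_grad, dim=0).item())

# Keep the highest-scoring instances as the candidate supportive subset;
# the paper's method would then continue pretraining on it and iterate.
supportive = sorted(zip(scores, pretraining_candidates), reverse=True)[:2]
for s, t in supportive:
    print(f"{s:+.3f}  {t[:60]}")
```

One design note: flattening full-model gradients is only feasible here because the model and candidate pool are tiny; at realistic scale one would use projected or layer-subset gradients, which is an assumption beyond what the abstract states.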
