Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning

05/23/2023
by Lean Wang, et al.

In-context learning (ICL) has emerged as a promising capability of large language models (LLMs): given a few demonstration examples in the prompt, they can perform diverse tasks. However, the underlying mechanism by which LLMs learn from the provided context remains under-explored. In this paper, we investigate the working mechanism of ICL through an information-flow lens. Our findings reveal that label words in the demonstration examples function as anchors: (1) in shallow layers, semantic information from the demonstrations aggregates into the representations of the label words; (2) in deeper layers, the information consolidated in the label words serves as a reference for the model's final predictions. Based on these insights, we introduce an anchor re-weighting method to improve ICL performance, a demonstration compression technique to expedite inference, and an analysis framework for diagnosing ICL errors in GPT2-XL. The promising applications of our findings further validate the uncovered ICL working mechanism and pave the way for future studies.
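To make the anchor hypothesis concrete, below is a minimal sketch of an information-flow probe, assuming a HuggingFace transformers setup. It measures, layer by layer, how much attention the final prediction position pays to the label-word tokens in a toy sentiment prompt. Note the assumptions: the paper's actual metric is a saliency score that combines attention weights with their gradients, while this attention-only probe, the two-shot prompt, and the small gpt2 checkpoint (rather than GPT2-XL) are simplifications for illustration.

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# A toy 2-shot sentiment prompt; "Positive" and "Negative" are the label words.
prompt = (
    "Review: great movie! Sentiment: Positive\n"
    "Review: awful plot. Sentiment: Negative\n"
    "Review: I loved it. Sentiment:"
)

tok = GPT2Tokenizer.from_pretrained("gpt2")  # swap in "gpt2-xl" to match the paper
model = GPT2LMHeadModel.from_pretrained("gpt2", output_attentions=True)
model.eval()

inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# Locate the label-word positions. Assumption: each label word with its
# leading space (" Positive", " Negative") is a single GPT-2 token.
ids = inputs["input_ids"][0].tolist()
anchor_pos = [i for i, t in enumerate(ids)
              if tok.decode([t]).strip() in {"Positive", "Negative"}]

# For each layer, average over heads the attention mass that the final
# position (which predicts the new label) places on the anchor tokens.
q = inputs["input_ids"].shape[1] - 1
for layer, attn in enumerate(out.attentions):  # each attn: (batch, heads, q, k)
    to_anchors = attn[0, :, q, anchor_pos].sum(dim=-1).mean().item()
    print(f"layer {layer:2d}: attention to label words = {to_anchors:.3f}")
```

Under the paper's hypothesis, this fraction should be small in shallow layers, where the label words are still aggregating information from their demonstrations, and grow in deeper layers, where the prediction position reads from the consolidated anchors.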

Related research

- In-Context Learning in Large Language Models Learns Label Relationships but Is Not Conventional Learning (07/23/2023)
  The performance of Large Language Models (LLMs) on downstream tasks ofte...
- Structured Prompting: Scaling In-Context Learning to 1,000 Examples (12/13/2022)
  Large language models have exhibited intriguing in-context learning capa...
- Adversarial Demonstration Attacks on Large Language Models (05/24/2023)
  With the emergence of more powerful large language models (LLMs), such a...
- Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers (12/20/2022)
  Large pretrained language models have shown surprising In-Context Learni...
- Investigating the Learning Behaviour of In-context Learning: A Comparison with Supervised Learning (07/28/2023)
  Large language models (LLMs) have shown remarkable capacity for in-conte...
- Constructing Effective In-Context Demonstration for Code Intelligence Tasks: An Empirical Study (04/15/2023)
  Pre-trained models of code have gained widespread popularity in many cod...
- Large Language Models Can be Lazy Learners: Analyze Shortcuts in In-Context Learning (05/26/2023)
  Large language models (LLMs) have recently shown great potential for in-...
