In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models

12/20/2022
by Yukun Huang, et al.

Given the success of in-context learning in large pre-trained language models, we introduce in-context learning distillation to transfer in-context few-shot learning ability from large models to smaller ones. We propose combining in-context learning objectives with language modeling objectives to distill both the ability to read in-context examples and task knowledge into the smaller models. We perform in-context learning distillation under two few-shot learning paradigms: Meta In-context Tuning (Meta-ICT) and Multitask In-context Tuning (Multitask-ICT). Multitask-ICT performs better on multitask few-shot learning but also requires more computation than Meta-ICT. Our method shows consistent improvements for both Meta-ICT and Multitask-ICT on two benchmarks: LAMA and CrossFit. Extensive experiments and analysis reveal that in-context learning objectives and language modeling objectives are complementary under the Multitask-ICT paradigm: in-context learning objectives achieve the best performance when combined with language modeling objectives.
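The core recipe described in the abstract is a weighted combination of an in-context learning distillation loss (the student matches the teacher's predictions on prompts built from in-context examples) and a standard language modeling loss. The sketch below illustrates one such training step, assuming HuggingFace-style causal LM student and teacher models; the `answer_mask` field, the `alpha` weight, and the temperature are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, icl_batch, lm_batch,
                      alpha=0.5, temperature=2.0):
    """One training step combining an in-context learning (ICL)
    distillation objective with a language modeling (LM) objective.

    icl_batch: token ids for a prompt of in-context examples plus a query;
               `answer_mask` (hypothetical field) marks the answer positions
               on which the teacher's soft predictions are distilled.
    lm_batch:  plain token ids for the standard next-token LM objective.
    alpha:     weight balancing the two objectives (illustrative value).
    """
    # --- In-context learning distillation objective ---
    with torch.no_grad():
        teacher_logits = teacher(icl_batch["input_ids"]).logits
    student_logits = student(icl_batch["input_ids"]).logits

    # Distill only on the answer tokens of the query.
    mask = icl_batch["answer_mask"].bool()
    t = teacher_logits[mask] / temperature
    s = student_logits[mask] / temperature
    icl_loss = F.kl_div(F.log_softmax(s, dim=-1),
                        F.softmax(t, dim=-1),
                        reduction="batchmean") * temperature ** 2

    # --- Language modeling objective (next-token prediction) ---
    lm_loss = student(lm_batch["input_ids"],
                      labels=lm_batch["input_ids"]).loss

    # Combined objective; the paper reports the two are complementary.
    loss = alpha * icl_loss + (1 - alpha) * lm_loss
    loss.backward()
    return loss.item()
```

In this sketch the balance between distilling the ability to read in-context examples (the KL term) and acquiring task knowledge (the LM term) is controlled by a single weight; how that weight is set, and whether the objectives are applied per task (Multitask-ICT) or across tasks (Meta-ICT), follows the paper rather than this illustration.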


Related research

02/19/2023  Few-shot Multimodal Multitask Multilingual Learning
While few-shot learning as a transfer learning paradigm has gained signi...

05/30/2022  Prompting ELECTRA: Few-Shot Learning with Discriminative Pre-Trained Models
Pre-trained masked language models successfully perform few-shot learnin...

05/06/2022  KECP: Knowledge Enhanced Contrastive Prompting for Few-shot Extractive Question Answering
Extractive Question Answering (EQA) is one of the most important tasks i...

04/12/2022  What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
Large pretrained Transformer language models have been shown to exhibit ...

10/25/2021  Distributionally Robust Recurrent Decoders with Random Network Distillation
Neural machine learning models can successfully model language that is s...

05/24/2021  True Few-Shot Learning with Language Models
Pretrained language models (LMs) perform well on many tasks even when le...

04/17/2023  Learning to Compress Prompts with Gist Tokens
Prompting is now the primary way to utilize the multitask capabilities o...
