Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

05/11/2022
by Haokun Liu, et al.

Few-shot in-context learning (ICL) enables pre-trained language models to perform a previously-unseen task without any gradient-based training by feeding a small number of training examples as part of the input. ICL incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made. Parameter-efficient fine-tuning (e.g. adapter modules, prompt tuning, sparse update methods, etc.) offers an alternative paradigm where a small set of parameters is trained to enable a model to perform the new task. In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs. Along the way, we introduce a new parameter-efficient fine-tuning method called (IA)^3 that scales activations by learned vectors, attaining stronger performance while only introducing a relatively tiny number of new parameters. We also propose a simple recipe based on the T0 model called T-Few that can be applied to new tasks without task-specific tuning or modifications. We validate the effectiveness of T-Few on completely unseen tasks by applying it to the RAFT benchmark, attaining super-human performance for the first time and outperforming the state-of-the-art by 6% absolute. All of the code used in our experiments is publicly available.
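To make the phrase "scales activations by learned vectors" concrete, the sketch below shows the general idea of (IA)^3-style rescaling: a few learned vectors that element-wise scale keys, values, and feed-forward hidden states while the base model's weights stay frozen. This is a minimal PyTorch sketch assuming (batch, seq, dim)-shaped activations; the class and method names are illustrative placeholders, not the paper's released implementation.

```python
import torch
import torch.nn as nn

class IA3Scalers(nn.Module):
    """Minimal sketch of (IA)^3-style rescaling: learned vectors that
    element-wise scale keys, values, and feed-forward hidden activations.
    Names are illustrative placeholders, not the released T-Few code."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        # Initialized to ones so the frozen base model is unchanged at the
        # start of training; these vectors are the only trained parameters.
        self.l_k = nn.Parameter(torch.ones(d_model))   # scales attention keys
        self.l_v = nn.Parameter(torch.ones(d_model))   # scales attention values
        self.l_ff = nn.Parameter(torch.ones(d_ff))     # scales FFN hidden states

    def scale_keys(self, k: torch.Tensor) -> torch.Tensor:
        # k: (batch, seq, d_model); broadcasting applies the vector per position
        return k * self.l_k

    def scale_values(self, v: torch.Tensor) -> torch.Tensor:
        return v * self.l_v

    def scale_ffn(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, d_ff), the intermediate feed-forward activation
        return h * self.l_ff


# Hypothetical usage: rescale keys from one attention layer.
scalers = IA3Scalers(d_model=768, d_ff=3072)
k = torch.randn(2, 16, 768)
print(scalers.scale_keys(k).shape)  # torch.Size([2, 16, 768])
```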

Related research

SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels (09/15/2023)
Pre-trained vision transformers have strong representation benefits to v...

HyperTuning: Toward Adapting Large Language Models without Back-propagation (11/22/2022)
Fine-tuning large language models for different tasks can be costly and ...

FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning (09/09/2023)
Learning paradigms for large language models (LLMs) currently tend to fa...

Teaching Autoregressive Language Models Complex Tasks By Demonstration (09/05/2021)
This paper demonstrates that by fine-tuning an autoregressive language m...

Gradient-Based Automated Iterative Recovery for Parameter-Efficient Tuning (02/13/2023)
Pretrained large language models (LLMs) are able to solve a wide variety...

Few-shot Multimodal Multitask Multilingual Learning (02/19/2023)
While few-shot learning as a transfer learning paradigm has gained signi...

AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models (05/24/2022)
Fine-tuning large-scale pre-trained language models to downstream tasks ...