Data-Efficient Finetuning Using Cross-Task Nearest Neighbors

12/01/2022
by Hamish Ivison, et al.

Language models trained on massive prompted multitask datasets like T0 (Sanh et al., 2021) or FLAN (Wei et al., 2021a) can generalize to tasks unseen during training. We show that training on a carefully chosen subset of instances can outperform training on all available data on a variety of datasets. We assume access to a small number (250–1000) of unlabeled target-task instances, select their nearest neighbors from a pool of multitask data, and use the retrieved data to train target task-specific models. Our method is more data-efficient than training a single multitask model, while still outperforming it by large margins. We evaluate on a diverse set of tasks not in the multitask pool we retrieve from, including those used to evaluate T0 as well as more complex tasks such as legal and scientific document QA. We retrieve small subsets of P3 (the collection of prompted datasets from which T0's training data was sampled) and finetune T5 models that outperform the 3-billion-parameter variant of T0 (T0-3B) by 3–30% while using only a small fraction of the data used to train T0-3B. These models also provide a better initialization than T0-3B for few-shot finetuning on target-task data, as shown by a 2–23% relative improvement over few-shot finetuned T0-3B models on 8 datasets. Our code is available at https://github.com/allenai/data-efficient-finetuning.
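The abstract only sketches the retrieval step at a high level: embed the unlabeled target-task instances, find their nearest neighbors in the multitask pool, and finetune on the retrieved subset. The snippet below is a minimal illustration of that idea, assuming an off-the-shelf sentence encoder, cosine similarity, and a hypothetical k; it is not the authors' exact retrieval setup, which is in the linked repository.

```python
# Hypothetical sketch: retrieve cross-task nearest neighbors for a small set of
# unlabeled target-task inputs, then use the retrieved pool examples as a
# task-specific finetuning set. Encoder, similarity metric, and k are
# illustrative assumptions, not the paper's configuration.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works here

def retrieve_cross_task_neighbors(target_inputs, pool_examples, k=500):
    """target_inputs: unlabeled target-task inputs (strings).
    pool_examples: dicts with 'input' and 'output' from the multitask pool (e.g. P3).
    Returns the union of the k nearest pool examples per target input."""
    pool_texts = [ex["input"] for ex in pool_examples]
    # Normalized embeddings make the inner product equal to cosine similarity.
    pool_emb = encoder.encode(pool_texts, normalize_embeddings=True, convert_to_numpy=True)
    query_emb = encoder.encode(target_inputs, normalize_embeddings=True, convert_to_numpy=True)
    sims = query_emb @ pool_emb.T                    # (n_queries, n_pool) similarity matrix
    top_k = np.argsort(-sims, axis=1)[:, :k]         # nearest-neighbor indices per query
    selected = sorted(set(top_k.ravel().tolist()))   # deduplicate across queries
    return [pool_examples[i] for i in selected]

# The retrieved subset (a small fraction of the pool) would then replace the full
# multitask mixture when finetuning a T5 model for the target task.
```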


Related research

10/15/2021  Multitask Prompted Training Enables Zero-Shot Task Generalization
Large language models have recently been shown to attain reasonable zero...

03/29/2022  Evaluating Prompts Across Multiple Choice Tasks In a Zero-Shot Setting
Large language models have shown that impressive zero-shot performance c...

11/21/2022  Multitask Vision-Language Prompt Tuning
Prompt Tuning, conditioning on task-specific learned prompt vectors, has...

03/25/2023  Identification of Negative Transfers in Multitask Learning Using Surrogate Models
Multitask learning is widely used in practice to train a low-resource ta...

02/07/2023  Exploring the Benefits of Training Expert Language Models over Instruction Tuning
Recently, Language Models (LMs) instruction-tuned on multiple tasks, als...

02/02/2022  Co-training Improves Prompt-based Learning for Large Language Models
We demonstrate that co-training (Blum & Mitchell, 1998) can improve th...

05/20/2022  Can Foundation Models Wrangle Your Data?
Foundation Models (FMs) are models trained on large corpora of data that...
