Co-training Improves Prompt-based Learning for Large Language Models

02/02/2022
by Hunter Lang, et al.

We demonstrate that co-training (Blum & Mitchell, 1998) can improve the performance of prompt-based learning by using unlabeled data. While prompting has emerged as a promising paradigm for few-shot and zero-shot learning, it is often brittle and requires much larger models than the standard supervised setup. We find that co-training makes it possible to improve the original prompt model and, at the same time, learn a smaller, downstream task-specific model. When we have only partial access to a prompt model (e.g., output probabilities from GPT-3 (Brown et al., 2020)), we learn a calibration model over the prompt outputs. When we have full access to the prompt model's gradients but full finetuning remains prohibitively expensive (e.g., T0 (Sanh et al., 2021)), we learn a set of continuous soft-prompt vectors to iteratively update the prompt model. We find that models trained in this manner can significantly improve performance on challenging datasets where there is currently a large gap between prompt-based learning and fully-supervised models.
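As a rough illustration of the co-training loop described above, the sketch below alternates between (i) training a small task-specific classifier on the prompt model's most confident pseudo-labels and (ii) feeding that classifier's predictions back to adjust the prompt model's output probabilities. Everything here is assumed for illustration and is not the paper's exact procedure: the synthetic `features` and `prompt_probs` arrays, the confidence cutoff, and the simple probability averaging that stands in for the calibration model or soft-prompt updates.

```python
# Minimal co-training sketch between two "views" of unlabeled data:
# view 1 = a prompt model's output probabilities, view 2 = features for a small model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins for real data: 1,000 unlabeled examples, 2 classes.
n, d = 1000, 32
features = rng.normal(size=(n, d))                      # view 2: small-model features
logits = features[:, :2] + 0.5 * rng.normal(size=(n, 2))
prompt_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # view 1

small_model = LogisticRegression(max_iter=1000)

for step in range(3):
    # View 1 -> view 2: train the small task-specific model on the prompt
    # model's most confident pseudo-labels.
    confidence = prompt_probs.max(axis=1)
    keep = confidence >= np.quantile(confidence, 0.5)   # top half by confidence (illustrative cutoff)
    pseudo_labels = prompt_probs.argmax(axis=1)
    small_model.fit(features[keep], pseudo_labels[keep])

    # View 2 -> view 1: in the paper this step would fit a calibration model
    # (or update soft prompts); averaging in the small model's probabilities
    # is only a crude placeholder for that update.
    small_probs = small_model.predict_proba(features)
    prompt_probs = 0.5 * (prompt_probs + small_probs)

print("agreement between views:", (small_probs.argmax(axis=1) == prompt_probs.argmax(axis=1)).mean())
```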


Related research

09/29/2022  Bidirectional Language Models Are Also Few-shot Learners
Large language models such as GPT-3 (Brown et al., 2020) can perform arb...

04/29/2022  Prompt Consistency for Zero-Shot Task Generalization
One of the most impressive results of recent NLP history is the ability ...

10/13/2022  Is It Worth the (Environmental) Cost? Limited Evidence for the Benefits of Diachronic Continuous Training
Language is constantly changing and evolving, leaving language models to...

05/22/2023  Small Language Models Improve Giants by Rewriting Their Outputs
Large language models (LLMs) have demonstrated impressive few-shot learn...

10/31/2022  Learning New Tasks from a Few Examples with Soft-Label Prototypes
It has been experimentally demonstrated that humans are able to learn in...

09/10/2021  What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers
GPT-3 shows remarkable in-context learning ability of large-scale langua...

12/01/2022  Data-Efficient Finetuning Using Cross-Task Nearest Neighbors
Language models trained on massive prompted multitask datasets like T0 (...
