Calibrate Before Use: Improving Few-Shot Performance of Language Models

02/19/2021
by Tony Z. Zhao, et al.

GPT-3 can perform numerous tasks when provided a natural language prompt that contains a few training examples. We show that this type of few-shot learning can be unstable: the choice of prompt format, training examples, and even the order of the training examples can cause accuracy to vary from near chance to near state-of-the-art. We demonstrate that this instability arises from the bias of language models towards predicting certain answers, e.g., those that are placed near the end of the prompt or are common in the pre-training data. To mitigate this, we first estimate the model's bias towards each answer by asking for its prediction when given the training prompt and a content-free test input such as "N/A". We then fit calibration parameters that cause the prediction for this input to be uniform across answers. On a diverse set of tasks, this contextual calibration procedure substantially improves GPT-3 and GPT-2's average accuracy (up to 30.0% absolute) and reduces variance across different choices of the prompt.
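The calibration step described above can be illustrated with a minimal sketch. It assumes the simplest parameterization that meets the stated condition: one positive weight per answer, set to the inverse of the probability the model assigns to that answer on the content-free input. The function names and the probabilities below are hypothetical and chosen for illustration only; they are not results from the paper, and the abstract does not specify the exact form of the calibration parameters.

```python
def normalize(scores):
    """Rescale non-negative scores so they sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]


def fit_calibration(content_free_probs):
    """Return one weight per answer so the content-free prediction becomes uniform.

    Assumption: every probability is strictly positive.
    """
    p_cf = normalize(content_free_probs)
    return [1.0 / p for p in p_cf]


def calibrate(probs, weights):
    """Apply the per-answer weights and renormalize."""
    scaled = [w * p for w, p in zip(weights, normalize(probs))]
    return normalize(scaled)


# Hypothetical answer probabilities for a two-class task.
p_content_free = [0.7, 0.3]  # model output for the prompt plus "N/A"
p_test = [0.6, 0.4]          # model output for a real test input

weights = fit_calibration(p_content_free)
calibrated_cf = calibrate(p_content_free, weights)
calibrated_test = calibrate(p_test, weights)

# Index of the most probable answer after calibration.
predicted = max(range(len(calibrated_test)), key=calibrated_test.__getitem__)

print(calibrated_cf)
print(calibrated_test, predicted)
```

In this toy case the uncalibrated model favors the first answer on the test input, but its preference is weaker than the bias it shows on the content-free input, so the calibrated prediction switches to the second answer.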
