Demystifying Prompts in Language Models via Perplexity Estimation

by Hila Gonen, et al.
University of Washington

Language models can be prompted to perform a wide variety of zero- and few-shot learning problems. However, performance varies significantly with the choice of prompt, and we do not yet understand why this happens or how to pick the best prompts. In this work, we analyze the factors that contribute to this variance and establish a new empirical hypothesis: the performance of a prompt is coupled with the extent to which the model is familiar with the language it contains. Over a wide range of tasks, we show that the lower the perplexity of the prompt is, the better the prompt is able to perform the task. As a result, we devise a method for creating prompts: (1) automatically extend a small seed set of manually written prompts by paraphrasing using GPT3 and backtranslation and (2) choose the lowest perplexity prompts to get significant gains in performance.
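The selection step described in the abstract can be sketched in a few lines: compute each candidate prompt's perplexity from its per-token log-probabilities and keep the lowest-perplexity candidates. The sketch below is illustrative, not the paper's implementation; it assumes that per-token log-probabilities for each prompt are already available from some language model (the `prompt_logprobs` mapping is a hypothetical input).

```python
import math

def perplexity(token_logprobs):
    # Perplexity is the exponential of the negative mean
    # per-token log-probability of the sequence.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def rank_prompts(prompt_logprobs):
    # prompt_logprobs: dict mapping each candidate prompt string to the
    # list of per-token log-probabilities an LM assigned to it
    # (hypothetical input; obtaining it depends on the model API).
    # Returns prompts sorted from lowest to highest perplexity,
    # i.e. most to least "familiar" to the model.
    return sorted(prompt_logprobs,
                  key=lambda p: perplexity(prompt_logprobs[p]))
```

Under the paper's hypothesis, the first few prompts returned by `rank_prompts` would be the ones expected to perform best on the task.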



