Evaluating Prompts Across Multiple Choice Tasks In a Zero-Shot Setting

03/29/2022
by Gabriel Orlanski, et al.

Large language models have shown that impressive zero-shot performance can be achieved through natural language prompts (Radford et al., 2019; Brown et al., 2020; Sanh et al., 2021). Creating an effective prompt, however, requires significant trial and error. This prompts the question: how do the qualities of a prompt affect its performance? To this end, we collect and standardize prompts from a diverse range of tasks for use with tasks they were not designed for. We then evaluate these prompts across fixed multiple choice datasets for a quantitative analysis of how certain attributes of a prompt affect performance. We find that including the choices and using prompts not used during pre-training provide significant improvements. All experiments and code can be found at https://github.com/gabeorlanski/zero-shot-cross-task.
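As a rough illustration of the evaluation setup described in the abstract (scoring fixed multiple choice datasets with a language model in a zero-shot setting), below is a minimal sketch using Hugging Face transformers. The GPT-2 checkpoint, the prompt template that lists the answer choices, and the length-normalized log-likelihood scoring are all illustrative assumptions rather than the paper's exact configuration; the authors' actual code is in the linked repository.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Sketch assumptions: GPT-2 as a stand-in scorer, a hand-written template that
# lists the answer choices, and length-normalized log-likelihood scoring.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def score_choice(prompt: str, choice: str) -> float:
    """Average log-probability the model assigns to `choice` following `prompt`."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # The token at position i is predicted by the logits at position i - 1.
    log_probs = torch.log_softmax(logits[0, :-1, :], dim=-1)
    choice_tokens = full_ids[0, prompt_len:]
    positions = range(prompt_len - 1, full_ids.shape[1] - 1)
    total = sum(log_probs[pos, tok].item() for pos, tok in zip(positions, choice_tokens))
    return total / len(choice_tokens)

def predict(question: str, choices: list[str]) -> str:
    # Illustrative prompt that includes the choices, one of the attributes the paper studies.
    prompt = question + "\nOptions: " + ", ".join(choices) + "\nAnswer:"
    return max(choices, key=lambda c: score_choice(prompt, " " + c))

print(predict("The capital of France is", ["Paris", "London", "Berlin"]))

The key design choice this sketch highlights is whether the answer choices appear in the prompt itself; the paper reports that including them yields significant improvements.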

