What Makes Pre-trained Language Models Better Zero/Few-shot Learners?

09/30/2022
by   Jinghui Lu, et al.

In this paper, we propose a theoretical framework to explain the efficacy of prompt learning in zero/few-shot scenarios. First, we prove that the conventional pre-training and fine-tuning paradigm fails in few-shot scenarios because it overfits the unrepresentative labelled data. We then detail the assumption that prompt learning is more effective because it allows the pre-trained language model, built upon massive text corpora, together with domain-related human knowledge, to contribute more to the prediction, thereby reducing the impact of the limited label information provided by the small training set. We further hypothesize that language discrepancy can measure the quality of prompting. Comprehensive experiments are performed to verify our assumptions. More remarkably, inspired by the theoretical framework, we propose an annotation-agnostic template selection method based on perplexity, which enables us to "forecast" prompting performance in advance. This approach is especially encouraging because existing work still relies on a development set to evaluate templates post hoc. Experiments show that this method leads to significant prediction benefits compared with state-of-the-art zero-shot methods.
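The perplexity-based, annotation-agnostic template selection can be sketched as follows. This is a minimal illustration, assuming a GPT-2-style causal language model loaded through the HuggingFace transformers library; the model name, candidate templates, and unlabelled inputs are hypothetical placeholders and not taken from the paper.

    # Minimal sketch: rank candidate prompt templates by the perplexity a
    # pre-trained LM assigns to the prompted (unlabelled) inputs. No labels
    # or development set are used. Model, templates, and corpus below are
    # illustrative assumptions, not the authors' exact setup.
    import math
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    model_name = "gpt2"  # any causal LM works for this sketch
    tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name).eval()

    # Candidate templates for a sentiment task (hypothetical examples).
    templates = [
        "Review: {text} Sentiment:",
        "{text} All in all, it was",
        "{text} The review above is",
    ]

    # A handful of *unlabelled* task inputs.
    corpus = [
        "The plot was predictable but the acting saved the film.",
        "I would not recommend this phone to anyone.",
    ]

    def perplexity(text: str) -> float:
        """Token-level perplexity of `text` under the causal LM."""
        enc = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            # With labels=input_ids the model returns the mean cross-entropy loss.
            loss = model(**enc, labels=enc["input_ids"]).loss
        return math.exp(loss.item())

    # Lower average perplexity indicates a smaller discrepancy between the
    # prompted text and the LM's pre-training distribution, so we select the
    # template whose filled-in prompts the model finds most "natural".
    scores = {
        tpl: sum(perplexity(tpl.format(text=x)) for x in corpus) / len(corpus)
        for tpl in templates
    }
    best_template = min(scores, key=scores.get)
    print(scores)
    print("Selected template:", best_template)

Because no annotations are consulted, this ranking can be computed before any zero-shot evaluation, mirroring the language-discrepancy hypothesis: the template with the lowest average perplexity is the one whose prompted text lies closest to the model's pre-training distribution.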


Related research

12/14/2022  Pre-trained Language Models can be Fully Zero-Shot Learners
06/05/2023  Prompt to be Consistent is Better than Self-Consistent? Few-Shot and Zero-Shot Fact Verification with Pre-trained Language Models
07/28/2021  Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
09/05/2022  PromptAttack: Prompt-based Attack for Language Models via Gradient Search
09/09/2021  AStitchInLanguageModels: Dataset and Methods for the Exploration of Idiomaticity in Pre-Trained Language Models
10/21/2020  Latte-Mix: Measuring Sentence Semantic Similarity with Latent Categorical Mixtures
05/24/2022  On the Role of Bidirectionality in Language Model Pre-Training
