TEMPERA: Test-Time Prompting via Reinforcement Learning

11/21/2022
by Tianjun Zhang, et al.

Careful prompt design is critical to the use of large language models in zero-shot or few-shot learning. As a consequence, there is growing interest in automated methods for designing optimal prompts. In this work, we propose Test-time Prompt Editing using Reinforcement learning (TEMPERA). In contrast to prior prompt generation methods, TEMPERA can efficiently leverage prior knowledge, is adaptive to different queries, and provides an interpretable prompt for every query. To achieve this, we design a novel action space that allows flexible editing of the initial prompts, covering a wide set of commonly used components such as instructions, few-shot exemplars, and verbalizers. The proposed method achieves significant gains over recent state-of-the-art (SoTA) approaches such as prompt tuning, AutoPrompt, and RLPrompt across a variety of tasks, including sentiment analysis, topic classification, natural language inference, and reading comprehension. On average, our method also improves sample efficiency by 5.33x compared with traditional fine-tuning methods.
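The idea of an action space for editing an initial prompt can be illustrated with a minimal Python sketch. It assumes a prompt built from the three components the abstract names (instruction, few-shot exemplars, verbalizer) and a small set of discrete edits over them. All names here (`Prompt`, `swap_phrases`, `delete_phrase`, `swap_exemplars`, `set_verbalizer`, `enumerate_actions`, `apply_action`) are hypothetical, and the specific edit operations are illustrative assumptions rather than TEMPERA's exact action set. The reinforcement-learning policy that would pick an edit for each query is not shown.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple

# Hypothetical prompt representation covering the components named in the abstract.
@dataclass(frozen=True)
class Prompt:
    instruction: Tuple[str, ...]  # instruction split into editable phrases
    exemplars: Tuple[str, ...]    # few-shot (in-context) exemplars, in order
    verbalizer: Tuple[str, str]   # label words that model outputs are mapped back to

    def render(self, query: str) -> str:
        # Join the components into the text that would be sent to the language model.
        parts = [" ".join(self.instruction), *self.exemplars, f"Input: {query}", "Output:"]
        return "\n".join(parts)

# Hypothetical edit actions: each returns a new Prompt and leaves the input unchanged.
def swap_phrases(p: Prompt, i: int, j: int) -> Prompt:
    phrases = list(p.instruction)
    phrases[i], phrases[j] = phrases[j], phrases[i]
    return replace(p, instruction=tuple(phrases))

def delete_phrase(p: Prompt, i: int) -> Prompt:
    phrases = list(p.instruction)
    del phrases[i]
    return replace(p, instruction=tuple(phrases))

def swap_exemplars(p: Prompt, i: int, j: int) -> Prompt:
    shots = list(p.exemplars)
    shots[i], shots[j] = shots[j], shots[i]
    return replace(p, exemplars=tuple(shots))

def set_verbalizer(p: Prompt, words: Tuple[str, str]) -> Prompt:
    return replace(p, verbalizer=words)

ACTIONS: Dict[str, Callable[..., Prompt]] = {
    "swap_phrases": swap_phrases,
    "delete_phrase": delete_phrase,
    "swap_exemplars": swap_exemplars,
    "set_verbalizer": set_verbalizer,
}

def enumerate_actions(p: Prompt, verbalizer_pool: List[Tuple[str, str]]) -> List[Tuple[str, tuple]]:
    # List every discrete edit available from prompt p; a policy would choose among these.
    n, m = len(p.instruction), len(p.exemplars)
    actions: List[Tuple[str, tuple]] = []
    actions += [("swap_phrases", (i, j)) for i in range(n) for j in range(i + 1, n)]
    actions += [("delete_phrase", (i,)) for i in range(n)]
    actions += [("swap_exemplars", (i, j)) for i in range(m) for j in range(i + 1, m)]
    actions += [("set_verbalizer", (w,)) for w in verbalizer_pool if w != p.verbalizer]
    return actions

def apply_action(p: Prompt, action: Tuple[str, tuple]) -> Prompt:
    name, args = action
    return ACTIONS[name](p, *args)

if __name__ == "__main__":
    p = Prompt(
        instruction=("Classify the sentiment.", "Answer with one word."),
        exemplars=("Input: great movie\nOutput: positive", "Input: dull plot\nOutput: negative"),
        verbalizer=("negative", "positive"),
    )
    pool = [("negative", "positive"), ("bad", "good")]
    for action in enumerate_actions(p, pool):
        print(action)
    print(apply_action(p, ("swap_exemplars", (0, 1))).render("nice acting"))
```

Because each edit returns a new `Prompt`, a per-query policy could apply a short sequence of edits to the same initial prompt without mutating it, and the result remains plain, human-readable text.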
