Offline Prompt Evaluation and Optimization with Inverse Reinforcement Learning

09/13/2023
by Hao Sun, et al.

Recent advances in Large Language Models (LLMs) such as ChatGPT have achieved remarkable performance by leveraging human expertise. Yet fully eliciting LLMs' potential for complex tasks requires navigating the vast search space of natural-language prompts. While prompt engineering has shown promise, its reliance on human-crafted prompts refined through trial and error, together with the associated costs, poses significant challenges. Crucially, the efficiency of prompt optimization hinges on prompt evaluation, which is itself costly. This work introduces Prompt-OIRL, an approach rooted in offline inverse reinforcement learning that seeks to reconcile effective prompt evaluation with affordability. Our method draws on offline datasets of expert evaluations and employs inverse RL to derive a reward model for offline, query-dependent prompt evaluation. Prompt-OIRL offers several advantages: it predicts prompt performance, is cost-efficient, produces human-readable results, and navigates the prompt space efficiently. We validate our method across four LLMs and three arithmetic datasets, highlighting its potential as a robust and effective tool for offline prompt evaluation and optimization. Our code and the offline datasets are released, and Prompt-OIRL can be reproduced within a few hours on a single laptop using only a CPU.
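The abstract describes two steps: learning a reward model from offline evaluation logs, then using that model to evaluate and choose a prompt for each query without calling the LLM. The sketch below illustrates only that general idea, under loud assumptions: queries are already reduced to numeric feature vectors, the candidate prompts form a small fixed set indexed by integers, and the reward model is a plain per-prompt logistic regression fit to correctness labels. That supervised learner is a stand-in chosen for brevity, not the paper's inverse-RL reward-learning procedure. All names (`features`, `fit_reward_model`, `predict`, `select_prompt`) and the toy data are hypothetical and are not taken from the released Prompt-OIRL code.

```python
# Hypothetical sketch, NOT the Prompt-OIRL implementation: a query-dependent
# reward model fit to offline (query, prompt, correct?) logs, then used to
# pick a prompt per query with no LLM calls. Logistic regression stands in
# for the paper's reward-learning step.
import math
from typing import List, Sequence, Tuple

# Hypothetical offline record: (query feature vector, index of the prompt used,
# 1 if the LLM's answer was judged correct else 0).
Record = Tuple[Sequence[float], int, int]


def features(query: Sequence[float], prompt_id: int, num_prompts: int) -> List[float]:
    """Put [1, *query] in the block owned by prompt_id and zeros elsewhere,
    so each prompt gets its own query-dependent linear score."""
    block = [1.0, *query]
    phi = [0.0] * (len(block) * num_prompts)
    start = prompt_id * len(block)
    phi[start:start + len(block)] = block
    return phi


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def fit_reward_model(data: Sequence[Record], num_prompts: int,
                     lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Full-batch gradient descent on the logistic (cross-entropy) loss."""
    dim = (len(data[0][0]) + 1) * num_prompts
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for query, prompt_id, label in data:
            phi = features(query, prompt_id, num_prompts)
            err = sigmoid(dot(w, phi)) - label  # d(loss)/d(logit)
            for i, v in enumerate(phi):
                grad[i] += err * v
        w = [wi - lr * gi / len(data) for wi, gi in zip(w, grad)]
    return w


def predict(w: Sequence[float], query: Sequence[float],
            prompt_id: int, num_prompts: int) -> float:
    """Offline estimate of P(correct | query, prompt); no LLM is queried."""
    return sigmoid(dot(w, features(query, prompt_id, num_prompts)))


def select_prompt(w: Sequence[float], query: Sequence[float], num_prompts: int) -> int:
    """Query-dependent selection: the candidate with the highest predicted reward."""
    return max(range(num_prompts), key=lambda p: predict(w, query, p, num_prompts))


if __name__ == "__main__":
    # Toy offline log (invented for illustration): prompt 0 succeeds on
    # positive-feature queries, prompt 1 on negative-feature queries.
    toy_data: List[Record] = [
        ([1.0], 0, 1), ([2.0], 0, 1), ([-1.0], 0, 0), ([-2.0], 0, 0),
        ([1.0], 1, 0), ([2.0], 1, 0), ([-1.0], 1, 1), ([-2.0], 1, 1),
    ]
    w = fit_reward_model(toy_data, num_prompts=2)
    print(select_prompt(w, [1.5], 2), select_prompt(w, [-1.5], 2))  # prints: 0 1
```

Taking the argmax over a fixed candidate set is only one simple way to turn such a reward model into an optimizer. Because scoring a (query, prompt) pair needs no LLM call, evaluating many candidate prompts per query stays cheap, which is the cost argument the abstract makes.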
