RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

09/01/2023
by Harrison Lee, et al.

Reinforcement learning from human feedback (RLHF) is effective at aligning large language models (LLMs) to human preferences, but gathering high-quality human preference labels is a key bottleneck. We conduct a head-to-head comparison of RLHF and RL from AI Feedback (RLAIF), a technique in which preferences are labeled by an off-the-shelf LLM instead of by humans, and find that the two yield similar improvements. On the task of summarization, human evaluators prefer generations from both RLAIF and RLHF over a baseline supervised fine-tuned model in about 70% of cases. Furthermore, when asked to rate RLAIF vs. RLHF summaries, humans prefer both at equal rates. These results suggest that RLAIF can yield human-level performance, offering a potential solution to the scalability limitations of RLHF.
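
The comparisons described in the abstract reduce to a win rate: the fraction of pairwise judgments in which one system's output is preferred over another's. The following is a minimal sketch of that computation, not the authors' evaluation code. The function name `win_rate`, the labels `"rlaif"`, `"rlhf"`, and `"sft"`, the tie convention, and the toy judgments are all assumptions made for illustration and are not data from the paper.

```python
from collections import Counter


def win_rate(judgments, candidate):
    """Return the fraction of pairwise judgments won by `candidate`.

    `judgments` is a list with one entry per comparison, naming the
    preferred system. A tie is recorded as None and counted as half a
    win (an assumed convention, not one taken from the paper).
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    counts = Counter(judgments)
    wins = counts[candidate] + 0.5 * counts[None]
    return wins / len(judgments)


# Toy, made-up judgments purely to exercise the function.
vs_baseline = ["rlaif"] * 7 + ["sft"] * 3    # policy vs. supervised baseline
head_to_head = ["rlaif"] * 5 + ["rlhf"] * 5  # RLAIF vs. RLHF

print(win_rate(vs_baseline, "rlaif"))
print(win_rate(head_to_head, "rlaif"))
```

A win rate near 0.5 in the head-to-head setting corresponds to evaluators preferring both systems at equal rates.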
