Learning to Generate Better Than Your LLM

06/20/2023
by Jonathan D. Chang, et al.

Reinforcement learning (RL) has emerged as a powerful paradigm for fine-tuning large language models (LLMs) for conditional text generation. In particular, recent LLMs such as ChatGPT and GPT-4 can engage in fluent conversations with users by incorporating RL from human feedback. Inspired by learning-to-search algorithms and capitalizing on key properties of text generation, we investigate RL algorithms beyond general-purpose methods such as Proximal Policy Optimization (PPO). In particular, we extend RL algorithms to interact with a dynamic black-box guide LLM such as GPT-3 and propose RL with guided feedback (RLGF), a suite of RL algorithms for LLM fine-tuning. We experiment on the IMDB positive-review and CommonGen text generation tasks from the GRUE benchmark. Our RL algorithms achieve higher performance than supervised learning (SL) and default PPO baselines, demonstrating the benefit of interacting with the guide LLM. On CommonGen, we not only outperform our SL baselines but also improve over PPO across a variety of lexical and semantic metrics, not just the one we optimized for. Notably, on the IMDB dataset, our GPT-2-based policy outperforms the zero-shot GPT-3 oracle, indicating that our algorithms can learn from a powerful, black-box GPT-3 oracle using a simpler, cheaper, and publicly available GPT-2 model while improving performance.
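The learning-to-search idea behind guided feedback can be sketched abstractly: the learner generates a prefix of the sequence (the "roll-in"), the guide policy completes it (the "roll-out"), and the reward on the completed sequence gives the learner a training signal for its prefix. The sketch below is an illustrative toy, not the paper's RLGF implementation; the two-token policies, the `rollin_rollout` helper, and the reward function are all hypothetical stand-ins for an LLM learner, a black-box guide such as GPT-3, and a task reward.

```python
import random


def rollin_rollout(learner, guide, reward_fn, max_len=8, switch_t=None, rng=None):
    """One guided episode: the learner generates tokens up to a random
    switch point (roll-in), then the guide completes the sequence
    (roll-out). The reward on the full sequence can then be used as a
    training signal for the learner's prefix. Purely illustrative."""
    rng = rng or random.Random(0)
    if switch_t is None:
        # Choose where to hand control from the learner to the guide.
        switch_t = rng.randrange(max_len + 1)
    seq = []
    for t in range(max_len):
        policy = learner if t < switch_t else guide
        seq.append(policy(seq, rng))
    return seq, reward_fn(seq), switch_t


# Hypothetical toy policies over a two-token vocabulary {0, 1}:
# the untrained "learner" picks tokens at random; the "guide"
# (standing in for a strong black-box LLM) always emits the token
# the reward prefers.
learner = lambda prefix, rng: rng.choice([0, 1])
guide = lambda prefix, rng: 1
reward = lambda seq: sum(seq) / len(seq)  # fraction of preferred tokens

seq, r, switch_t = rollin_rollout(learner, guide, reward, rng=random.Random(42))
```

Because the guide finishes every sequence competently, the reward reflects the quality of the learner's prefix more than the luck of its own completion — the property that lets a cheaper policy improve by interacting with a stronger guide.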

