Aligning Language Models with Offline Reinforcement Learning from Human Feedback

08/23/2023
by Jian Hu, et al.

Learning from human preferences is crucial for language models (LMs) to effectively cater to human needs and societal values. Previous research has made notable progress by leveraging human feedback to follow instructions. However, these approaches rely primarily on online reinforcement learning (RL) techniques such as Proximal Policy Optimization (PPO), which have proven unstable and difficult to tune for language models. Moreover, PPO requires a complex distributed system implementation, hindering the efficiency of large-scale distributed training. In this study, we propose an offline reinforcement learning from human feedback (RLHF) framework that aligns LMs using pre-generated samples, without interacting with an RL environment. Specifically, we explore maximum likelihood estimation (MLE) with filtering, reward-weighted regression (RWR), and Decision Transformer (DT) approaches to align language models with human preferences. By employing a loss function similar to that of supervised fine-tuning, our methods achieve more stable training than PPO while requiring only a simple machine learning system (MLSys) and far fewer computing resources (around 12.3% of those used by PPO). Experimental results demonstrate that DT alignment outperforms the other offline RLHF methods and surpasses PPO.
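To make the offline objectives above concrete, below is a minimal PyTorch-style sketch of two of them, MLE with filtering and reward-weighted regression, computed over a batch of pre-generated (prompt, response, reward) samples. This is an illustrative sketch, not the paper's implementation: the helper per_sequence_log_prob and the hyperparameters reward_threshold and beta are assumptions made for the example, and the Decision Transformer variant (which conditions generation on a target reward) is omitted.

# Illustrative sketch only (not the paper's code): offline RLHF losses
# computed on pre-generated (prompt, response, reward) samples.
import torch
import torch.nn.functional as F

def per_sequence_log_prob(logits, labels, mask):
    # logits: [B, T, V], labels: [B, T], mask: [B, T] with 1 on response tokens.
    log_probs = F.log_softmax(logits, dim=-1)
    token_log_probs = log_probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    return (token_log_probs * mask).sum(dim=-1)  # [B]

def mle_with_filtering_loss(logits, labels, mask, rewards, reward_threshold=0.0):
    # Supervised-style NLL kept only for samples whose reward passes a threshold.
    keep = (rewards > reward_threshold).float()               # [B]
    nll = -per_sequence_log_prob(logits, labels, mask)        # [B]
    return (keep * nll).sum() / keep.sum().clamp(min=1.0)

def rwr_loss(logits, labels, mask, rewards, beta=1.0):
    # Reward-weighted regression: weight each sample's NLL by a softmax of its reward.
    weights = torch.softmax(rewards / beta, dim=0)            # normalized over the batch
    nll = -per_sequence_log_prob(logits, labels, mask)        # [B]
    return (weights * nll).sum()

Both losses reduce to a weighted supervised fine-tuning objective, which is why they can be trained with the same simple MLSys as standard fine-tuning rather than the distributed setup PPO requires.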

Related research

04/11/2023
RRHF: Rank Responses to Align Language Models with Human Feedback without tears
Reinforcement Learning from Human Feedback (RLHF) facilitates the alignm...

02/10/2023
The Wisdom of Hindsight Makes Language Models Better Instruction Followers
Reinforcement learning has seen wide success in finetuning large languag...

05/17/2023
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
Learning from human feedback has been shown to be effective at aligning ...

06/04/2023
Fine-Tuning Language Models with Advantage-Induced Policy Alignment
Reinforcement learning from human feedback (RLHF) has emerged as a relia...

08/17/2023
Reinforced Self-Training (ReST) for Language Modeling
Reinforcement learning from human feedback (RLHF) can improve the qualit...

09/01/2023
Efficient RLHF: Reducing the Memory Usage of PPO
Reinforcement Learning with Human Feedback (RLHF) has revolutionized lan...

09/13/2023
Offline Prompt Evaluation and Optimization with Inverse Reinforcement Learning
The recent advances in the development of Large Language Models (LLMs) l...
