Is RLHF More Difficult than Standard RL?

06/25/2023
by Yuanhao Wang, et al.

Reinforcement Learning from Human Feedback (RLHF) learns from preference signals, while standard Reinforcement Learning (RL) learns directly from reward signals. Preferences arguably contain less information than rewards, which makes preference-based RL seemingly more difficult. This paper theoretically proves that, for a wide range of preference models, we can solve preference-based RL directly with existing algorithms and techniques for reward-based RL, at little or no extra cost. Specifically, (1) for preferences drawn from reward-based probabilistic models, we reduce the problem to robust reward-based RL that can tolerate small errors in rewards; (2) for general arbitrary preferences, where the objective is to find the von Neumann winner, we reduce the problem to multiagent reward-based RL which finds Nash equilibria of factored Markov games under a restricted set of policies. The latter case can be further reduced to an adversarial MDP when preferences depend only on the final state. We instantiate all reward-based RL subroutines with concrete, provable algorithms, and apply our theory to a large class of models, including tabular MDPs and MDPs with generic function approximation. We further provide guarantees when K-wise comparisons are available.
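To make case (1) concrete, below is a minimal sketch of the kind of reward-based probabilistic preference model the reduction assumes: pairwise preferences follow a sigmoid of the reward difference (a Bradley-Terry-style model), and rewards are recovered by logistic maximum likelihood before being handed to any reward-based RL algorithm that tolerates small reward errors. The Bradley-Terry choice, the function names, and the gradient-ascent estimator are illustrative assumptions, not the paper's specific algorithm.

```python
import numpy as np

def bradley_terry_prob(r1: float, r2: float) -> float:
    """P(item 1 preferred over item 2) under a Bradley-Terry model:
    a sigmoid of the reward difference."""
    return 1.0 / (1.0 + np.exp(-(r1 - r2)))

def estimate_rewards(comparisons, n_items, lr=0.1, steps=2000):
    """Recover per-item rewards from pairwise preferences by logistic
    maximum likelihood (plain gradient ascent, for illustration only).

    comparisons: list of (i, j) pairs meaning "i was preferred over j".
    Rewards are identifiable only up to an additive shift, so the
    result is centered to mean zero.
    """
    r = np.zeros(n_items)
    for _ in range(steps):
        grad = np.zeros(n_items)
        for i, j in comparisons:
            p = bradley_terry_prob(r[i], r[j])
            grad[i] += 1.0 - p   # d log p / d r_i
            grad[j] -= 1.0 - p   # d log p / d r_j
        r += lr * grad / len(comparisons)
    return r - r.mean()

# Toy check: item 0 has the highest true reward.
rng = np.random.default_rng(0)
true_r = np.array([1.0, 0.0, -1.0])
pairs = []
for _ in range(500):
    i, j = rng.choice(3, size=2, replace=False)
    if rng.random() < bradley_terry_prob(true_r[i], true_r[j]):
        pairs.append((i, j))
    else:
        pairs.append((j, i))
print(estimate_rewards(pairs, 3))  # roughly [1, 0, -1] up to noise
```

Because maximum-likelihood estimates of this kind carry small errors, the paper's reduction targets robust reward-based RL, which is exactly the property that makes handing off the estimated rewards sound.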
