Deep Reinforcement Learning from Hierarchical Weak Preference Feedback

09/06/2023
by Alexander Bukharin, et al.

Reward design is a fundamental yet challenging aspect of practical reinforcement learning (RL). For simple tasks, researchers typically handcraft the reward function, e.g., as a linear combination of several reward factors. However, such reward engineering is subject to approximation bias, incurs a large tuning cost, and often cannot provide the granularity required for complex tasks. To avoid these difficulties, researchers have turned to reinforcement learning from human feedback (RLHF), which learns a reward function from human preferences between pairs of trajectories. By leveraging preference-based reward modeling, RLHF learns complex rewards that are well aligned with human preferences, allowing RL to tackle increasingly difficult problems. Unfortunately, the applicability of RLHF is limited by the high cost and difficulty of obtaining human preference data. In light of this cost, we investigate learning reward functions for complex tasks with far less human effort: the designer only ranks the importance of the reward factors. More specifically, we propose a new RL framework, HERON, which compares trajectories using a hierarchical decision tree induced by the given ranking. These comparisons are used to train a preference-based reward model, which is then used for policy learning. We find that our framework not only trains high-performing agents on a variety of difficult tasks, but also provides additional benefits such as improved sample efficiency and robustness. Our code is available at https://github.com/abukharin3/HERON.
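To make the hierarchical comparison concrete, the sketch below shows one way a ranking-induced decision tree could compare two trajectories and how the resulting labels could feed a Bradley-Terry-style preference loss for the reward model. This is a minimal illustration under assumed conventions: the function names, the per-factor margin, and the tie-handling rule are illustrative assumptions, not the exact implementation in the HERON repository.

```python
import numpy as np

def heron_compare(factors_a, factors_b, ranking, margins):
    """Compare two trajectories with a hierarchical decision tree.

    factors_a, factors_b: dicts mapping reward-factor name -> scalar value
        accumulated over each trajectory.
    ranking: list of factor names, most important first (the only human input).
    margins: per-factor thresholds below which a factor is treated as a tie
        and the next factor in the ranking is consulted (an assumed rule).
    Returns +1 if trajectory A is preferred, -1 if B is preferred, 0 for a tie.
    """
    for factor in ranking:
        diff = factors_a[factor] - factors_b[factor]
        if abs(diff) > margins[factor]:
            return 1 if diff > 0 else -1
    return 0  # indistinguishable on every ranked factor


def preference_loss(reward_a, reward_b, label):
    """Bradley-Terry-style loss for training the preference-based reward model.

    reward_a, reward_b: scalar rewards the model assigns to trajectories A and B.
    label: 1.0 if A was preferred by the decision tree, 0.0 if B was preferred.
    """
    p_a = 1.0 / (1.0 + np.exp(-(reward_a - reward_b)))
    return -(label * np.log(p_a) + (1.0 - label) * np.log(1.0 - p_a))


# Example: "safety" is ranked above "speed", so a large safety gap decides the
# comparison before speed is ever consulted.
a = {"safety": 0.9, "speed": 0.2}
b = {"safety": 0.3, "speed": 0.8}
print(heron_compare(a, b, ranking=["safety", "speed"],
                    margins={"safety": 0.1, "speed": 0.1}))  # -> 1 (A preferred)
```

The labels produced this way can replace human preference labels when fitting the reward model, after which policy learning proceeds as in standard preference-based RL.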


