B-Pref: Benchmarking Preference-Based Reinforcement Learning

11/04/2021
by Kimin Lee, et al.

Reinforcement learning (RL) requires access to a reward function that incentivizes the right behavior, but such functions are notoriously hard to specify for complex tasks. Preference-based RL provides an alternative: learning policies from a teacher's preferences rather than a pre-defined reward, thus sidestepping the pitfalls of reward engineering. However, progress in preference-based RL has been difficult to quantify due to the lack of a commonly adopted benchmark. In this paper, we introduce B-Pref: a benchmark specially designed for preference-based RL. A key challenge for such a benchmark is that candidate algorithms must be evaluated quickly, which makes relying on real human input prohibitive. At the same time, simulating human input as perfect preferences with respect to a ground-truth reward function is unrealistic. B-Pref addresses this tension by simulating teachers with a wide array of irrationalities, and proposes metrics not solely for performance but also for robustness to these potential irrationalities. We showcase the utility of B-Pref by using it to analyze algorithmic design choices, such as selecting informative queries, for state-of-the-art preference-based RL algorithms. We hope that B-Pref can serve as a common starting point to study preference-based RL more systematically. Source code is available at https://github.com/rll-research/B-Pref.
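To make the idea of an "irrational" simulated teacher concrete, here is a minimal sketch in the spirit of the stochastic (Bradley-Terry-style) teacher model the abstract alludes to. All parameter names (beta for rationality, gamma for myopia, eps for mistakes, delta_skip and delta_equal for skipping and ties) are illustrative assumptions, not the benchmark's actual API:

```python
import math
import random

def simulated_teacher(rew1, rew2, beta=1.0, gamma=1.0, eps=0.0,
                      delta_skip=None, delta_equal=0.0, rng=random):
    """Return a preference over two trajectory segments, given each
    segment's per-step ground-truth rewards.

    Returns 0 if segment 1 is preferred, 1 if segment 2 is preferred,
    0.5 for a tie, or None to skip the query.
    """
    # A myopic teacher (gamma < 1) weights recent steps more heavily,
    # via a discounted sum counted back from the end of the segment.
    def weighted_return(rews):
        n = len(rews)
        return sum(gamma ** (n - 1 - t) * r for t, r in enumerate(rews))

    r1, r2 = weighted_return(rew1), weighted_return(rew2)

    # Skip queries where neither segment looks notable.
    if delta_skip is not None and max(r1, r2) < delta_skip:
        return None

    # Declare a tie when the segments are nearly indistinguishable.
    if abs(r1 - r2) < delta_equal:
        return 0.5

    # Stochastic choice: beta -> infinity recovers a perfectly
    # rational teacher, beta -> 0 a uniformly random one.
    p1 = 1.0 / (1.0 + math.exp(-beta * (r1 - r2)))
    label = 0 if rng.random() < p1 else 1

    # With probability eps the teacher flips its answer by mistake.
    if rng.random() < eps:
        label = 1 - label
    return label
```

A benchmark built on such a teacher can then sweep these knobs (e.g., low beta, high eps) to measure an algorithm's robustness to each irrationality separately, which is the kind of metric the paper proposes alongside raw performance.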


Related research

- Deep Reinforcement Learning from Hierarchical Weak Preference Feedback (09/06/2023)
  Reward design is a fundamental, yet challenging aspect of practical rein...
- Is RLHF More Difficult than Standard RL? (06/25/2023)
  Reinforcement learning from Human Feedback (RLHF) learns from preference...
- Preferences Implicit in the State of the World (02/12/2019)
  Reinforcement learning (RL) agents optimize only the features specified ...
- Better Rewards Yield Better Summaries: Learning to Summarise Without References (09/03/2019)
  Reinforcement Learning (RL) based document summarisation systems yield s...
- APRIL: Interactively Learning to Summarise by Combining Active Preference Learning and Reinforcement Learning (08/29/2018)
  We propose a method to perform automatic document summarisation without ...
- ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives (12/08/2021)
  We present ShinRL, an open-source library specialized for the evaluation...
- Braxlines: Fast and Interactive Toolkit for RL-driven Behavior Engineering beyond Reward Maximization (10/10/2021)
  The goal of continuous control is to synthesize desired behaviors. In re...
