Interpretable Preference-based Reinforcement Learning with Tree-Structured Reward Functions

12/20/2021
by Tom Bewley, et al.

The potential of reinforcement learning (RL) to deliver aligned and performant agents is partially bottlenecked by the reward engineering problem. One alternative to heuristic trial-and-error is preference-based RL (PbRL), where a reward function is inferred from sparse human feedback. However, prior PbRL methods lack interpretability of the learned reward structure, which hampers the ability to assess robustness and alignment. We propose an online, active preference learning algorithm that constructs reward functions with the intrinsically interpretable, compositional structure of a tree. Using both synthetic and human-provided feedback, we demonstrate sample-efficient learning of tree-structured reward functions in several environments, then harness the enhanced interpretability to explore and debug for alignment.
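To make the central idea concrete, below is a minimal sketch, not the authors' implementation, of a tree-structured reward function over (state, action) features together with the Bradley-Terry preference likelihood commonly used in PbRL. All names here (RewardTreeNode, preference_probability, the toy feature dimensions) are illustrative assumptions.

```python
# Minimal sketch of a reward tree and a pairwise preference model (illustrative only).
import numpy as np


class RewardTreeNode:
    """Axis-aligned split on one feature of a (state, action) vector; leaves hold scalar rewards."""

    def __init__(self, feature=None, threshold=None, left=None, right=None, value=0.0):
        self.feature = feature      # index of the feature to split on (None => leaf)
        self.threshold = threshold  # split threshold
        self.left = left            # subtree for feature value <= threshold
        self.right = right          # subtree for feature value > threshold
        self.value = value          # scalar reward stored at a leaf

    def reward(self, x):
        """Return the reward for a single (state, action) feature vector x."""
        if self.feature is None:
            return self.value
        child = self.left if x[self.feature] <= self.threshold else self.right
        return child.reward(x)


def segment_return(tree, segment):
    """Sum of per-step tree rewards over a trajectory segment (array of feature vectors)."""
    return sum(tree.reward(x) for x in segment)


def preference_probability(tree, seg_a, seg_b):
    """Bradley-Terry probability that a human prefers segment A over segment B."""
    ra, rb = segment_return(tree, seg_a), segment_return(tree, seg_b)
    return 1.0 / (1.0 + np.exp(rb - ra))


# Toy usage: a depth-2 tree over 3-dimensional features, compared on two random segments.
tree = RewardTreeNode(
    feature=0, threshold=0.5,
    left=RewardTreeNode(value=-1.0),
    right=RewardTreeNode(feature=2, threshold=0.0,
                         left=RewardTreeNode(value=0.0),
                         right=RewardTreeNode(value=1.0)),
)
rng = np.random.default_rng(0)
seg_a, seg_b = rng.normal(size=(10, 3)), rng.normal(size=(10, 3))
print(preference_probability(tree, seg_a, seg_b))
```

Because every leaf is reached by a short chain of human-readable threshold tests, the learned reward can be inspected and debugged directly, which is the interpretability property the paper exploits.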


Related research

09/06/2023
Deep Reinforcement Learning from Hierarchical Weak Preference Feedback
Reward design is a fundamental, yet challenging aspect of practical rein...

05/26/2023
Learning Interpretable Models of Aircraft Handling Behaviour by Reinforcement Learning from Human Feedback
We propose a method to capture the handling abilities of fast jet pilots...

05/24/2023
Inverse Preference Learning: Preference-based RL without a Reward Function
Reward functions are difficult to design and often hard to align with hu...

10/17/2022
Symbol Guided Hindsight Priors for Reward Learning from Human Preferences
Specifying rewards for reinforcement learned (RL) agents is challenging...

10/03/2022
Reward Learning with Trees: Methods and Evaluation
Recent efforts to learn reward functions from human feedback have tended...

07/08/2023
Improving Prototypical Part Networks with Reward Reweighing, Reselection, and Retraining
In recent years, work has gone into developing deep interpretable method...

06/22/2023
Can Differentiable Decision Trees Learn Interpretable Reward Functions?
There is an increasing interest in learning reward functions that model ...
