Reward Learning with Trees: Methods and Evaluation

10/03/2022
by   Tom Bewley, et al.
0

Recent efforts to learn reward functions from human feedback have tended to use deep neural networks, whose lack of transparency hampers our ability to explain agent behaviour or verify alignment. We explore the merits of learning intrinsically interpretable tree models instead. We develop a recently proposed method for learning reward trees from preference labels, and show it to be broadly competitive with neural networks on challenging high-dimensional tasks, with good robustness to limited or corrupted data. Having found that reward tree learning can be done effectively in complex settings, we then consider why it should be used, demonstrating that the interpretable reward structure gives significant scope for traceability, verification and explanation.

READ FULL TEXT
research
12/20/2021

Interpretable Preference-based Reinforcement Learning with Tree-Structured Reward Functions

The potential of reinforcement learning (RL) to deliver aligned and perf...
research
05/26/2023

Learning Interpretable Models of Aircraft Handling Behaviour by Reinforcement Learning from Human Feedback

We propose a method to capture the handling abilities of fast jet pilots...
research
06/22/2023

Can Differentiable Decision Trees Learn Interpretable Reward Functions?

There is an increasing interest in learning reward functions that model ...
research
09/16/2021

Interpretable Local Tree Surrogate Policies

High-dimensional policies, such as those represented by neural networks,...
research
05/30/2022

Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning

We generalise the problem of reward modelling (RM) for reinforcement lea...
research
08/16/2021

APReL: A Library for Active Preference-based Reward Learning Algorithms

Reward learning is a fundamental problem in robotics to have robots that...
research
12/30/2021

Self Reward Design with Fine-grained Interpretability

Transparency and fairness issues in Deep Reinforcement Learning may stem...

Please sign up or login with your details

Forgot password? Click here to reset