Can Differentiable Decision Trees Learn Interpretable Reward Functions?

06/22/2023
by   Akansha Kalra, et al.

There is increasing interest in learning reward functions that model human intent and human preferences. However, many frameworks use black-box learning methods that, while expressive, are difficult to interpret. We propose and evaluate a novel approach for learning expressive and interpretable reward functions from preferences using Differentiable Decision Trees (DDTs). Our experiments across several domains, including Cartpole, Visual Gridworld environments, and Atari games, provide evidence that the tree structure of our learned reward function is useful in determining the extent to which the reward function is aligned with human preferences. We experimentally demonstrate that reward DDTs achieve competitive performance compared with larger-capacity deep neural network reward functions. We also observe that the choice between the soft and hard (argmax) output of a reward DDT reveals a tension between wanting highly shaped rewards to ensure good RL performance and wanting simple, non-shaped rewards to afford interpretability.
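To make the soft-versus-hard distinction concrete, here is a minimal sketch of a differentiable decision tree used as a reward function. The parameterization (linear sigmoid gates at internal nodes, scalar rewards at leaves) is a common DDT construction and an assumption here, not necessarily the paper's exact architecture; the class and method names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RewardDDT:
    """Minimal soft differentiable decision tree reward function.

    Internal node i routes left with probability sigmoid(W[i] @ x + b[i]);
    each leaf holds a scalar reward. This is an illustrative sketch, not
    the paper's exact parameterization.
    """

    def __init__(self, n_features, depth=2, seed=0):
        rng = np.random.default_rng(seed)
        self.depth = depth
        self.n_internal = 2 ** depth - 1      # routing nodes
        self.n_leaves = 2 ** depth            # leaf reward values
        self.W = rng.normal(size=(self.n_internal, n_features))
        self.b = rng.normal(size=self.n_internal)
        self.leaf_rewards = rng.normal(size=self.n_leaves)

    def leaf_probs(self, x):
        """Probability of reaching each leaf under soft (sigmoid) routing."""
        gates = sigmoid(self.W @ x + self.b)  # P(go left) at each node
        probs = np.ones(self.n_leaves)
        for leaf in range(self.n_leaves):
            node = 0
            # Read the leaf index bit by bit, top decision first
            # (bit 0 = go left); children of node i are 2i+1 and 2i+2.
            for d in reversed(range(self.depth)):
                go_left = ((leaf >> d) & 1) == 0
                probs[leaf] *= gates[node] if go_left else 1.0 - gates[node]
                node = 2 * node + (1 if go_left else 2)
        return probs

    def reward(self, x, hard=False):
        p = self.leaf_probs(x)
        if hard:
            # Hard (argmax) output: follow the most likely path to one leaf,
            # giving a simple, piecewise-constant, interpretable reward.
            return float(self.leaf_rewards[np.argmax(p)])
        # Soft output: expectation over leaves, giving a smooth, shaped reward.
        return float(p @ self.leaf_rewards)
```

The soft output blends all leaf rewards and so tends to be highly shaped (good for RL optimization), while the hard output commits to a single leaf per input, which is the form a human can read off the tree; this is the tension the abstract describes. In a preference-learning setup, the soft reward would typically be trained with a Bradley-Terry-style loss over trajectory pairs.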


Related research

- Understanding Learned Reward Functions (12/10/2020): In many real-world tasks, it is not possible to procedurally specify an ...
- Interpretable Preference-based Reinforcement Learning with Tree-Structured Reward Functions (12/20/2021): The potential of reinforcement learning (RL) to deliver aligned and perf...
- Models of human preference for learning reward functions (06/05/2022): The utility of reinforcement learning is limited by the alignment of rew...
- Reward Learning with Trees: Methods and Evaluation (10/03/2022): Recent efforts to learn reward functions from human feedback have tended...
- Programming by Rewards (07/14/2020): We formalize and study "programming by rewards" (PBR), a new approach fo...
- Preprocessing Reward Functions for Interpretability (03/25/2022): In many real-world applications, the reward function is too complex to b...
- MIXRTs: Toward Interpretable Multi-Agent Reinforcement Learning via Mixing Recurrent Soft Decision Trees (09/15/2022): Multi-agent reinforcement learning (MARL) recently has achieved tremendo...
