Neural-to-Tree Policy Distillation with Policy Improvement Criterion

08/16/2021
by Zhao-Hua Li, et al.

While deep reinforcement learning has achieved promising results in challenging decision-making tasks, the deep neural networks at the core of its success are mostly black boxes. A feasible way to gain insight into a black-box model is to distill it into an interpretable model such as a decision tree, which consists of if-then rules and is easy to grasp and verify. However, traditional model distillation is usually a supervised learning task under a stationary data-distribution assumption, which is violated in reinforcement learning. Consequently, a typical policy distillation that clones model behaviors, even with a small error, can induce a data-distribution shift, resulting in an unsatisfactory distilled policy with low fidelity or low performance. In this paper, we propose to address this issue by changing the distillation objective from behavior cloning to maximizing an advantage evaluation. The new objective maximizes an approximated cumulative reward and focuses on disastrous behaviors in critical states, which controls the data-shift effect. We evaluate our method on several Gym tasks, a commercial fighting game, and a self-driving car simulator. The empirical results show that the proposed method preserves a higher cumulative reward than behavior cloning and learns a policy more consistent with the original one. Moreover, by examining the rules extracted from the distilled decision trees, we demonstrate that the proposed method delivers reasonable and robust decisions.
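The difference between the two distillation objectives can be illustrated with a minimal sketch. This is not the authors' implementation; the loss forms, array shapes, and the use of teacher Q-values as the advantage estimate are assumptions for illustration. Behavior cloning penalizes every mismatch with the teacher's action equally, while an advantage-based objective weights each state by the teacher's value gaps, so states where a wrong action is disastrous dominate the loss.

```python
import numpy as np

def behavior_cloning_loss(student_probs, teacher_actions):
    """Standard cross-entropy on the teacher's chosen actions.

    student_probs: (N, A) action probabilities of the distilled policy.
    teacher_actions: (N,) action indices chosen by the teacher.
    Every state contributes equally, regardless of how costly an error is.
    """
    picked = student_probs[np.arange(len(teacher_actions)), teacher_actions]
    return -np.mean(np.log(picked))

def advantage_weighted_loss(student_probs, teacher_q):
    """Maximize the student's expected advantage under teacher Q-values.

    teacher_q: (N, A) Q-value estimates from the original (teacher) policy.
    Advantages are Q-values centered per state; states with large gaps
    between actions (critical states) dominate the objective.
    """
    advantages = teacher_q - teacher_q.mean(axis=1, keepdims=True)
    expected_adv = np.sum(student_probs * advantages, axis=1)
    return -np.mean(expected_adv)  # negate: minimizing this maximizes advantage
```

For a state where all actions score similarly, the advantage term is near zero and the student is barely constrained there, whereas behavior cloning would still insist on matching the teacher exactly.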

Related research

09/19/2022

MSVIPER: Improved Policy Distillation for Reinforcement-Learning-Based Robot Navigation

We present Multiple Scenario Verifiable Reinforcement Learning via Polic...
07/02/2019

Conservative Q-Improvement: Reinforcement Learning for an Interpretable Decision-Tree Policy

There is a growing desire in the field of reinforcement learning (and ma...
05/22/2018

Verifiable Reinforcement Learning via Policy Extraction

While deep reinforcement learning has successfully solved many challengi...
04/27/2020

Evolutionary Stochastic Policy Distillation

Solving the Goal-Conditioned Reward Sparse (GCRS) task is a challenging ...
10/21/2019

IPO: Interior-point Policy Optimization under Constraints

In this paper, we study reinforcement learning (RL) algorithms to solve ...
02/06/2019

Distilling Policy Distillation

The transfer of knowledge from one policy to another is an important too...
12/30/2021

Self Reward Design with Fine-grained Interpretability

Transparency and fairness issues in Deep Reinforcement Learning may stem...
