Policy Distillation with Selective Input Gradient Regularization for Efficient Interpretability

05/18/2022
by Jinwei Xing, et al.

Although deep Reinforcement Learning (RL) has proven successful in a wide range of tasks, interpretability remains a challenge when it is applied to real-world problems. Saliency maps are frequently used to provide interpretability for deep neural networks. However, in the RL domain, existing saliency map approaches are either too computationally expensive to satisfy the real-time requirements of real-world scenarios, or they cannot produce interpretable saliency maps for RL policies. In this work, we propose Distillation with selective Input Gradient Regularization (DIGR), an approach that uses policy distillation and input gradient regularization to produce new policies that achieve both high interpretability and computational efficiency in generating saliency maps. Our approach is also found to improve the robustness of RL policies to multiple adversarial attacks. We conduct experiments on three tasks, MiniGrid (Fetch Object), Atari (Breakout), and CARLA Autonomous Driving, to demonstrate the importance and effectiveness of our approach.
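To make the idea of combining policy distillation with input gradient regularization concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state the loss, so the linear-softmax policy, the squared-gradient penalty, the `keep_mask` argument (1 for inputs assumed to be relevant, 0 otherwise), the weight `lam`, and every function name below are assumptions chosen for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy(weights, x):
    """Toy linear-softmax policy; `weights` holds one weight row per action."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(logits)


def input_gradient(weights, x, action):
    """Gradient of log pi(action | x) with respect to the input x.

    For a linear-softmax policy this has the closed form
    W[action] - sum_k pi(k | x) * W[k]. Its magnitude serves as a
    gradient-based saliency map that costs about one policy evaluation.
    """
    probs = policy(weights, x)
    expected = [
        sum(p * row[i] for p, row in zip(probs, weights))
        for i in range(len(x))
    ]
    return [weights[action][i] - expected[i] for i in range(len(x))]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def digr_style_loss(student_w, teacher_probs, x, keep_mask, lam=0.1):
    """Distillation term plus a selective input-gradient penalty (assumed form).

    Only gradients at positions where keep_mask is 0 are penalized, so the
    student is pushed to match the teacher while keeping its saliency map
    quiet on inputs assumed to be irrelevant.
    """
    student_probs = policy(student_w, x)
    distill = kl_divergence(teacher_probs, student_probs)
    # Use the student's greedy action; ties resolve to the lowest index.
    action = max(range(len(student_probs)), key=student_probs.__getitem__)
    grad = input_gradient(student_w, x, action)
    penalty = sum((1 - m) * g * g for m, g in zip(keep_mask, grad))
    return distill + lam * penalty
```

In a real setting the student would be a deep network and the input gradient would come from automatic differentiation rather than a closed form; the toy policy is used here only so that the sketch runs with the standard library alone.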


research
07/09/2022

Improved and Interpretable Defense to Transferred Adversarial Examples by Jacobian Norm with Selective Input Gradient Regularization

Deep neural networks (DNNs) are known to be vulnerable to adversarial ex...
research
02/04/2022

Learning Interpretable, High-Performing Policies for Continuous Control Problems

Gradient-based approaches in reinforcement learning (RL) have achieved t...
research
09/18/2023

Guided Online Distillation: Promoting Safe Reinforcement Learning by Offline Demonstration

Safe Reinforcement Learning (RL) aims to find a policy that achieves hig...
research
12/09/2019

Exploratory Not Explanatory: Counterfactual Analysis of Saliency Maps for Deep RL

Saliency maps have been used to support explanations of deep reinforceme...
research
08/14/2020

Defending Adversarial Attacks without Adversarial Attacks in Deep Reinforcement Learning

Many recent studies in deep reinforcement learning (DRL) have proposed t...
research
05/31/2019

L0 Regularization Based Neural Network Design and Compression

We consider complexity of Deep Neural Networks (DNNs) and their associat...
research
06/23/2021

Gradient-Based Interpretability Methods and Binarized Neural Networks

Binarized Neural Networks (BNNs) have the potential to revolutionize the...
