Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning

09/04/2023
by Qisen Yang, et al.

The black-box nature of deep reinforcement learning (RL) hinders its deployment in real-world applications, so interpreting and explaining RL agents have been active research topics in recent years. Existing methods for post-hoc explanation usually adopt the action matching principle to enable an easy understanding of vision-based RL agents. In this paper, we argue that the commonly used action matching principle explains the deep neural network (DNN) rather than the RL agent itself: it may lead to irrelevant or misplaced feature attribution when different DNN outputs yield the same rewards, or when the same outputs yield different rewards. We therefore propose to treat rewards, the essential objective of RL agents, as the objective of interpreting RL agents as well. To ensure reward consistency during interpretable feature discovery, we propose a novel framework (RL interpreting RL, denoted as RL-in-RL) that resolves the gradient disconnection from actions to rewards. We verify and evaluate our method on Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment. The results show that our method maintains reward (or return) consistency and achieves high-quality feature attribution. Further, a series of analytical experiments validates our assumptions about the action matching principle's limitations.
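To make the distinction concrete, the minimal PyTorch sketch below contrasts the two principles under stated assumptions: PolicyNet, action_matching_saliency, rollout_return, and reward_consistency_gap are hypothetical illustrations, not the paper's actual RL-in-RL implementation. Action matching attributes features by backpropagating through the policy network's output, while reward consistency asks whether the attributed features alone preserve the episodic return, a quantity produced by the non-differentiable environment.

```python
# Minimal sketch (assumed names and architecture; not the paper's code).
import torch
import torch.nn as nn


class PolicyNet(nn.Module):
    """Toy vision-based policy: one 84x84 grayscale frame -> action logits."""
    def __init__(self, n_actions: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, n_actions),
        )

    def forward(self, x):
        return self.net(x)


def action_matching_saliency(policy: PolicyNet, frame: torch.Tensor) -> torch.Tensor:
    """Action-matching attribution: vanilla-gradient saliency of the greedy
    action logit w.r.t. the input pixels. Features are credited for
    reproducing the network's output, whether or not they matter for reward."""
    frame = frame.clone().requires_grad_(True)
    logits = policy(frame)                    # shape (1, n_actions)
    logits[0, logits[0].argmax()].backward()  # scalar logit of the greedy action
    return frame.grad.abs().squeeze()         # shape (84, 84)


def reward_consistency_gap(rollout_return, policy, feature_mask):
    """Reward-consistency check: compare the episodic return when the policy
    sees full observations vs. only the attributed features.

    `rollout_return(policy, mask)` is a hypothetical helper that runs one
    episode with the given observation mask and returns its total reward.
    Because that reward comes from the black-box environment, no gradient
    flows from the return back to the mask -- the "gradient disconnection"
    that motivates training the interpreter itself with RL.
    """
    full_return = rollout_return(policy, mask=None)
    masked_return = rollout_return(policy, mask=feature_mask)
    return full_return - masked_return


if __name__ == "__main__":
    policy = PolicyNet()
    frame = torch.rand(1, 1, 84, 84)
    saliency = action_matching_saliency(policy, frame)
    print("saliency map shape:", saliency.shape)
```

The sketch highlights why reward consistency cannot be enforced by ordinary backpropagation: the return in reward_consistency_gap is an environment output with no gradient path to the input mask, which is the gap the RL-in-RL framework addresses by treating interpretation itself as an RL problem.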

