What Did You Think Would Happen? Explaining Agent Behaviour Through Intended Outcomes

11/10/2020
by Herman Yau, et al.

We present a novel form of explanation for Reinforcement Learning, based around the notion of intended outcome. These explanations describe the outcome an agent is trying to achieve by its actions. We provide a simple proof that general methods for post-hoc explanations of this nature are impossible in traditional reinforcement learning. Rather, the information needed for the explanations must be collected in conjunction with training the agent. We derive approaches designed to extract local explanations based on intention for several variants of Q-function approximation and prove consistency between the explanations and the Q-values learned. We demonstrate our method on multiple reinforcement learning problems, and provide code to help researchers introspect their RL environments and algorithms.
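The idea of collecting explanation data during training, and checking it against the learned Q-values, can be pictured with a small tabular sketch. This is an illustration under stated assumptions, not the authors' released code: it assumes plain Q-learning on a hypothetical five-state chain, rewards that are a deterministic function of the state-action pair, and zero initialisation. The function name `train_with_belief_map` and the arrays `H` and `R` are names introduced here. `H[s, a]` accumulates the discounted state-action visits the agent expects to make after taking `a` in `s`, updated in lockstep with `Q`.

```python
import numpy as np

def train_with_belief_map(n_states=5, episodes=200, alpha=0.5,
                          gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a toy chain, with a visitation map H learned alongside Q."""
    rng = np.random.default_rng(seed)
    n_actions = 2                      # 0 = left, 1 = right (toy environment)
    goal = n_states - 1                # reaching the last state ends the episode
    # Assumption: reward depends only on (state, action).
    R = np.full((n_states, n_actions), -0.01)
    R[goal - 1, 1] = 1.0
    Q = np.zeros((n_states, n_actions))
    # H[s, a] is itself an (n_states, n_actions) table of expected discounted visits.
    H = np.zeros((n_states, n_actions, n_states, n_actions))

    for _ in range(episodes):
        s = 0
        for _ in range(50):            # cap on episode length
            if rng.random() < eps:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))
            s_next = max(s - 1, 0) if a == 0 else s + 1
            done = s_next == goal

            visit = np.zeros((n_states, n_actions))
            visit[s, a] = 1.0          # the pair being taken right now
            if done:
                q_target = R[s, a]
                h_target = visit
            else:
                # Same greedy successor action drives both targets.
                a_star = int(np.argmax(Q[s_next]))
                q_target = R[s, a] + gamma * Q[s_next, a_star]
                h_target = visit + gamma * H[s_next, a_star]

            Q[s, a] += alpha * (q_target - Q[s, a])
            H[s, a] += alpha * (h_target - H[s, a])
            s = s_next
            if done:
                break
    return Q, H, R

Q, H, R = train_with_belief_map()
# Consistency check for this sketch: Q should equal the visitation map weighted by reward.
print(np.allclose(Q, (H * R).sum(axis=(2, 3))))
```

Because both tables start at zero and share the same step size and greedy successor action, the reward-weighted sum of `H[s, a]` tracks `Q[s, a]` at every update in this toy setting. Reading off `H[s, a]` then shows which future state-action pairs the agent expects to pass through, which is the kind of information a post-hoc look at `Q` alone does not provide.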


research
11/14/2022

(When) Are Contrastive Explanations of Reinforcement Learning Helpful?

Global explanations of a reinforcement learning (RL) agent's expected be...
research
05/14/2021

Feature-Based Interpretable Reinforcement Learning based on State-Transition Models

Growing concerns regarding the operational usage of AI models in the rea...
research
06/09/2023

Explaining Reinforcement Learning with Shapley Values

For reinforcement learning systems to be widely adopted, their users mus...
research
10/10/2022

Experiential Explanations for Reinforcement Learning

Reinforcement Learning (RL) approaches are becoming increasingly popular...
research
12/16/2021

Inherently Explainable Reinforcement Learning in Natural Language

We focus on the task of creating a reinforcement learning agent that is ...
research
10/11/2020

Contrastive Explanations for Reinforcement Learning via Embedded Self Predictions

We investigate a deep reinforcement learning (RL) architecture that supp...
research
01/28/2020

Distal Explanations for Explainable Reinforcement Learning Agents

Causal explanations present an intuitive way to understand the course of...
