Towards Interpretable Deep Reinforcement Learning Models via Inverse Reinforcement Learning

03/30/2022
by Yuansheng Xie, et al.

Artificial intelligence, particularly through recent advances in deep learning, has achieved exceptional performance on many tasks in fields such as natural language processing and computer vision. Beyond strong evaluation metrics, a high level of interpretability is often required for these models to be reliably deployed, so explanations that offer insight into the process by which a model maps its inputs onto its outputs are much sought after. Unfortunately, the black-box nature of machine learning models remains an unresolved issue, and it prevents researchers from understanding and providing explicative descriptions of a model's behavior and final predictions. In this work, we propose a novel framework based on Adversarial Inverse Reinforcement Learning that provides global explanations for decisions made by a Reinforcement Learning model and captures the intuitive tendencies the model follows by summarizing its decision-making process.
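To make the idea concrete, the sketch below illustrates the general mechanism the abstract alludes to: an AIRL-style discriminator, D(s, a) = exp(f(s, a)) / (exp(f(s, a)) + pi(a|s)), is trained to distinguish transitions from the policy being explained against transitions from a background policy, and the recovered reward parameters are then read off as a global summary of what the policy appears to be optimizing. This is only a minimal illustration, not the authors' implementation: the linear reward, the toy rollout data, the assumed access to log pi(a|s), and all names such as `discriminator` and `w` are simplifying assumptions.

```python
"""
Illustrative sketch (not the paper's code): recover a simple linear reward
with an AIRL-style discriminator, then read its weights as a global summary
of which state features drive the trained policy. All names, shapes, and the
toy data below are assumptions for demonstration only.
"""
import numpy as np

rng = np.random.default_rng(0)

# --- Toy data standing in for rollouts -------------------------------------
# "Expert" transitions come from the trained RL policy we want to explain;
# "background" transitions come from a random behavior policy.
n, d = 1000, 4                      # number of transitions, state-feature dimension
expert_s = rng.normal(loc=[1.0, 0.0, -1.0, 0.0], size=(n, d))
random_s = rng.normal(loc=0.0, size=(n, d))
log_pi_expert = np.full(n, -1.0)    # log pi(a|s) of the policy's own actions (assumed available)
log_pi_random = np.full(n, -1.0)

w = np.zeros(d)                     # linear reward model r(s) = w . s

def discriminator(s, log_pi, w):
    """AIRL discriminator D = exp(f) / (exp(f) + pi), with f(s) = w . s."""
    f = s @ w
    return 1.0 / (1.0 + np.exp(log_pi - f))   # sigmoid(f - log pi), numerically stable form

# --- Logistic-regression-style update of the reward parameters -------------
lr = 0.05
for _ in range(500):
    d_exp = discriminator(expert_s, log_pi_expert, w)   # pushed toward 1 on policy data
    d_rnd = discriminator(random_s, log_pi_random, w)   # pushed toward 0 on background data
    grad = expert_s.T @ (1.0 - d_exp) / n - random_s.T @ d_rnd / n
    w += lr * grad                                       # gradient ascent on the discriminator objective

# The learned weights serve as a coarse global explanation: features with
# large positive weight are those the explained policy appears to pursue.
print("recovered reward weights (global feature importance):", np.round(w, 2))
```

Running the sketch assigns a large positive weight to the first feature and a negative weight to the third, matching how the toy "expert" states were generated; in a real setting the recovered reward would be inspected in the same way to summarize the policy's tendencies.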

