RL-LIM: Reinforcement Learning-based Locally Interpretable Modeling

09/26/2019
by Jinsung Yoon, et al.

Understanding black-box machine learning models is important for their widespread adoption. However, developing globally interpretable models that explain the behavior of an entire black-box model is challenging. An alternative is to explain a black-box model's individual predictions using a locally interpretable model. In this paper, we propose a novel method for locally interpretable modeling - Reinforcement Learning-based Locally Interpretable Modeling (RL-LIM). RL-LIM employs reinforcement learning to select a small number of training samples and distill the black-box model's predictions into a low-capacity locally interpretable model. Training is guided by a reward obtained directly by measuring how well the predictions of the locally interpretable model agree with those of the black-box model. RL-LIM near-matches the overall prediction performance of black-box models while providing human-like interpretability, and significantly outperforms state-of-the-art locally interpretable models in terms of overall prediction performance and fidelity.
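The mechanism sketched in the abstract (an instance-wise selector trained with a fidelity reward, distilling black-box predictions into a low-capacity local model) can be illustrated roughly as below. This is a minimal sketch, not the authors' implementation: the two-parameter distance-based selector, the ridge-regression local model, the random-forest stand-in for the black box, and all hyperparameters are assumptions chosen only to make the example runnable.

```python
# Minimal sketch of the RL-LIM idea (hypothetical setup, not the paper's code).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy data and a "black-box" model whose predictions we want to explain.
X = rng.normal(size=(500, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=500)
black_box = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
y_bb = black_box.predict(X)  # distillation targets for the local models

# A two-parameter, distance-based selector stands in for an instance-wise
# selection network: theta = [scale, bias].
theta = np.array([1.0, 0.0])

def selection_probs(d, theta):
    """Probability of selecting each training sample, given its distance d to the query."""
    return 1.0 / (1.0 + np.exp(theta[0] * d - theta[1]))

def fidelity_reward(x_query, selected):
    """Distill black-box predictions into a low-capacity (ridge) model on the
    selected samples; reward is the negative disagreement at the query point."""
    if selected.sum() < 2:
        return -1.0
    local = Ridge(alpha=1.0).fit(X[selected], y_bb[selected])
    return -abs(local.predict(x_query[None])[0] - black_box.predict(x_query[None])[0])

# REINFORCE-style training: sample a small subset, observe the fidelity
# reward, and nudge the selector toward selections that improve agreement.
lr, baseline = 1e-3, 0.0
for step in range(200):
    x_query = X[rng.integers(len(X))]
    d = np.linalg.norm(X - x_query, axis=1)
    p = selection_probs(d, theta)
    selected = rng.random(len(X)) < p
    r = fidelity_reward(x_query, selected)
    # Gradient of the log selection probability with respect to theta.
    grad = ((selected - p)[:, None] * np.column_stack([-d, np.ones(len(X))])).mean(axis=0)
    baseline = 0.9 * baseline + 0.1 * r
    theta += lr * (r - baseline) * grad

# Explain one instance: keep the highest-scoring samples and read off the
# coefficients of the fitted locally interpretable model.
x_query = X[0]
p = selection_probs(np.linalg.norm(X - x_query, axis=1), theta)
top = np.argsort(p)[-50:]
local = Ridge(alpha=1.0).fit(X[top], y_bb[top])
print("local coefficients:", np.round(local.coef_, 3))
print("black-box:", black_box.predict(x_query[None])[0],
      "local:", local.predict(x_query[None])[0])
```

In the paper the selector is a learned network rather than a fixed distance rule; the sketch only mirrors the reward structure, in which agreement between the local model and the black-box prediction drives the policy-gradient update.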


Related research

05/06/2021 · Partially Interpretable Estimators (PIE): Black-Box-Refined Interpretable Machine Learning
We propose Partially Interpretable Estimators (PIE) which attribute a pr...

09/23/2019 · Model-Agnostic Linear Competitors – When Interpretable Models Compete and Collaborate with Black-Box Models
Driven by an increasing need for model interpretability, interpretable m...

08/29/2022 · Interpreting Black-box Machine Learning Models for High Dimensional Datasets
Deep neural networks (DNNs) have been shown to outperform traditional ma...

05/14/2021 · Information-theoretic Evolution of Model Agnostic Global Explanations
Explaining the behavior of black box machine learning models through hum...

07/15/2020 · VAE-LIME: Deep Generative Model Based Approach for Local Data-Driven Model Interpretability Applied to the Ironmaking Industry
Machine learning applied to generate data-driven models are lacking of t...

09/13/2019 · A Double Penalty Model for Interpretability
Modern statistical learning techniques have often emphasized prediction ...

10/31/2019 · A study of data and label shift in the LIME framework
LIME is a popular approach for explaining a black-box prediction through...
