ReLACE: Reinforcement Learning Agent for Counterfactual Explanations of Arbitrary Predictive Models

10/22/2021
by   Ziheng Chen, et al.

The demand for explainable machine learning (ML) models has grown rapidly in recent years. Among the methods proposed to associate ML model predictions with a human-understandable rationale, counterfactual explanations are among the most popular. They consist of post-hoc rules derived from counterfactual examples (CFs), i.e., modified versions of input samples that cause the predictive model being explained to produce a different output. However, existing CF generation strategies either exploit the internals of specific models (e.g., random forests or neural networks) or depend on each sample's neighborhood, which makes them hard to generalize to more complex models and inefficient on larger datasets. In this work, we aim to overcome these limitations and introduce a model-agnostic algorithm to generate optimal counterfactual explanations. Specifically, we formulate the problem of crafting CFs as a sequential decision-making task and then find the optimal CFs via deep reinforcement learning (DRL) with a discrete-continuous hybrid action space. Unlike other techniques, our method is easily applied to any black-box model, since the model plays the role of the environment that the DRL agent interacts with. In addition, we develop an algorithm to extract explainable decision rules from the DRL agent's policy, so as to make the process of generating CFs itself transparent. Extensive experiments conducted on several datasets show that our method outperforms existing CF generation baselines.
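To make the sequential decision-making formulation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a binary classifier, treats the black-box model as the environment, and uses a hybrid action made of a discrete feature index and a continuous change to that feature. All names (`black_box`, `CounterfactualEnv`, `search_counterfactual`), the reward shape, and the distance penalty are hypothetical, and a uniformly random policy stands in for the trained DRL agent.

```python
import random


def black_box(x):
    # Hypothetical stand-in for any predictive model; only its output is used.
    return 1 if 0.8 * x[0] + 0.5 * x[1] - 1.0 > 0 else 0


class CounterfactualEnv:
    """Toy environment: the state is the current modified copy of the sample."""

    def __init__(self, model, x0, max_steps=20, dist_weight=0.1):
        self.model = model
        self.x0 = list(x0)
        self.target = 1 - model(self.x0)  # binary case: aim for the other class
        self.max_steps = max_steps
        self.dist_weight = dist_weight  # assumed trade-off, not from the paper
        self.reset()

    def reset(self):
        self.x = list(self.x0)
        self.steps = 0
        return tuple(self.x)

    def distance(self, x):
        # L1 distance to the original sample (an assumed proximity measure).
        return sum(abs(a - b) for a, b in zip(x, self.x0))

    def step(self, action):
        idx, delta = action  # hybrid action: (discrete feature, continuous change)
        self.x[idx] += delta
        self.steps += 1
        flipped = self.model(self.x) == self.target
        reward = (1.0 if flipped else 0.0) - self.dist_weight * self.distance(self.x)
        done = flipped or self.steps >= self.max_steps
        return tuple(self.x), reward, done


def search_counterfactual(env, episodes=200, scale=0.5, seed=0):
    """Roll out a random policy and keep the closest sample that flips the output."""
    rng = random.Random(seed)
    best, best_dist = None, float("inf")
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            action = (rng.randrange(len(state)), rng.uniform(-scale, scale))
            state, _reward, done = env.step(action)
        if env.model(state) == env.target and env.distance(state) < best_dist:
            best, best_dist = state, env.distance(state)
    return best


env = CounterfactualEnv(black_box, [0.5, 0.5])
cf = search_counterfactual(env)
print("original:", env.x0, "->", black_box(env.x0))
print("counterfactual:", cf, "->", black_box(cf) if cf is not None else None)
```

Because the environment only ever calls the model on a candidate input, swapping `black_box` for a different predictor requires no change to the search loop. A learned policy would replace the random action sampling and use the reward signal that this sketch computes but ignores.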

research
08/22/2022

Shapelet-Based Counterfactual Explanations for Multivariate Time Series

As machine learning and deep learning models have become highly prevalen...
research
06/07/2021

Amortized Generation of Sequential Counterfactual Explanations for Black-box Models

Explainable machine learning (ML) has gained traction in recent years du...
research
03/04/2022

Benchmark Evaluation of Counterfactual Algorithms for XAI: From a White Box to a Black Box

Counterfactual explanations have recently been brought to light as a pot...
research
07/22/2019

The Dangers of Post-hoc Interpretability: Unjustified Counterfactual Explanations

Post-hoc interpretability approaches have been proven to be powerful too...
research
09/30/2021

XPROAX-Local explanations for text classification with progressive neighborhood approximation

The importance of the neighborhood for training a local surrogate model ...
research
04/17/2021

Optimal Counterfactual Explanations for Scorecard modelling

Counterfactual explanations is one of the post-hoc methods used to provi...
research
01/17/2022

Principled Diverse Counterfactuals in Multilinear Models

Machine learning (ML) applications have automated numerous real-life tas...