Counterfactual Generation with Knockoffs

by Oana-Iuliana Popescu, et al.

Human interpretability of deep neural networks' decisions is crucial, especially in domains where these decisions directly affect human lives. Counterfactual explanations of already trained neural networks can be generated by perturbing input features and attributing importance according to the change in the classifier's outcome after perturbation. Perturbation can be done by replacing features using heuristic or generative in-filling methods. The choice of in-filling function significantly impacts the number of artifacts, i.e., false-positive attributions. Heuristic methods produce false-positive artifacts because the perturbed image is far from the original data distribution. Generative in-filling methods reduce artifacts by producing in-filling values that respect the original data distribution. However, current generative in-filling methods may also increase false negatives due to the high correlation of in-filling values with the original data. In this paper, we propose to alleviate this by generating in-fillings with the statistically grounded Knockoffs framework, which was developed by Barber and Candès in 2015 as a tool for variable selection with controllable false discovery rate. Knockoffs are statistical null variables, as decorrelated as possible from the original data, that can be swapped with the originals without changing the underlying data distribution. A comparison of different in-filling methods indicates that in-filling with knockoffs can reveal explanations in a more causal sense while still maintaining the compactness of the explanations.
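As a concrete illustration of the framework, knockoffs for jointly Gaussian features admit a closed-form sampler (the model-X construction of Candès et al.). The sketch below is illustrative and not code from the paper: it assumes the feature mean and covariance are known, uses the equicorrelated choice of decorrelation parameters, and the helper name `gaussian_knockoffs` is hypothetical.

```python
import numpy as np

def gaussian_knockoffs(X, mu, Sigma, rng=None):
    """Sample knockoffs for Gaussian features X ~ N(mu, Sigma).

    The joint vector (X, X~) is Gaussian with Cov(X~) = Sigma and
    Cov(X_j, X~_k) = Sigma_jk for j != k, so swapping any feature with
    its knockoff leaves the joint distribution unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = Sigma.shape[0]
    # Equicorrelated choice s_j = min(2 * lambda_min(Sigma), 1), assuming
    # unit-variance features; shrunk slightly so the conditional covariance
    # stays positive definite.
    s = np.full(p, min(2.0 * np.linalg.eigvalsh(Sigma).min(), 1.0) * 0.999)
    D = np.diag(s)
    Sigma_inv = np.linalg.inv(Sigma)
    # Conditional law of the knockoff given X (rows of X are observations):
    #   mean = x - D Sigma^{-1} (x - mu),   cov = 2D - D Sigma^{-1} D
    mean = X - (X - mu) @ Sigma_inv @ D
    cov = 2.0 * D - D @ Sigma_inv @ D
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(p))
    return mean + rng.standard_normal(X.shape) @ L.T
```

Larger `s_j` means stronger decorrelation between a feature and its knockoff, which is exactly what counters the false-negative problem of highly correlated generative in-fills.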


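The perturb-and-attribute step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `model` is a hypothetical callable returning a scalar class score, and attribution here is simply the drop in that score when one feature is swapped for its knockoff in-fill.

```python
import numpy as np

def knockoff_attribution(model, x, x_knockoff):
    """Score each feature by the change in the model's output when that
    feature alone is replaced by its knockoff (null) in-fill value."""
    base = model(x)
    importance = np.empty(len(x))
    for j in range(len(x)):
        x_pert = x.copy()
        x_pert[j] = x_knockoff[j]  # swap in the knockoff value for feature j
        importance[j] = base - model(x_pert)
    return importance
```

Because knockoffs preserve the data distribution, a large score drop can be attributed to the feature's information content rather than to an out-of-distribution artifact of the perturbation.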


