On the Connections between Counterfactual Explanations and Adversarial Examples

06/18/2021
by   Martin Pawelczyk, et al.

Counterfactual explanations and adversarial examples have emerged as critical research areas for addressing the explainability and robustness goals of machine learning (ML). Counterfactual explanations were developed to provide recourse to individuals adversely impacted by algorithmic decisions, whereas adversarial examples were designed to expose the vulnerabilities of ML models. Although prior research has hinted at commonalities between these frameworks, there has been little to no work on systematically exploring the connections between the two literatures. In this work, we make one of the first attempts at formalizing the connections between counterfactual explanations and adversarial examples. More specifically, we theoretically analyze salient counterfactual explanation and adversarial example generation methods, and highlight the conditions under which they behave similarly. Our analysis demonstrates that several popular counterfactual explanation and adversarial example generation methods are equivalent: the method of Wachter et al. and that of Carlini and Wagner (with mean squared error loss), as well as C-CHVAE and the natural adversarial examples of Zhao et al. We also bound the distance between the counterfactual explanations generated by the method of Wachter et al. and the adversarial examples generated by DeepFool for linear models. Finally, we empirically validate our theoretical findings through extensive experiments on synthetic and real-world datasets.
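
As a rough illustration of the linear-model comparison mentioned in the abstract, the sketch below computes a DeepFool-style perturbation and a Wachter-style counterfactual in closed form for a binary linear classifier f(x) = w·x + b. It is not the paper's derivation or its bound. Several things here are assumptions made for illustration: the squared L2 distance in the Wachter-style objective, the target score of 0 (the decision boundary), the trade-off weight `lam`, and all function names and example values.

```python
import math

def score(w, b, x):
    """Linear model score f(x) = w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def deepfool_linear(w, b, x):
    """Minimal L2 perturbation that moves x onto the boundary f = 0.

    For an affine binary classifier this is -f(x) / ||w||^2 * w.
    (DeepFool usually adds a small overshoot to cross the boundary;
    that factor is omitted here.)
    """
    norm_sq = sum(wi * wi for wi in w)
    c = score(w, b, x) / norm_sq
    return [-c * wi for wi in w]

def wachter_linear(w, b, x, lam, target=0.0):
    """Closed-form minimizer of lam * (f(x') - target)^2 + ||x' - x||_2^2.

    Assumption: squared L2 distance, chosen so the optimum has a closed
    form. Setting the gradient to zero gives x' - x = -c * w with
    c = lam * (f(x) - target) / (1 + lam * ||w||^2).
    """
    norm_sq = sum(wi * wi for wi in w)
    c = lam * (score(w, b, x) - target) / (1.0 + lam * norm_sq)
    return [-c * wi for wi in w]

def l2_gap(u, v):
    """Euclidean distance between two perturbations."""
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(u, v)))

# Hypothetical example values.
w, b, x = [2.0, -1.0], 0.5, [1.0, 3.0]
delta_df = deepfool_linear(w, b, x)
lams = (0.1, 1.0, 10.0, 100.0)
gaps = [l2_gap(delta_df, wachter_linear(w, b, x, lam)) for lam in lams]
for lam, gap in zip(lams, gaps):
    print(f"lam={lam:>6}: gap={gap:.6f}")
```

Under these assumptions both perturbations point along w, and the gap between them works out to |f(x)| / (||w|| (1 + lam ||w||^2)), so it shrinks as `lam` grows and the two solutions coincide in the limit.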

research
09/11/2020

Counterfactual Explanations Adversarial Examples – Common Grounds, Essential Differences, and Potential Transfers

It is well known that adversarial examples and counterfactual explanatio...
research
12/18/2020

Semantics and explanation: why counterfactual explanations produce adversarial examples in deep neural networks

Recent papers in explainable AI have made a compelling case for counterf...
research
06/25/2019

Explaining Deep Learning Models with Constrained Adversarial Examples

Machine learning algorithms generally suffer from a problem of explainab...
research
03/01/2021

Counterfactual Explanations for Oblique Decision Trees: Exact, Efficient Algorithms

We consider counterfactual explanations, the problem of minimally adjust...
research
02/25/2020

Gödel's Sentence Is An Adversarial Example But Unsolvable

In recent years, different types of adversarial examples from different ...
research
08/24/2020

PermuteAttack: Counterfactual Explanation of Machine Learning Credit Scorecards

This paper is a note on new directions and methodologies for validation ...
research
12/06/2019

Preserving Causal Constraints in Counterfactual Explanations for Machine Learning Classifiers

Explaining the output of a complex machine learning (ML) model often req...