When adversarial attacks become interpretable counterfactual explanations

06/14/2022
by   Mathieu Serrurier, et al.

We argue that, when learning a 1-Lipschitz neural network with the dual loss of an optimal transportation problem, the gradient of the model is both the direction of the transportation plan and the direction to the closest adversarial attack. Traveling along the gradient to the decision boundary is no longer an adversarial attack but becomes a counterfactual explanation, explicitly transporting from one class to the other. Through extensive experiments on XAI metrics, we find that the simple saliency-map method, applied to such networks, becomes a reliable explanation and outperforms state-of-the-art explanation approaches on unconstrained models. The proposed networks were already known to be certifiably robust, and we prove that they are also explainable with a fast and simple method.
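The core claim is easiest to see in the simplest 1-Lipschitz model. The sketch below is a hypothetical illustration (not the paper's architecture): a linear classifier f(x) = w·x + b with ||w|| = 1 has Lipschitz constant exactly 1, so its gradient w points toward the closest decision-boundary point and |f(x)| is the exact distance to it. Traveling along the gradient by that margin lands on the boundary, i.e. the closest counterfactual.

```python
import numpy as np

# Hypothetical 1-Lipschitz binary classifier: f(x) = w.x + b with ||w||_2 = 1.
# For this model the gradient is w everywhere, |f(x)| is the exact distance
# to the decision boundary f = 0, and x - f(x)*w is the closest boundary point.
w = np.array([0.6, 0.8])   # unit-norm weights, ||w|| = 1
b = -0.5

def f(x):
    """Signed score; its sign is the predicted class."""
    return float(w @ x + b)

def counterfactual(x):
    """Travel along the gradient to the boundary: the closest point of the
    opposite class, which doubles as a counterfactual explanation."""
    margin = f(x)           # signed distance to the boundary (1-Lipschitz)
    return x - margin * w   # gradient of f is w

x = np.array([2.0, 1.0])
x_cf = counterfactual(x)
print(f(x))                 # 1.5 -> certified distance to the boundary
print(f(x_cf))              # 0.0 -> x_cf sits exactly on the boundary
```

For a deep 1-Lipschitz network the gradient is no longer constant, so in practice one would take (possibly iterated) steps of size |f(x)| along the normalized input gradient; the 1-Lipschitz property still guarantees |f(x)| lower-bounds the distance to the boundary.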


Related research

research · 03/17/2023
Adversarial Counterfactual Visual Explanations
Counterfactual explanations and adversarial attacks have a related goal:...

research · 12/28/2022
Robust Ranking Explanations
Gradient-based explanation is the cornerstone of explainable deep networ...

research · 10/07/2019
Interpretable Disentanglement of Neural Networks by Extracting Class-Specific Subnetwork
We propose a novel perspective to understand deep neural networks in an ...

research · 04/11/2022
Generalizing Adversarial Explanations with Grad-CAM
Gradient-weighted Class Activation Mapping (Grad-CAM) is an example-ba...

research · 03/26/2021
Building Reliable Explanations of Unreliable Neural Networks: Locally Smoothing Perspective of Model Interpretation
We present a novel method for reliably explaining the predictions of neu...

research · 06/11/2020
Achieving robustness in classification using optimal transport with hinge regularization
We propose a new framework for robust binary classification, with Deep N...

research · 03/24/2023
IDGI: A Framework to Eliminate Explanation Noise from Integrated Gradients
Integrated Gradients (IG) as well as its variants are well-known techniq...
