Adversarial Counterfactual Visual Explanations

03/17/2023
by   Guillaume Jeanneret, et al.

Counterfactual explanations and adversarial attacks share a related goal: flipping output labels with minimal perturbations, regardless of the perturbations' characteristics. Yet adversarial attacks cannot be used directly from a counterfactual explanation perspective, since such perturbations are perceived as noise rather than as actionable, understandable image modifications. Building on the robust learning literature, this paper proposes an elegant method to turn adversarial attacks into semantically meaningful perturbations, without modifying the classifiers being explained. The proposed approach hypothesizes that Denoising Diffusion Probabilistic Models are excellent regularizers for avoiding high-frequency and out-of-distribution perturbations when generating adversarial attacks. The paper's key idea is to build attacks through a diffusion model that polishes them, which allows studying the target model regardless of its robustification level. Extensive experimentation shows the advantages of our counterfactual explanation approach over the current state of the art in multiple testbeds.
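The key idea above can be sketched in a toy form: adversarial gradient steps on a fixed classifier are interleaved with a noise-then-denoise step that acts as a regularizer, pulling the perturbed input back toward the data distribution. Everything below is a hypothetical stand-in for illustration, not the authors' implementation: the "classifier" is a fixed logistic model, and the "denoiser" is a shrinkage toward class prototypes standing in for a trained DDPM.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed linear classifier f(x) = sign(w.x + b): the (frozen) model to explain.
w = np.array([1.0, -1.0])
b = 0.0

def predict(x):
    return 1 if w @ x + b > 0 else 0

def grad_loss(x, target):
    # Analytic gradient of the logistic loss toward the target class.
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return (p - target) * w

# Stand-in "denoiser": shrinks the input toward the nearest class
# prototype, mimicking a diffusion model's pull toward the data
# manifold (a real DDPM would do this via learned noise prediction).
prototypes = np.array([[2.0, -1.0],    # class-1 prototype
                       [-1.0, 2.0]])   # class-0 prototype

def denoise(x, strength=0.1):
    proto = prototypes[np.argmin(np.linalg.norm(prototypes - x, axis=1))]
    return (1 - strength) * x + strength * proto

x0 = np.array([1.5, -0.5])   # input classified as class 1
target = 0                   # counterfactual target class
x = x0.copy()
for _ in range(50):
    x = x - 0.5 * grad_loss(x, target)              # adversarial step
    x = denoise(x + 0.05 * rng.standard_normal(2))  # noise + denoise

print(predict(x0), predict(x))  # original vs. counterfactual label
```

Without the `denoise` call this is a plain gradient attack and the perturbation can point anywhere; with it, each step is projected back toward the (toy) data manifold, which is the regularizing role the paper assigns to the diffusion model.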


Related research

- Denoising Diffusion Probabilistic Models as a Defense against Adversarial Attacks (01/17/2023)
- When adversarial attacks become interpretable counterfactual explanations (06/14/2022)
- DiffDefense: Defending against Adversarial Attacks via Diffusion Models (09/07/2023)
- Counterfactual Distribution Regression for Structured Inference (08/20/2019)
- Resilience of Bayesian Layer-Wise Explanations under Adversarial Attacks (02/22/2021)
- The Devil's Advocate: Shattering the Illusion of Unexploitable Data using Diffusion Models (03/15/2023)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning (11/24/2020)
