Resilience of Bayesian Layer-Wise Explanations under Adversarial Attacks

02/22/2021
by Ginevra Carbone, et al.

We consider the problem of the stability of saliency-based explanations of Neural Network predictions under adversarial attacks in a classification task. We empirically show that, for deterministic Neural Networks, saliency interpretations are remarkably brittle even when the attacks fail, i.e., when the perturbation does not change the classification label. Leveraging recent results, we give a theoretical explanation of this brittleness in terms of the geometry of adversarial attacks. Based on these theoretical considerations, we suggest, and demonstrate empirically, that saliency explanations provided by Bayesian Neural Networks are considerably more stable under adversarial perturbations. Our results not only confirm that Bayesian Neural Networks are more robust to adversarial attacks, but also show that Bayesian methods have the potential to provide more stable and interpretable assessments of Neural Network predictions.
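The kind of stability comparison the abstract describes can be sketched with a toy experiment: compute a gradient saliency map before and after a one-step adversarial perturbation, and compare the deterministic map with one averaged over weight samples. The two-layer ReLU network, FGSM-style step, and Gaussian "posterior" samples below are illustrative assumptions, not the paper's actual models or attack setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W1, W2):
    """Two-layer ReLU network: returns hidden activations and class logits."""
    h = np.maximum(0.0, x @ W1)
    return h, h @ W2

def saliency(x, W1, W2):
    """Gradient of the predicted-class logit w.r.t. the input (a saliency map)."""
    h, logits = forward(x, W1, W2)
    c = int(np.argmax(logits))
    # Backprop through the linear head and the ReLU mask, for a linear input layer.
    return W1 @ ((h > 0).astype(float) * W2[:, c])

def fgsm_step(x, W1, W2, eps):
    """One-step, FGSM-style perturbation against the predicted-class gradient."""
    return x - eps * np.sign(saliency(x, W1, W2))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

d, h_dim, k = 8, 16, 3
x = rng.normal(size=d)
W1 = rng.normal(size=(d, h_dim))
W2 = rng.normal(size=(h_dim, k))

x_adv = fgsm_step(x, W1, W2, eps=0.5)

# Deterministic saliency before and after the perturbation.
s_clean = saliency(x, W1, W2)
s_adv = saliency(x_adv, W1, W2)

# "Bayesian" saliency: average the map over (hypothetical) posterior weight samples,
# here crudely simulated as Gaussian noise around the deterministic weights.
samples = [(W1 + 0.05 * rng.normal(size=W1.shape),
            W2 + 0.05 * rng.normal(size=W2.shape)) for _ in range(30)]
sb_clean = np.mean([saliency(x, a, b) for a, b in samples], axis=0)
sb_adv = np.mean([saliency(x_adv, a, b) for a, b in samples], axis=0)

print("deterministic saliency similarity:", round(cosine(s_clean, s_adv), 3))
print("sample-averaged saliency similarity:", round(cosine(sb_clean, sb_adv), 3))
```

A cosine similarity near 1 means the explanation survived the perturbation; the paper's claim is that the posterior-averaged maps of a genuine Bayesian Neural Network stay closer to 1 than deterministic ones, which this toy linear-Gaussian surrogate only loosely imitates.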


Related research

02/24/2021  Identifying Untrustworthy Predictions in Neural Networks by Geometric Gradient Analysis
The susceptibility of deep neural networks to untrustworthy predictions,...

03/06/2020  Explaining Away Attacks Against Neural Networks
We investigate the problem of identifying adversarial attacks on image-b...

05/10/2019  On the Connection Between Adversarial Robustness and Saliency Map Interpretability
Recent studies on the adversarial vulnerability of neural networks have ...

09/13/2021  The mathematics of adversarial attacks in AI – Why deep learning is unstable despite the existence of stable neural networks
The unprecedented success of deep learning (DL) makes it unchallenged wh...

06/17/2021  Evaluating the Robustness of Bayesian Neural Networks Against Different Types of Attacks
To evaluate the robustness gain of Bayesian neural networks on image cla...

03/17/2023  Adversarial Counterfactual Visual Explanations
Counterfactual explanations and adversarial attacks have a related goal:...
