Proper Network Interpretability Helps Adversarial Robustness in Classification

06/26/2020
by Akhilan Boopathy, et al.

Recent works have empirically shown that adversarial examples can be hidden from neural network interpretability (namely, by making the network interpretation maps visually similar), or that interpretability is itself susceptible to adversarial attacks. In this paper, we theoretically show that, with a proper measurement of interpretation, it is actually difficult to prevent prediction-evasion adversarial attacks from causing interpretation discrepancy, as confirmed by experiments on MNIST, CIFAR-10 and Restricted ImageNet. Motivated by this result, we develop an interpretability-aware defensive scheme built only on promoting robust interpretation (without resorting to adversarial loss minimization). We show that our defense achieves both robust classification and robust interpretation, outperforming state-of-the-art adversarial training methods, particularly against attacks with large perturbations.
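To make the notion of interpretation discrepancy concrete, the following is a minimal sketch under stated assumptions, not the paper's measure or models. It assumes a toy linear softmax classifier with random weights, uses the input gradient of a class probability as the interpretation map, and scores discrepancy as the mean l1 distance between the maps of the clean and perturbed inputs over a chosen pair of classes. The names `saliency` and `interpretation_discrepancy` are hypothetical, as is the choice of perturbation (the smallest sign-direction step, scaled by 1.1, that overturns the margin between two classes).

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def saliency(W, x, c):
    # Gradient of the class-c softmax probability with respect to x
    # for the linear model logits = W @ x (assumed toy model).
    p = softmax(W @ x)
    return p[c] * (W[c] - p @ W)

def interpretation_discrepancy(W, x, x_adv, classes):
    # Mean l1 distance between the maps of x and x_adv over `classes`.
    dists = [np.abs(saliency(W, x, c) - saliency(W, x_adv, c)).sum()
             for c in classes]
    return float(np.mean(dists))

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))      # hypothetical 3-class, 8-feature model
x = rng.normal(size=8)           # hypothetical clean input

t = int(np.argmax(W @ x))        # predicted class on the clean input
t_adv = (t + 1) % 3              # arbitrary attack target class

# Sign-direction step just large enough to make the target logit
# exceed the original logit, so the prediction can no longer be t.
d = W[t_adv] - W[t]
margin = (W @ x)[t] - (W @ x)[t_adv]
eps = 1.1 * margin / np.abs(d).sum()
x_adv = x + eps * np.sign(d)

print("prediction changed:", int(np.argmax(W @ x_adv)) != t)
print("discrepancy:", interpretation_discrepancy(W, x, x_adv, [t, t_adv]))
```

In this toy setting the perturbation that changes the prediction also shifts the class probabilities, so the interpretation maps move with it. A defense in the spirit of the abstract would penalize that discrepancy during training, though the exact objective is not given in the text above.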


