Improved and Interpretable Defense to Transferred Adversarial Examples by Jacobian Norm with Selective Input Gradient Regularization

07/09/2022
by Deyin Liu, et al.

Deep neural networks (DNNs) are known to be vulnerable to adversarial examples crafted with imperceptible perturbations: a small change to an input image can induce a misclassification and thus threatens the reliability of deployed deep-learning systems. Adversarial training (AT) is often adopted to improve the robustness of DNNs by training on a mixture of corrupted and clean data. However, most AT-based methods are ineffective against transferred adversarial examples, which are generated to fool a wide spectrum of defense models, and therefore cannot satisfy the generalization requirements of real-world scenarios. Moreover, adversarially training a defense model in general does not yield interpretable predictions on perturbed inputs, whereas domain experts require a highly interpretable robust model to understand the behaviour of a DNN. In this work, we propose an approach based on the Jacobian norm and Selective Input Gradient Regularization (J-SIGR), which encourages linearized robustness through Jacobian normalization and regularizes perturbation-based saliency maps so that the model's predictions remain interpretable. As such, we achieve both improved defense and high interpretability of DNNs. Finally, we evaluate our method across different architectures against powerful adversarial attacks. Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks, and we also show that the predictions of the neural network are easy to interpret.
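The abstract gives only a high-level description of the two regularizers, so a minimal PyTorch sketch of how such a training objective could look is given below. Everything here is an assumption rather than the authors' implementation: the function name jsigr_loss, the weights lambda_jac and lambda_sigr, the top_k fraction, and the use of the squared input-gradient norm as a common surrogate for the full Jacobian norm are all illustrative choices.

```python
import torch
import torch.nn.functional as F

def jsigr_loss(model, x, y, lambda_jac=0.01, lambda_sigr=0.1, top_k=0.05):
    """Hypothetical sketch of a J-SIGR-style objective (not the authors' code).

    Combines (i) a Jacobian-norm penalty that encourages locally linear,
    small-Lipschitz behaviour, and (ii) selective input gradient
    regularization that suppresses gradients outside the most salient pixels.
    """
    x = x.detach().clone().requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, y)

    # Input gradient of the loss w.r.t. the image, used both as a saliency
    # proxy and as a Jacobian-norm surrogate; create_graph=True keeps the
    # penalty differentiable so it can be trained through.
    grad = torch.autograd.grad(ce, x, create_graph=True)[0]

    # (i) Jacobian-norm surrogate: squared Frobenius norm of the input
    # gradient, averaged over the batch.
    jac_penalty = grad.pow(2).flatten(1).sum(dim=1).mean()

    # (ii) Selective regularization: penalize gradients only at the
    # (1 - top_k) least-salient pixels, so saliency concentrates on the
    # most relevant regions instead of being suppressed everywhere.
    flat = grad.abs().flatten(1)
    k = max(1, int(top_k * flat.shape[1]))
    thresh = flat.topk(k, dim=1).values[:, -1:]   # per-example k-th largest
    mask = (flat < thresh).float()                # 1 marks non-salient pixels
    sigr_penalty = (flat * mask).pow(2).sum(dim=1).mean()

    return ce + lambda_jac * jac_penalty + lambda_sigr * sigr_penalty
```

In a training loop this would simply replace the plain cross-entropy, e.g. loss = jsigr_loss(model, images, labels); loss.backward(). The selective mask follows the spirit of selective input gradient regularization as described in the abstract: gradients are penalized only where the saliency map should be quiet, so the surviving map stays concentrated and readable.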

Related research:

- Attacking Adversarial Attacks as A Defense (06/09/2021). It is well known that adversarial attacks can fool deep neural networks ...
- Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients (11/26/2017). Deep neural networks have proven remarkably effective at solving many cl...
- Policy Distillation with Selective Input Gradient Regularization for Efficient Interpretability (05/18/2022). Although deep Reinforcement Learning (RL) has proven successful in a wid...
- Does Interpretability of Neural Networks Imply Adversarial Robustness? (12/07/2019). The success of deep neural networks is clouded by two issues that largel...
- L0 Regularization Based Neural Network Design and Compression (05/31/2019). We consider complexity of Deep Neural Networks (DNNs) and their associat...
- Rethinking Uncertainty in Deep Learning: Whether and How it Improves Robustness (11/27/2020). Deep neural networks (DNNs) are known to be prone to adversarial attacks...
- Revisiting the Trade-off between Accuracy and Robustness via Weight Distribution of Filters (06/06/2023). Adversarial attacks have been proven to be potential threats to Deep Neu...
