Adversarial Feature Desensitization

06/08/2020
by Pouya Bashivan, et al.

Deep neural networks can now perform many tasks that were once thought to be feasible only for humans. Unfortunately, while they reach impressive performance under standard settings, such networks are known to be susceptible to adversarial attacks: slight but carefully constructed perturbations of the inputs that drastically degrade a network's performance and reduce its trustworthiness. Here we propose to improve network robustness to input perturbations via an adversarial training procedure that we call Adversarial Feature Desensitization (AFD). We augment standard supervised training with an adversarial game between the embedding network and an additional adversarial decoder that is trained to discriminate between clean and perturbed inputs from their high-level embeddings. Our theoretical and empirical evidence demonstrates the effectiveness of this approach in learning robust features on the MNIST, CIFAR10, and CIFAR100 datasets, substantially improving the state of the art in robust classification against previously observed adversarial attacks. More importantly, we show that AFD generalizes better than previous methods: the learned features remain robust against a wide range of perturbations, including perturbations not seen during training. These results indicate that reducing feature sensitivity through adversarial training is a promising approach for mitigating the problem of adversarial attacks in deep neural networks.
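To make the training scheme concrete, below is a minimal PyTorch sketch of one AFD-style training step. It assumes an L-infinity PGD attack to generate the perturbed inputs; the module names (embed, head, disc), the attack hyperparameters, and the loss weight lam are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of an AFD-style training step: a discriminator (disc) learns to
# tell clean from perturbed embeddings, while the embedding network (embed)
# learns to make perturbed embeddings indistinguishable from clean ones.
import torch
import torch.nn.functional as F

def pgd_attack(embed, head, x, y, eps=8/255, alpha=2/255, steps=10):
    """L-inf PGD adversarial examples against the classifier (assumed attack)."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(head(embed(x_adv)), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv + alpha * grad.sign()
        x_adv = x.detach() + (x_adv - x).clamp(-eps, eps)  # project to eps-ball
        x_adv = x_adv.clamp(0, 1)                          # keep valid pixels
    return x_adv.detach()

def afd_step(embed, head, disc, opt_task, opt_disc, x, y, lam=1.0):
    """One training step; disc outputs a (batch, 1) logit per embedding."""
    x_adv = pgd_attack(embed, head, x, y)
    ones = torch.ones(x.size(0), 1, device=x.device)
    zeros = torch.zeros(x.size(0), 1, device=x.device)

    # 1) Discriminator update: classify clean vs. perturbed embeddings.
    #    Embeddings are detached so this step does not move the encoder.
    z_clean, z_adv = embed(x).detach(), embed(x_adv).detach()
    d_loss = (F.binary_cross_entropy_with_logits(disc(z_clean), ones)
              + F.binary_cross_entropy_with_logits(disc(z_adv), zeros))
    opt_disc.zero_grad()
    d_loss.backward()
    opt_disc.step()

    # 2) Embedding/classifier update: minimize the task loss while fooling
    #    the discriminator, i.e. making perturbed embeddings look "clean".
    task_loss = F.cross_entropy(head(embed(x)), y)
    fool_loss = F.binary_cross_entropy_with_logits(disc(embed(x_adv)), ones)
    opt_task.zero_grad()
    (task_loss + lam * fool_loss).backward()
    opt_task.step()
```

In this sketch the discriminator is trained on detached embeddings, so only the second step, where the embedding network tries to desensitize its features to the perturbation, shapes the learned representation.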

06/09/2021

Attacking Adversarial Attacks as A Defense

It is well known that adversarial attacks can fool deep neural networks ...
05/01/2020

Evaluating Neural Machine Comprehension Model Robustness to Noisy Inputs and Adversarial Attacks

We evaluate machine comprehension models' robustness to noise and advers...
11/27/2020

Rethinking Uncertainty in Deep Learning: Whether and How it Improves Robustness

Deep neural networks (DNNs) are known to be prone to adversarial attacks...
11/21/2022

Addressing Mistake Severity in Neural Networks with Semantic Knowledge

Robustness in deep neural networks and machine learning algorithms in ge...
04/01/2020

Towards Achieving Adversarial Robustness by Enforcing Feature Consistency Across Bit Planes

As humans, we inherently perceive images based on their predominant feat...
11/02/2022

Isometric Representations in Neural Networks Improve Robustness

Artificial and biological agents cannot learn given completely random an...
07/28/2020

Reachable Sets of Classifiers and Regression Models: (Non-)Robustness Analysis and Robust Training

Neural networks achieve outstanding accuracy in classification and regre...
