The Robust Manifold Defense: Adversarial Training using Generative Models

by Andrew Ilyas, et al.

Deep neural networks demonstrate excellent performance on several classical vision problems. However, these networks are vulnerable to adversarial examples: minutely modified images that induce arbitrary attacker-chosen output from the network. We propose a mechanism to protect against these adversarial inputs based on a generative model of the data. We introduce a pre-processing step that projects an input onto the range of a generative model using gradient descent before feeding it into a classifier. We show that this step provides the classifier with robustness against first-order, substitute-model, and combined adversarial attacks. Using a min-max formulation, we show that adversarial examples may exist even in the range of the generator: natural-looking images extremely close to the decision boundary for which the classifier has unjustifiably high confidence. We show that adversarial training on the generative manifold can be used to make a classifier robust to these attacks. Finally, we show how our method can be applied even without a pre-trained generative model, using a recent method called the deep image prior. We evaluate our method on MNIST, CelebA, and ImageNet and show robustness against the current state-of-the-art attacks.
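The projection step described above can be sketched in a few lines: given a generator G, gradient descent searches the latent space for the z whose image G(z) best matches the input, and the classifier then sees G(z) instead of the raw (possibly perturbed) input. This is a minimal toy illustration, not the paper's implementation: the "generator" here is a fixed linear map G(z) = Wz so the code stays self-contained, and the perturbed input, step size, and step count are all made up for the example.

```python
import numpy as np

# Toy "generator": a fixed linear map from a 3-dim latent to an 8-dim "image".
# (In the paper this would be a trained GAN generator; linear is for illustration.)
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))

def G(z):
    return W @ z

def project_to_range(x, steps=500, lr=0.01):
    """Gradient descent on z to minimize ||G(z) - x||^2, then return G(z)."""
    z = np.zeros(W.shape[1])
    for _ in range(steps):
        grad = 2 * W.T @ (G(z) - x)   # gradient of the squared loss w.r.t. z
        z -= lr * grad
    return G(z)

# A clean image on the manifold, plus an off-manifold perturbation.
x_clean = G(np.array([1.0, -2.0, 0.5]))
x_adv = x_clean + 0.3 * rng.normal(size=8)

# Projection snaps the perturbed input back toward the range of G,
# so the classifier would see a point closer to the clean image.
x_proj = project_to_range(x_adv)
```

The key property the defense relies on is visible even in this toy: the component of the perturbation that lies off the generator's range is discarded by the projection, so `x_proj` is closer to `x_clean` than `x_adv` is.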




One Man's Trash is Another Man's Treasure: Resisting Adversarial Examples by Adversarial Examples

Modern image classification systems are often built on deep neural netwo...

Generative Adversarial Examples

Adversarial examples are typically constructed by perturbing an existing...

Exploring the Connection between Robust and Generative Models

We offer a study that connects robust discriminative classifiers trained...

Image Decomposition and Classification through a Generative Model

We demonstrate in this paper that a generative model can be designed to ...

Bidirectional Learning for Robust Neural Networks

A multilayer perceptron can behave as a generative classifier by applyin...

CARSO: Counter-Adversarial Recall of Synthetic Observations

In this paper, we propose a novel adversarial defence mechanism for imag...

Image Hijacks: Adversarial Images can Control Generative Models at Runtime

Are foundation models secure from malicious actors? In this work, we foc...
