Adversarially Robust Classification by Conditional Generative Model Inversion

01/12/2022
by Mitra Alirezaei, et al.

Most adversarial attack defense methods rely on obfuscating gradients. These methods succeed against gradient-based attacks but are easily circumvented by attacks that either do not use the gradient or that approximate and use the corrected gradient. Defenses that do not obfuscate gradients, such as adversarial training, exist, but these approaches generally make assumptions about the attack, such as its magnitude. We propose a classification model that does not obfuscate gradients and is robust by construction without assuming prior knowledge about the attack. Our method casts classification as an optimization problem: we "invert" a conditional generator trained on unperturbed, natural images to find the class that generates the sample closest to the query image. We hypothesize that a potential source of brittleness against adversarial attacks is the high-to-low-dimensional nature of feed-forward classifiers, which allows an adversary to find small perturbations in the input space that lead to large changes in the output space. A generative model, by contrast, is typically a low-to-high-dimensional mapping. While our method is related to Defense-GAN, using a conditional generative model and inversion in place of a feed-forward classifier is a critical difference. Unlike Defense-GAN, which was shown to produce obfuscated gradients that are easily circumvented, we show that our method does not obfuscate gradients. We demonstrate that our model is extremely robust against black-box attacks and has improved robustness against white-box attacks compared to naturally trained, feed-forward classifiers.
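A minimal sketch of the classification-by-inversion idea described in the abstract follows. This is not the authors' implementation: the generator interface G(z, label), the latent dimensionality, and the optimizer settings are assumptions. The idea is to optimize a latent code for each candidate class and predict the class whose generated sample best reconstructs the query image.

```python
# Hypothetical sketch of inversion-based classification with a conditional
# generator G(z, label). All names and hyperparameters here are assumptions
# made for illustration, not the paper's code.
import torch

def classify_by_inversion(G, x, n_classes, latent_dim, steps=200, lr=0.05):
    """Predict a label for a single query image x (shape: C x H x W)."""
    best_label, best_loss = None, float("inf")
    for y in range(n_classes):
        z = torch.randn(1, latent_dim, requires_grad=True)   # latent code for class y
        label = torch.tensor([y])
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            recon = G(z, label)                               # sample generated for class y
            loss = ((recon - x.unsqueeze(0)) ** 2).mean()     # reconstruction error
            loss.backward()
            opt.step()
        # compare the final reconstruction error for this class
        if loss.item() < best_loss:
            best_loss, best_label = loss.item(), y
    return best_label
```

Because the prediction comes from optimizing through the generator rather than a single feed-forward pass, the reconstruction losses (and their gradients) remain well defined, which is consistent with the paper's claim that the defense does not rely on obfuscated gradients.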

Related research

11/25/2019 - ColorFool: Semantic Adversarial Colorization
Adversarial attacks that generate small L_p-norm perturbations to mislea...

07/08/2021 - Output Randomization: A Novel Defense for both White-box and Black-box Adversarial Models
Adversarial examples pose a threat to deep neural network models in a va...

08/22/2021 - Robustness-via-Synthesis: Robust Training with Generative Adversarial Perturbations
Upon the discovery of adversarial attacks, robust models have become obl...

10/16/2019 - A New Defense Against Adversarial Images: Turning a Weakness into a Strength
Natural images are virtually surrounded by low-density misclassified reg...

05/16/2022 - Diffusion Models for Adversarial Purification
Adversarial purification refers to a class of defense methods that remov...

03/21/2018 - Adversarial Defense based on Structure-to-Signal Autoencoders
Adversarial attack methods have demonstrated the fragility of deep neura...

07/26/2021 - Adversarial Attacks with Time-Scale Representations
We propose a novel framework for real-time black-box universal attacks w...
