Attack to Fool and Explain Deep Networks

06/20/2021
by Naveed Akhtar et al.

Deep visual models are susceptible to adversarial perturbations of their inputs. Although these signals are carefully crafted, they still appear as noise-like patterns to humans. This observation has led to the argument that deep visual representation is misaligned with human perception. We counter-argue by providing evidence of human-meaningful patterns in adversarial perturbations. We first propose an attack that fools a network into confusing a whole category of objects (the source class) with a target label. Our attack also limits the unintended fooling of samples from non-source classes, thereby circumscribing human-defined semantic notions for network fooling. We show that the proposed attack not only leads to the emergence of regular geometric patterns in the perturbations, but also reveals insightful information about the decision boundaries of deep models. Exploring this phenomenon further, we alter the `adversarial' objective of our attack to use it as a tool to `explain' deep visual representation. We show that, by careful channeling and projection of the perturbations computed by our method, we can visualize a model's understanding of human-defined semantic notions. Finally, we exploit the explainability properties of our perturbations to perform image generation, inpainting, and interactive image manipulation by attacking adversarially robust `classifiers'. In all, our major contribution is a novel pragmatic adversarial attack that is subsequently transformed into a tool to interpret visual models. The article also makes secondary contributions by establishing the utility of our attack beyond the adversarial objective through multiple interesting applications.
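The attack described in the abstract can be pictured as a single, class-universal targeted perturbation that pushes every source-class image toward the target label while leaving non-source images on their original labels. The sketch below illustrates that idea only; it assumes a PyTorch classifier `model`, a loader of source-class images, a loader of labeled non-source images, and 224x224 RGB inputs, with an illustrative loss weighting, optimizer, and l_inf budget rather than the paper's exact algorithm.

import torch
import torch.nn.functional as F

def craft_class_universal_perturbation(model, source_loader, other_loader,
                                        target_label, eps=10/255, steps=200,
                                        lr=0.01, device="cpu"):
    # Illustrative sketch (not the paper's method): one perturbation `delta`
    # is shared by all images, i.e. it is universal for the source class.
    model.eval()
    delta = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        for (x_src, _), (x_oth, y_oth) in zip(source_loader, other_loader):
            x_src, x_oth, y_oth = x_src.to(device), x_oth.to(device), y_oth.to(device)
            tgt = torch.full((x_src.size(0),), target_label, device=device)
            # Fooling term: drive source-class images toward the target label.
            loss_fool = F.cross_entropy(model(x_src + delta), tgt)
            # Containment term: keep non-source images on their true labels,
            # limiting unintended fooling outside the source class.
            loss_keep = F.cross_entropy(model(x_oth + delta), y_oth)
            loss = loss_fool + loss_keep
            opt.zero_grad()
            loss.backward()
            opt.step()
            # Project the perturbation back onto the l_inf budget.
            with torch.no_grad():
                delta.clamp_(-eps, eps)
    return delta.detach()

In this sketch the containment term is what circumscribes the attack to the human-defined source class; dropping it would recover an ordinary universal targeted perturbation that fools images of all classes indiscriminately.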


