Towards Hiding Adversarial Examples from Network Interpretation

12/06/2018
by Akshayvarun Subramanya, et al.

Deep networks have been shown to be fooled rather easily by adversarial attack algorithms, and practical methods such as adversarial patches are extremely effective at causing misclassification. However, these patches can be highlighted using standard network interpretation algorithms, revealing the identity of the adversary. We show that it is possible to create adversarial patches that not only fool the prediction but also change what we interpret as the cause of that prediction. Our algorithms thus empower adversarial patches by hiding them from network interpretation tools. We believe this can facilitate the development of more robust interpretation tools that truly explain the network's underlying decision-making process.
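The core idea can be made concrete with a short sketch: jointly optimize the patch for a targeted misclassification loss and a term that suppresses the interpretation map's energy over the patch region, so the explanation points away from the patch. The sketch below is illustrative only, assuming a pretrained ResNet-18, Grad-CAM as the interpretation method, and a fixed patch size, location, and loss weight; none of these are the paper's exact formulation.

```python
# Minimal sketch, assuming ResNet-18 and Grad-CAM; hyperparameters are guesses,
# not the paper's setup.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

device = "cuda" if torch.cuda.is_available() else "cpu"
model = resnet18(weights="DEFAULT").to(device).eval()
for p in model.parameters():
    p.requires_grad_(False)

# Hook the last conv block to capture activations for Grad-CAM.
feats = {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(a=o))

def gradcam(x, cls):
    """Differentiable Grad-CAM map for class `cls`, upsampled to input size."""
    logits = model(x)
    score = logits[:, cls].sum()
    grads = torch.autograd.grad(score, feats["a"], create_graph=True)[0]
    weights = grads.mean(dim=(2, 3), keepdim=True)   # GAP over spatial dims
    cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear",
                        align_corners=False)
    return cam, logits

image = torch.rand(1, 3, 224, 224, device=device)    # stand-in input image
target = 207                                          # arbitrary target class
patch = torch.rand(1, 3, 50, 50, device=device, requires_grad=True)
y0, x0 = 20, 20                                       # fixed patch location
opt = torch.optim.Adam([patch], lr=0.05)

for step in range(200):
    x = image.clone()
    x[:, :, y0:y0 + 50, x0:x0 + 50] = patch.clamp(0, 1)
    cam, logits = gradcam(x, target)
    # (a) force the target prediction; (b) penalize Grad-CAM energy on the
    # patch so the interpretation highlights some other region instead.
    cls_loss = F.cross_entropy(logits, torch.tensor([target], device=device))
    hide_loss = cam[:, :, y0:y0 + 50, x0:x0 + 50].mean()
    loss = cls_loss + 1.0 * hide_loss                 # weight is an assumption
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the Grad-CAM map is computed with `create_graph=True`, the hiding term is differentiable with respect to the patch, so a single optimizer can trade off misclassification against interpretation suppression.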


Related Research

05/05/2020 · Adversarial Training against Location-Optimized Adversarial Patches
Deep neural networks have been shown to be susceptible to adversarial ex...

12/27/2017 · Adversarial Patch
We present a method to create universal, robust, targeted adversarial im...

06/26/2020 · Proper Network Interpretability Helps Adversarial Robustness in Classification
Recent works have empirically shown that there exist adversarial example...

08/01/2023 · Kidnapping Deep Learning-based Multirotors using Optimized Flying Adversarial Patches
Autonomous flying robots, such as multirotors, often rely on deep learni...

11/30/2019 · Design and Interpretation of Universal Adversarial Patches in Face Detection
We consider universal adversarial patches for faces - small visual eleme...

06/19/2023 · Eigenpatches – Adversarial Patches from Principal Components
Adversarial patches are still a simple yet powerful white box attack tha...

02/06/2019 · Fooling Neural Network Interpretations via Adversarial Model Manipulation
We ask whether the neural network interpretation methods can be fooled v...
