Defense against Universal Adversarial Perturbations

by Naveed Akhtar, et al.

Recent advances in Deep Learning show the existence of image-agnostic, quasi-imperceptible perturbations that, when applied to 'any' image, can fool a state-of-the-art network classifier into changing its prediction about the image label. These 'Universal Adversarial Perturbations' pose a serious threat to the success of Deep Learning in practice. We present the first dedicated framework to effectively defend networks against such perturbations. Our approach learns a Perturbation Rectifying Network (PRN) as 'pre-input' layers to a targeted model, such that the targeted model needs no modification. The PRN is learned from real and synthetic image-agnostic perturbations, and we also propose an efficient method to compute the latter. A perturbation detector is separately trained on the Discrete Cosine Transform of the input-output difference of the PRN. A query image is first passed through the PRN and verified by the detector. If a perturbation is detected, the output of the PRN is used for label prediction instead of the actual image. A rigorous evaluation shows that our framework can defend network classifiers against unseen adversarial perturbations in real-world scenarios with up to a 97.5% success rate. The PRN also generalizes well, in the sense that training it for one targeted network defends another network with a comparable success rate.
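
The inference-time flow described in the abstract (rectify, take the DCT of the input-output difference, detect, then choose which image to classify) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `defend`, `toy_rectifier`, `toy_detector`, and `toy_classifier` are hypothetical, the rectifier and detector are hand-written stand-ins for the learned PRN and the trained detector, and the 8x8 checkerboard plays the role of a universal perturbation.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def dct2(x):
    """2-D DCT-II of a single-channel image."""
    return dct_matrix(x.shape[0]) @ x @ dct_matrix(x.shape[1]).T

def defend(image, rectifier, detector, classifier):
    """Rectify, detect on the DCT of (rectified - image), then classify."""
    rectified = rectifier(image)
    features = dct2(rectified - image)  # DCT of the rectifier's input-output difference
    perturbed = bool(detector(features))
    chosen = rectified if perturbed else image
    return classifier(chosen), perturbed

# Toy stand-ins (assumptions for illustration only).
# A fixed checkerboard of amplitude 0.1 acts as the "universal" perturbation.
p = 0.1 * (1 - 2 * (np.indices((8, 8)).sum(axis=0) % 2))

def toy_rectifier(image):
    # Stand-in for the learned PRN: project out the known pattern.
    coeff = np.sum(image * p) / np.sum(p * p)
    return image - coeff * p

def toy_detector(features):
    # Stand-in for the trained detector: threshold the DCT energy.
    return np.sum(features ** 2) > 1e-3

def toy_classifier(image):
    # Stand-in for the targeted model: it is fooled by the pattern.
    return int(np.sum(image * p) > 0.3)

clean = np.full((8, 8), 0.5)
adv = clean + p

print(toy_classifier(adv))                                             # undefended prediction
print(defend(clean, toy_rectifier, toy_detector, toy_classifier))      # clean image path
print(defend(adv, toy_rectifier, toy_detector, toy_classifier))        # perturbed image path
```

In this sketch the targeted classifier is never modified; the defense only decides which image it sees, which mirrors the 'pre-input' role the abstract assigns to the PRN.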




Locally optimal detection of stochastic targeted universal adversarial perturbations

Deep learning image classifiers are known to be vulnerable to small adve...

Universal Adversarial Perturbation for Text Classification

Given a state-of-the-art deep neural network text classifier, we show th...

Adversarial Defense by Stratified Convolutional Sparse Coding

We propose an adversarial defense method that achieves state-of-the-art ...

Transferable Universal Adversarial Perturbations Using Generative Models

Deep neural networks tend to be vulnerable to adversarial perturbations,...

Connecting the Dots: Detecting Adversarial Perturbations Using Context Inconsistency

There has been a recent surge in research on adversarial perturbations t...

Ask, Acquire, and Attack: Data-free UAP Generation using Class Impressions

Deep learning models are susceptible to input specific noise, called adv...

On Lyapunov exponents and adversarial perturbation

In this paper, we would like to disseminate a serendipitous discovery in...
