Universal Adversarial Training

11/27/2018
by   Ali Shafahi, et al.
0

Standard adversarial attacks change the predicted class label of an image by adding specially tailored small perturbations to its pixels. In contrast, a universal perturbation is an update that can be added to any image in a broad class of images, while still changing the predicted class label. We study the efficient generation of universal adversarial perturbations, and also efficient methods for hardening networks to these attacks. We propose a simple optimization-based universal attack that reduces the top-1 accuracy of various network architectures on ImageNet to less than 20 universal perturbation 13X faster than the standard method. To defend against these perturbations, we propose universal adversarial training, which models the problem of robust classifier generation as a two-player min-max game. This method is much faster and more scalable than conventional adversarial training with a strong adversary (PGD), and yet yields models that are extremely resistant to universal attacks, and comparably resistant to standard (per-instance) black box attacks. We also discover a rather fascinating side-effect of universal adversarial training: attacks built for universally robust models transfer better to other (black box) models than those built with conventional adversarial training.

READ FULL TEXT

page 4

page 5

page 6

page 8

research
01/29/2021

You Only Query Once: Effective Black Box Adversarial Attacks with Minimal Repeated Queries

Researchers have repeatedly shown that it is possible to craft adversari...
research
05/04/2020

On the Benefits of Models with Perceptually-Aligned Gradients

Adversarial robust models have been shown to learn more robust and inter...
research
09/20/2018

Playing the Game of Universal Adversarial Perturbations

We study the problem of learning classifiers robust to universal adversa...
research
01/27/2021

Meta Adversarial Training

Recently demonstrated physical-world adversarial attacks have exposed vu...
research
03/11/2020

Frequency-Tuned Universal Adversarial Attacks

Researchers have shown that the predictions of a convolutional neural ne...
research
04/24/2020

One Sparse Perturbation to Fool them All, almost Always!

Constructing adversarial perturbations for deep neural networks is an im...
research
09/15/2020

Decision-based Universal Adversarial Attack

A single perturbation can pose the most natural images to be misclassifi...

Please sign up or login with your details

Forgot password? Click here to reset