Perceptual Adversarial Robustness: Defense Against Unseen Threat Models

06/22/2020
by Cassidy Laidlaw, et al.

We present adversarial attacks and defenses for the perceptual adversarial threat model: the set of all perturbations to natural images that can mislead a classifier but are imperceptible to human eyes. The perceptual threat model is broad and encompasses the L_2, L_∞, spatial, and many other existing adversarial threat models. However, it is difficult to determine whether an arbitrary perturbation is imperceptible without humans in the loop. To solve this issue, we propose to use a neural perceptual distance, an approximation of the true perceptual distance between images computed from the internal activations of neural networks. In particular, we use the Learned Perceptual Image Patch Similarity (LPIPS) distance. We then propose the neural perceptual threat model, which includes adversarial examples with a bounded neural perceptual distance to natural images. Under this threat model, we develop two novel perceptual adversarial attacks that find imperceptible perturbations of images which can fool a classifier. Through an extensive perceptual study, we show that the LPIPS distance correlates well with human judgements of the perceptibility of adversarial examples, validating our threat model. Because the LPIPS threat model is very broad, we find that Perceptual Adversarial Training (PAT) against a perceptual attack gives robustness against many other types of adversarial attacks. We test PAT on CIFAR-10 and ImageNet-100 against 12 types of adversarial attacks and find that, for each attack, PAT achieves accuracy close to that of adversarial training against that specific perturbation type. That is, PAT generalizes well to unforeseen perturbation types. This is vital in sensitive applications where a particular threat model cannot be assumed, and to the best of our knowledge, PAT is the first adversarial defense with this property.
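As a concrete illustration of the neural perceptual threat model, the sketch below checks whether a perturbed image stays within a bounded LPIPS distance of its clean counterpart. It is a minimal example, not the authors' released code: the AlexNet backbone and the bound eps are illustrative assumptions, and the open-source lpips package is used only because it implements the LPIPS distance described above.

```python
# Minimal sketch: membership test for the neural perceptual threat model,
# i.e. d_LPIPS(x_clean, x_adv) <= eps. Not the paper's reference implementation.
import torch
import lpips  # pip install lpips

# LPIPS distance computed from AlexNet activations (an illustrative choice).
lpips_dist = lpips.LPIPS(net='alex')

def in_neural_perceptual_threat_model(x_clean, x_adv, eps=0.5):
    """Return a boolean per image: is the LPIPS distance at most eps?

    Both inputs are (N, 3, H, W) tensors scaled to [-1, 1], as the lpips
    package expects. eps = 0.5 is a hypothetical bound, not the paper's value.
    """
    with torch.no_grad():
        d = lpips_dist(x_clean, x_adv).view(-1)  # one distance per image
    return d <= eps

# Usage with random stand-ins for a natural image and a perturbed version of it.
x = torch.rand(1, 3, 224, 224) * 2 - 1
x_adv = (x + 0.01 * torch.randn_like(x)).clamp(-1, 1)
print(in_neural_perceptual_threat_model(x, x_adv))
```

The perceptual attacks described above search for perturbations that maximize the classifier's loss while keeping this LPIPS distance bounded, and PAT trains the classifier against such perturbations.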

Related research

12/12/2021 · Interpolated Joint Space Adversarial Training for Robust and Generalizable Defenses
Adversarial training (AT) is considered to be one of the most reliable d...

02/21/2019 · Quantifying Perceptual Distortion of Adversarial Examples
Recent work has shown that additive threat models, which only permit the...

10/23/2019 · Wasserstein Smoothing: Certified Robustness against Wasserstein Adversarial Attacks
In the last couple of years, several adversarial attack methods based on...

05/29/2019 · Functional Adversarial Attacks
We propose functional adversarial attacks, a novel class of threat model...

08/30/2021 · Sample Efficient Detection and Classification of Adversarial Attacks via Self-Supervised Embeddings
Adversarial robustness of deep models is pivotal in ensuring safe deploy...

10/15/2021 · Adversarial Purification through Representation Disentanglement
Deep learning models are vulnerable to adversarial examples and make inc...

07/17/2022 · Threat Model-Agnostic Adversarial Defense using Diffusion Models
Deep Neural Networks (DNNs) are highly sensitive to imperceptible malici...
