Adversarial examples (Szegedy et al., 2014)
are inputs designed by an adversary to cause a machine learning system to make a misclassification. A series of studies on adversarial attacks has shown that it is easy to cause misclassifications using visually imperceptible changes to an image under norm-based similarity metrics (Goodfellow et al., 2015; Kurakin et al., 2017; Madry et al., 2018; Carlini and Wagner, 2017b). Since the discovery of adversarial examples, there has been a constant “arms race” between better attacks and better defenses. Many new defenses have been proposed (Song et al., 2018; Gong et al., 2017; Grosse et al., 2017; Metzen et al., 2017), only to be broken shortly thereafter (Carlini and Wagner, 2017a; Athalye et al., 2018). Currently, the most effective approach to reducing a network’s vulnerability to adversarial examples is “adversarial training”, in which a network is trained on both clean images and adversarially perturbed ones (Goodfellow et al., 2015; Madry et al., 2018). However, adversarial training is very time-consuming because it requires generating adversarial examples during training, and it typically only improves a network’s robustness to adversarial examples that are generated in a similar way to those on which the network was trained. Hinton et al. (2018) showed that capsule models are more robust to simple adversarial attacks than CNNs, but Michels et al. (2019) showed that this is not the case for all attacks.
The cycle of attacks and defenses motivates us to rethink both how we can improve the general robustness of neural networks as well as the high-level motivation for this pursuit. One potential path forward is to detect adversarial inputs instead of attempting to accurately classify them (Schott et al., 2018). Recent work (Jetley et al., 2018; Gilmer et al., 2018b)
argues that adversarial examples can exist within the data distribution, which implies that detecting adversarial examples based on an estimate of the data distribution alone might be insufficient. Instead, in this paper we develop methods for detecting adversarial examples by making use of class-conditional reconstruction networks. These sub-networks, first proposed by Sabour et al. (2017) as part of a Capsule Network (CapsNet), allow a model to produce a reconstruction of its input based on the identity and instantiation parameters of the winning capsule. Interestingly, we find that reconstructing an input from the capsule corresponding to the correct class results in a much lower reconstruction error than reconstructing it from capsules corresponding to incorrect classes, as shown in Figure 1(a). Motivated by this, we propose using the reconstruction sub-network in a CapsNet as an attack-independent detection mechanism. Specifically, we reconstruct a given input from the pose parameters of the winning capsule and then detect adversarial examples by comparing the reconstruction distance distributions for natural and adversarial (or otherwise corrupted) images. Histograms of the distance between the reconstruction and the input for real and adversarial images are shown in Figure 1(b).
We extend this detection mechanism to standard convolutional neural networks (CNNs) and show its effectiveness against black-box and white-box attacks on three image datasets: MNIST, Fashion-MNIST, and SVHN. We show that capsule models achieve the strongest attack detection rates and accuracy on these attacks. We then test our method against a stronger attack, the reconstructive attack, specifically designed to defeat our detection mechanism by generating adversarial examples with a small reconstruction error. With this attack we are able to create adversarial examples that change the model’s prediction and go undetected, but we show that this attack is less successful than a non-reconstructive attack and that CapsNets achieve the highest detection rates on these attacks as well.
We then explore two interesting observations specific to CapsNets. First, we illustrate how the success of the targeted reconstructive attack is highly dependent on the visual similarity between the source image and the target class. Second, we show that many of the resultant attacks resemble members of the target class and so cease to be “adversarial” – i.e., they may also be misclassified by humans. These findings suggest that CapsNets with class conditional reconstructions have the potential to address the real issue with adversarial examples – networks should make predictions based on the same properties of the image as people use rather than using features that can be manipulated by an imperceptible adversarial attack.
Adversarial examples were first introduced by Biggio et al. (2013) and Szegedy et al. (2014), where a given image was modified by following the gradient of a classifier’s output with respect to the image’s pixels. Importantly, only an extremely small (and thus imperceptible) perturbation was required to cause a misclassification. Goodfellow et al. (2015) then developed the more efficient Fast Gradient Sign Method (FGSM), which can change the label of the input image with a similarly imperceptible perturbation constructed by taking a single step in the direction of the sign of the gradient. Later, the Basic Iterative Method (BIM) (Kurakin et al., 2017) and Projected Gradient Descent (PGD) (Madry et al., 2018) generated stronger attacks by improving on FGSM, taking multiple steps in the direction of the gradient and clipping the overall change to the allowed bound after each step. In addition, Carlini and Wagner (2017b) proposed another iterative optimization-based method to construct strong adversarial examples with small perturbations. An early approach to reducing vulnerability to adversarial examples was proposed by Goodfellow et al. (2015), where a network was trained on both clean images and adversarially perturbed ones. Since then, there has been a constant “arms race” between better attacks and better defenses; Kurakin et al. (2018) provide an overview of this field.
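For concreteness, the following sketch shows the single-step FGSM update in PyTorch. It is only an illustration of the attack described above: the `model` interface, the cross-entropy loss, and the [0, 1] pixel range are assumptions rather than details taken from the original papers.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Illustrative single-step FGSM sketch: move each pixel by eps in the
    direction of the sign of the loss gradient, then clamp to an assumed
    [0, 1] pixel range. `model` is an assumed classifier returning logits."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # classification loss on the clean input
    loss.backward()
    x_adv = x + eps * x.grad.sign()       # gradient-sign step of size eps
    return x_adv.clamp(0.0, 1.0).detach()
```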
A recent thread of research focuses on the generation of (and defense against) adversarial examples which are not simply slightly-perturbed versions of clean images. For example, several approaches have been proposed which use generative models to create novel images that appear realistic but result in a misclassification (Samangouei et al., 2018; Ilyas et al., 2017; Meng and Chen, 2017). These adversarial images are not imperceptibly close to some existing image, but nevertheless resemble members of the data distribution to humans and are strongly misclassified by neural networks. Sabour et al. (2016) also consider adversarial examples which are not the result of pixel-space perturbations, instead manipulating the hidden representation of a neural network in order to generate an adversarial example.
Another line of work, surveyed by Carlini and Wagner (2017a), attempts to circumvent adversarial examples by detecting them with a separately-trained classifier (Gong et al., 2017; Grosse et al., 2017; Metzen et al., 2017) or using statistical properties (Hendrycks and Gimpel, 2017; Li and Li, 2017; Feinman et al., 2017; Grosse et al., 2017). However, many of these approaches were subsequently shown to be flawed (Carlini and Wagner, 2017a; Athalye et al., 2018). Most recently, Schott et al. (2018) investigated the effectiveness of a class-conditional generative model as a defense mechanism for MNIST digits. In comparison, our method does not increase the computational overhead of classification and detects adversarial examples by attempting to reconstruct them.
Given a clean test image $x$, its corresponding label $y$, and a classifier $f(\cdot)$ which predicts a class label given an input, we refer to $x' = x + \delta$ as an adversarial example if it is able to fool the classifier into making a wrong prediction $f(x') \neq y$. The small adversarial perturbation $\delta$ (where “small” is measured under some norm) causes the adversarial example $x'$ to appear visually similar to the clean image $x$ but to be classified differently. In the unrestricted case where we only require that $f(x') \neq y$, we refer to $x'$ as an “untargeted adversarial example”. A more powerful attack is to generate a “targeted adversarial example”: instead of simply fooling the classifier into making a wrong prediction, we force the classifier to predict some targeted label $t$, i.e. $f(x') = t$. In this paper, the target label $t$ is selected uniformly at random from the labels other than the ground-truth label $y$. As is standard practice in the literature, in this paper we use the $\ell_\infty$ and $\ell_2$ norms to measure the size of $\delta$, and when generating an adversarial example we use these norms to constrain the constructed perturbation.
We test our detection mechanism on three $\ell_\infty$-norm-based attacks (the fast gradient sign method (FGSM) (Goodfellow et al., 2015), the basic iterative method (BIM) (Kurakin et al., 2017), and projected gradient descent (PGD) (Madry et al., 2018)) and one $\ell_2$-norm-based attack (Carlini-Wagner (CW) (Carlini and Wagner, 2017b)).
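A minimal sketch of the iterative $\ell_\infty$ attacks (BIM/PGD-style) referenced above is given below; the step size `alpha`, iteration count `steps`, and [0, 1] pixel range are illustrative assumptions, not the exact settings used in our experiments (those are listed in Appendix C).

```python
import torch
import torch.nn.functional as F

def pgd(model, x, y, eps, alpha, steps):
    """Illustrative iterative l-inf attack sketch: each step moves along the
    gradient sign and the total perturbation is clipped back into the eps ball.
    `model`, `alpha`, and `steps` are assumed placeholders."""
    delta = torch.zeros_like(x)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
        delta = (x + delta).clamp(0.0, 1.0) - x   # keep the perturbed image in range
    return x + delta
```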
Capsule Networks (CapsNets) are an alternative architecture for neural networks (Sabour et al., 2017; Hinton et al., 2018). In this work we make use of the CapsNet architecture detailed by Sabour et al. (2017). Unlike a standard neural network, which is made up of layers of scalar-valued units, CapsNets are made up of layers of capsules, each of which outputs a vector or matrix. Intuitively, just as one can think of the activation of a unit in a normal neural network as the presence of a feature in the input, the activation of a capsule can be thought of as both the presence of a feature and the pose parameters that represent attributes of that feature. A top-level capsule in a classification network therefore outputs both a classification and pose parameters that represent the instance of that class in the input. This high-level representation allows us to train a reconstruction network.
Prior work considers adversarial examples mainly in two categories: white-box and black-box. In this paper, we test our detection mechanism against both kinds of attack. For white-box attacks, the adversary has full access to the model as well as its parameters; in particular, the adversary is allowed to compute gradients through the model to generate adversarial examples. To perform black-box attacks, the adversary is allowed to know the network architecture but not its parameters. We therefore train a substitute model that has the same architecture as the target model, generate adversarial examples by attacking the substitute model, and then transfer these attacks to the target model. For $\ell_\infty$-based attacks, we always constrain the $\ell_\infty$ norm of the adversarial perturbation to be within a relatively small bound $\epsilon$, which is specific to each dataset.
4 Detecting Adversarial Images by Reconstruction
To detect adversarial images, we make use of the reconstruction network proposed by Sabour et al. (2017), which takes pose parameters as input and outputs the reconstructed image. The reconstruction network is simply a fully connected neural network with two ReLU hidden layers of 512 and 1024 units respectively, followed by a sigmoid output with the same dimensionality as the data. The reconstruction network is trained to minimize the $\ell_2$ distance between the input image and the reconstructed image. This same network architecture is used for all the models and datasets we explore; the only difference is what is given to the reconstruction network as input.
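A minimal PyTorch sketch of this reconstruction network is shown below; `pose_dim` and `image_dim` are placeholder names for the pose-parameter and image dimensionalities, not values taken from our experiments.

```python
import torch.nn as nn

class ReconstructionNet(nn.Module):
    """Illustrative sketch of the decoder described above: two ReLU hidden
    layers (512 and 1024 units) and a sigmoid output with the image's
    dimensionality. `pose_dim` and `image_dim` are assumed hyperparameters."""
    def __init__(self, pose_dim, image_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pose_dim, 512), nn.ReLU(),
            nn.Linear(512, 1024), nn.ReLU(),
            nn.Linear(1024, image_dim), nn.Sigmoid(),
        )

    def forward(self, pose):
        return self.net(pose)
```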
The reconstruction network of the CapsNet is class-conditional: it takes in the pose parameters of all the class capsules and masks all values to 0 except for the pose parameters of the predicted class. We use this reconstruction network to detect adversarial attacks by measuring the Euclidean distance between the input and its class-conditional reconstruction. Specifically, for any given input, the CapsNet outputs a prediction as well as the pose parameters for all classes. The reconstruction network takes in the pose parameters, selects those corresponding to the predicted class, and generates a reconstruction. We then compute the reconstruction distance between the reconstructed image and the input image and compare it with a pre-defined detection threshold (described below in Section 4.2). If the reconstruction distance is higher than the detection threshold, we flag the input as an adversarial example. Figure 1(b) shows example histograms of reconstruction distances for natural images and typical adversarial examples.
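The detection procedure can be summarized in a few lines. The following is an illustrative sketch only: `poses`, `predicted_class`, and `recon_net` are assumed interfaces standing in for the CapsNet's class-capsule outputs and the trained reconstruction network.

```python
import numpy as np

def detect(poses, predicted_class, recon_net, x, threshold):
    """Illustrative sketch: flag an input as adversarial when its
    class-conditional reconstruction is far from the input. `poses` is
    assumed to hold one pose vector per class capsule."""
    masked = np.zeros_like(poses)
    masked[predicted_class] = poses[predicted_class]      # keep only the winning capsule
    reconstruction = recon_net(masked.ravel())            # class-conditional reconstruction
    distance = np.linalg.norm(reconstruction - x.ravel()) # Euclidean reconstruction distance
    return distance > threshold
```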
Although our strategy is inspired by the reconstruction networks used in CapsNets, it can be extended to standard convolutional neural networks (CNNs). We create a similar architecture, a CNN with conditional reconstruction (CNN+CR), by dividing the penultimate hidden layer of a CNN into groups corresponding to each class. The sum of each neuron group serves as the logit for that particular class, and the group itself serves the same purpose as the pose parameters in the CapsNet. We use the same masking mechanism as Sabour et al. (2017) to select the pose parameters corresponding to the predicted label and generate the reconstruction from them. In this way we extend the class-conditional reconstruction network to standard CNNs.
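The following PyTorch sketch illustrates one way such a conditional head could be implemented; the names `n_classes` and `group_dim` are illustrative assumptions rather than the exact architecture used in our experiments (see Appendix A).

```python
import torch
import torch.nn as nn

class ConditionalHead(nn.Module):
    """Illustrative CNN+CR head: the penultimate layer is split into one group
    per class, each group's sum is that class's logit, and only the predicted
    class's group is passed on to the reconstruction network."""
    def __init__(self, in_dim, n_classes, group_dim):
        super().__init__()
        self.n_classes, self.group_dim = n_classes, group_dim
        self.fc = nn.Linear(in_dim, n_classes * group_dim)

    def forward(self, h):
        groups = self.fc(h).view(-1, self.n_classes, self.group_dim)
        logits = groups.sum(dim=-1)                       # one logit per class
        pred = logits.argmax(dim=-1)
        mask = nn.functional.one_hot(pred, self.n_classes).unsqueeze(-1).float()
        masked_pose = (groups * mask).flatten(1)          # zero all but the winning group
        return logits, masked_pose                        # masked_pose feeds the decoder
```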
We can also create a more naïve implementation of our strategy by simply computing the reconstruction from the activations in the entire penultimate layer without any masking mechanism. We call this model the "CNN+R" model. In this way we are able to study the effect of conditioning on the predicted class.
4.2 Detection Threshold
We find the threshold for detecting adversarial inputs by measuring the reconstruction error between clean validation images and their reconstructions. If the distance between the input and the reconstruction is above the chosen threshold, we classify the input as adversarial. Choosing the detection threshold involves a trade-off between false positive and false negative detection rates, and the optimal threshold depends on the probability of the system being attacked; such a trade-off is discussed by Gilmer et al. (2018a). In our experiments we do not tune this parameter and simply set it to the 95th percentile of validation distances, which means our false positive rate on the validation data is 5%.
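Concretely, the threshold can be fit with a one-line rule over clean validation reconstruction distances, as in this illustrative sketch:

```python
import numpy as np

def fit_threshold(recon_distances, percentile=95.0):
    """Illustrative sketch of the rule above: set the detection threshold to
    the 95th percentile of reconstruction distances on clean validation
    images, giving roughly a 5% validation false positive rate."""
    return np.percentile(np.asarray(recon_distances), percentile)
```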
4.3 Evaluation Metrics
We measure the ability of different networks to detect adversarial examples by computing the proportion of adversarial examples that both successfully fool the network and go undetected. For targeted attacks, the success rate is defined as the proportion of inputs which are classified as the target class, $S_t = \Pr[f(x') = t]$, while the success rate for untargeted attacks is defined as the proportion of inputs which are misclassified, $S_u = \Pr[f(x') \neq y]$. The undetected rate for targeted attacks is defined as the proportion of attacks that are both successful and undetected, $R_t = \Pr[f(x') = t \,\wedge\, d(x') \leq \theta]$, where $d(\cdot)$ computes the reconstruction distance of the input and $\theta$ denotes the detection threshold introduced in Section 4.2. Similarly, the undetected rate for untargeted attacks is defined as $R_u = \Pr[f(x') \neq y \,\wedge\, d(x') \leq \theta]$. The smaller the undetected rate $R_t$ or $R_u$, the stronger the model is at detecting adversarial examples.
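These metrics can be computed directly from per-example predictions and reconstruction distances, as in the following illustrative sketch (the array names are placeholders, not part of our released code):

```python
import numpy as np

def attack_metrics(preds, targets, labels, recon_dists, threshold, targeted=True):
    """Illustrative sketch of the success and undetected rates defined above.
    `preds` are model predictions on the adversarial inputs, `targets` the
    attack targets, `labels` the true labels, and `recon_dists` the
    reconstruction distances of the adversarial inputs."""
    preds, targets, labels = map(np.asarray, (preds, targets, labels))
    recon_dists = np.asarray(recon_dists)
    success = (preds == targets) if targeted else (preds != labels)
    undetected = success & (recon_dists <= threshold)
    return success.mean(), undetected.mean()
```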
4.4 Test Models and Datasets
In all experiments, all three models (CapsNet, CNN+R, and CNN+CR) have the same number of parameters and were trained with Adam (Kingma and Ba, 2015)
for the same number of epochs. In general, all models achieved similar test accuracy. We did not do an exhaustive hyperparameter search for these models; instead, we chose hyperparameters that allowed each model to perform roughly equivalently on the test sets. We ran experiments on three datasets: MNIST (LeCun et al., 1998), FashionMNIST (Xiao et al., 2017), and SVHN (Netzer et al., 2011). The test error rate for each model on these three datasets, as well as details of the model architectures, can be found in Section A and Section B of the Appendix.
We first demonstrate how reconstruction networks can detect standard white and black-box attacks in addition to naturally corrupted images. Then, we introduce the “reconstructive attack”, which is specifically designed to circumvent our defense and show that it is a more powerful attack in this setting. Based on this finding, we qualitatively study the kind of misclassifications caused by the reconstructive attack and argue that they suggest that CapsNets learn features that are better aligned with human perception.
5.1 Standard Attacks
We present the success and undetected rates for several targeted and untargeted attacks on MNIST (Table 1), FashionMNIST, and SVHN (Table 4 presented in the Appendix). Our method is able to accurately detect many attacks with very low undetected rates. Capsule models almost always have the lowest undetected rates out of our three models. It is worth noting that this method performs best with the simplest dataset, MNIST, and that the highest undetected rates are found with the Carlini-Wagner attack on the SVHN dataset. This illustrates both the strength of this attack and a shortcoming of our defense, namely that our detection mechanism relies on image distance as a proxy for visual similarity, and in the case of higher dimensional color datasets such as SVHN, this proxy is less meaningful.
[Table 1: Success and undetected rates of each network under targeted and untargeted attacks on MNIST.]
We also tested our detection mechanism on black-box attacks. Given the low undetected rates in the white-box setting, it is not surprising that our detection method is able to detect black-box attacks as well. In fact, on the MNIST dataset the capsule model detects all targeted and untargeted PGD attacks. Both the CNN+R and the CNN+CR models are able to detect the black-box attacks as well, but with a relatively higher success rate. A table of these results can be seen in Table 7 in the Appendix.
5.2 Corruption Attack
Recent work has argued that improving the robustness of neural networks to epsilon-bounded adversarial attacks should not come at the expense of increasing error rates under distributional shifts that do not affect human classification rates and are likely to be encountered in the “real world” (Gilmer et al., 2018a). For example, if an image is corrupted due to adverse weather, lighting, or occlusion, we might hope that our model can continue to provide reliable predictions or detect the distributional shift. We can test our detection method on its ability to detect these distributional shifts by making use of the Corrupted MNIST dataset (Mu and Gilmer, 2019). This dataset contains many visual transformations of MNIST that do not seem to affect human performance, but nevertheless are strongly misclassified by state-of-the-art MNIST models. Our three models can almost always detect these distributional shifts (for all corruptions, CapsNets have either a small undetected rate or an undetected rate of 0). Please refer to Table 5 in the Appendix for detailed test results and to Figure 5 and Figure 6 for visualizations of Corrupted MNIST.
5.3 Reconstructive Attacks
Thus far we have only evaluated previously-defined attacks. To evaluate our defense more rigorously, we introduce an attack specifically designed to take our defense mechanism into account. In order to construct adversarial examples that cannot be detected by the network, we propose a two-stage optimization method we call a “reconstructive attack”. Specifically, in each step, we first attempt to fool the network by following a standard attack which computes the gradient of the cross-entropy loss function with respect to the input. Then, in the second stage, we take the reconstruction error into account by updating the adversarial perturbation based on the reconstruction loss. In this way, we endeavor to construct adversarial examples that fool the network and also have a small reconstruction error. The untargeted and targeted reconstructive attacks are described in more detail below.
Untargeted Reconstructive Attacks
To construct untargeted reconstructive attacks, we first update the perturbation $\delta$ based on the gradient of the cross-entropy loss function following a standard FGSM attack (Goodfellow et al., 2015), that is:

$$\delta \leftarrow \mathrm{clip}_{[-\epsilon,\,\epsilon]}\Big(\delta + \lambda\,\alpha\,\mathrm{sign}\big(\nabla_{\delta}\,\mathcal{L}(f(x+\delta),\,y)\big)\Big) \qquad (1)$$

where $\mathcal{L}$ is the cross-entropy loss function, $\epsilon$ is the $\ell_\infty$ bound for our attacks, $\alpha$ is a hyperparameter controlling the step size in each iteration, and $\lambda$ is a hyperparameter which balances the importance of the cross-entropy loss and the reconstruction loss (explained further below). In the second stage, we focus on constraining the reconstructed image from the newly predicted label to have a small reconstruction distance by updating $\delta$ according to

$$\delta \leftarrow \mathrm{clip}_{[-\epsilon,\,\epsilon]}\Big(\delta - (1-\lambda)\,\alpha\,\mathrm{sign}\big(\nabla_{\delta}\,\|g(x+\delta) - (x+\delta)\|_2^2\big)\Big) \qquad (2)$$
where $g(x+\delta)$ is the class-conditional reconstruction based on the predicted label in a CapsNet or CNN+CR network. The $\delta$ used here is the $\delta$ optimized in the first stage, and $\|g(x+\delta) - (x+\delta)\|_2$ is the reconstruction distance between the reconstructed image and the input image. Since the CNN+R network does not use the class-conditional reconstruction, we simply use the reconstructed image without the masking mechanism. From Eqn 1 and Eqn 2, we can see that $\lambda$ balances the importance between the success rate of attacks and the reconstruction distance. This hyperparameter was tuned for each model and each dataset in order to create the strongest attacks with the worst-case (highest) undetected rate. Plots showing how the success rate and undetected rate change as this parameter is varied between 0 and 1 can be seen in Figure 7 of the Appendix.
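The two-stage update can be sketched as follows. This is an illustration of Eqns 1 and 2 for the untargeted case only; the `recon_fn` callable (returning the class-conditional reconstruction of its input) and the clipping details are assumptions, not our exact attack code.

```python
import torch
import torch.nn.functional as F

def r_pgd_step(model, recon_fn, x, delta, y, eps, alpha, lam):
    """Illustrative sketch of one two-stage reconstructive-attack step.
    `model` and `recon_fn` are assumed differentiable callables."""
    # Stage 1: increase the classification loss (standard gradient-sign direction).
    delta = delta.detach().requires_grad_(True)
    loss_ce = F.cross_entropy(model(x + delta), y)
    grad_ce = torch.autograd.grad(loss_ce, delta)[0]
    delta = (delta + lam * alpha * grad_ce.sign()).clamp(-eps, eps).detach()

    # Stage 2: decrease the reconstruction distance of the perturbed input.
    delta = delta.requires_grad_(True)
    x_adv = x + delta
    loss_rec = ((recon_fn(x_adv) - x_adv) ** 2).sum()
    grad_rec = torch.autograd.grad(loss_rec, delta)[0]
    delta = (delta - (1 - lam) * alpha * grad_rec.sign()).clamp(-eps, eps).detach()
    return delta
```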
Targeted Reconstructive Attacks
We perform a similar two-stage optimization to construct targeted reconstructive attacks by defining a target label, attempting to maximize the classification probability of this label, and minimizing the reconstruction error from the corresponding capsule. Because the target label is given, another way to construct targeted reconstructive attacks is to combine these two stages into one by minimizing a weighted sum of the cross-entropy loss for the target label and the reconstruction loss. We implemented both of these targeted reconstructive attacks and found that the two-stage version is a stronger attack; therefore, all the reconstructive attack experiments in this paper use the two-stage optimization. We build our reconstructive attack on the standard PGD attack, denoted R-PGD, and test the performance of our detection models against it in a white-box setting (white-box reconstructive FGSM and BIM results are reported in Table 6 in the Appendix). Comparing Table 1 and Table 2, we can see that the reconstructive attack is significantly less successful at changing the model’s prediction (lower success rates than the standard attack). However, this attack is more successful at fooling our detection method. For all reconstructive attacks on the three datasets, the capsule model has both the lowest attack success rate and the lowest undetected rate. We report results for black-box R-PGD attacks in Table 7 in the Appendix, which suggest similar conclusions. Note that if our detection method were perfect, the undetected rate would be 0, as is the case for the standard (non-reconstructive) MNIST attacks. This shortcoming warrants further investigation (see below).
5.4 Visual Coherence of the Reconstructive Attack
Thus far we have treated all attacks as equal. However, this is not the case – a key component of an adversarial example is that it is visually similar to the source image and does not resemble the adversarial target class. The adversarial research community uses a small epsilon bound as a mechanism for ensuring that the resulting adversarial attacks are visually unchanged from the source image. In a standard neural network and with standard attacks, this heuristic is sufficient, because taking gradient steps in the image space in order to have a network misclassify an image normally results in something visually similar to the source image. However, this is not the case for adversarial attacks which take the reconstruction error into account: as shown in Figure 2, when we use R-PGD to attack the CapsNet, many of the resulting attacks resemble members of the target class. In this way, they stop being “adversarial”. As such, an attack detection method which does not flag them as adversarial is arguably behaving correctly. This puts the undetected rates presented earlier in a new light and illustrates a difficulty in the evaluation of adversarial attacks and defenses. In other words, the research community has assumed that a small epsilon bound implies visual similarity to the source image, but this is not always a valid assumption.
It is worth noting, however, that not all the adversarial examples created with R-PGD are visually similar to the target class; some of them are typically adversarial. Thus far we have also treated all misclassifications as equal, but this too is a simplification. If our true aim in adversarial robustness research is to create models that make predictions based on reasonable and human-observable features, then we would prefer models that are more likely to misclassify a “shirt” as a “t-shirt” (in the case of FashionMNIST) than to misclassify a “bag” as a “sweater”. For a model to behave ideally, the success of an adversarial perturbation would be related to the visual similarity between the source and the target class. By visualizing a matrix of adversarial success rates between each pair of classes (shown in Figure 3), we can see that for the capsule model there is great variance between source and target class pairs and that the success rate is related to the visual similarity of the classes. This is not the case for either of the other two CNN models (additional confusion matrices are presented in the Appendix as Figure 8).
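The per-class-pair success matrix visualized in Figure 3 can be computed from per-example attack outcomes, as in the following illustrative sketch (input names are placeholders):

```python
import numpy as np

def pairwise_success_matrix(src_labels, tgt_labels, successes, n_classes):
    """Illustrative sketch: targeted-attack success rate for every
    (source class, target class) pair, given per-example true labels,
    attack targets, and boolean success flags."""
    counts = np.zeros((n_classes, n_classes))
    wins = np.zeros((n_classes, n_classes))
    for s, t, ok in zip(src_labels, tgt_labels, successes):
        counts[s, t] += 1
        wins[s, t] += ok
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, wins / counts, np.nan)  # NaN where a pair was never attacked
```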
Our detection mechanism relies on a similarity metric (i.e., a measure of reconstruction error) between the reconstruction and the input. This metric is required both during training, in order to train the reconstruction network, and at test time, in order to flag adversarial examples. In the three datasets we have evaluated, the $\ell_2$ distance between examples roughly correlates with semantic similarity. This is not the case, however, for images in more complex datasets such as CIFAR-10 (Krizhevsky, 2009) or ImageNet (Deng et al., 2009), in which two images may be similar in terms of semantic content but nevertheless have a large $\ell_2$ distance. This issue will need to be resolved for this method to scale up to more complex problems, and it offers a promising avenue for future research. Furthermore, our reconstruction network is trained on a hidden representation of one class but is trained to reconstruct the entire input. In datasets without distractors or backgrounds, this is not a problem. But in the case of ImageNet, in which the object responsible for the classification is not the only object in the image, attempting to reconstruct the entire input from a class encoding seems misguided.
We have presented a simple architectural extension that enables adversarial attack detection. Notably, this method does not rely on a specific predefined adversarial attack. We have shown that by reconstructing the input from the internal class-conditioned representation, our system is able to accurately detect black box and white box FGSM, BIM, PGD, and CW attacks. Of the three models we explored, we showed that the CapsNet performed best at this task, and was able to detect adversarial examples with greater accuracy on all the data sets we explored. We then proposed a new attack to beat our defense - the Reconstructive Attack - in which the adversary optimizes not only the classification loss but also minimizes the reconstruction loss. We showed that this attack was less successful than a standard attack in changing the model’s prediction, though it was able to fool our detection mechanism to some extent. We showed qualitatively that for the CapsNet, the success of this attack was proportional to the visual similarity between the target class and the source class. Finally we showed that many images generated by this attack when generated with the CapsNet are not typically adversarial, i.e. many of the resultant attacks resemble members of the target class even with a small epsilon bound. This is not the case for standard (non-reconstructive) attacks or for the CNN model. This implies that the gradient of the reconstructive attack may be better aligned with the true data manifold, and implies that the capsule model may be relying on human-detectable visually coherent features to make predictions. We believe this is a step towards solving the true problem posed by adversarial examples.
- Athalye et al. [2018] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. International Conference on Machine Learning, 2018.
- Biggio et al. [2013] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2013.
- Carlini and Wagner [2017a] Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In ACM Workshop on Artificial Intelligence and Security, 2017a.
- Carlini and Wagner [2017b] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), 2017b.
- Deng et al. [2009] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, 2009.
- Feinman et al. [2017] Reuben Feinman, Ryan R. Curtin, Saurabh Shintre, and Andrew B. Gardner. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410, 2017.
- Gilmer et al. [2018a] Justin Gilmer, Ryan P. Adams, Ian Goodfellow, David Andersen, and George E. Dahl. Motivating the rules of the game for adversarial example research. arXiv preprint arXiv:1807.06732, 2018a.
- Gilmer et al. [2018b] Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S. Schoenholz, Maithra Raghu, Martin Wattenberg, and Ian Goodfellow. Adversarial spheres. arXiv preprint arXiv:1801.02774, 2018b.
- Gong et al. [2017] Zhitao Gong, Wenlu Wang, and Wei-Shinn Ku. Adversarial and clean data are not twins. arXiv preprint arXiv:1704.04960, 2017.
- Goodfellow et al. [2015] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. International Conference on Learning Representations, 2015.
- Grosse et al. [2017] Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, and Patrick McDaniel. On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280, 2017.
- Hendrycks and Gimpel [2017] Dan Hendrycks and Kevin Gimpel. Early methods for detecting adversarial images. International Conference on Learning Representations, 2017.
- Hinton et al. [2018] Geoffrey E. Hinton, Sara Sabour, and Nicholas Frosst. Matrix capsules with EM routing. International Conference on Learning Representations, 2018.
- Ilyas et al. [2017] Andrew Ilyas, Ajil Jalal, Eirini Asteri, Constantinos Daskalakis, and Alexandros G. Dimakis. The robust manifold defense: Adversarial training using generative models. arXiv preprint arXiv:1712.09196, 2017.
- Jetley et al. [2018] Saumya Jetley, Nicholas Lord, and Philip Torr. With friends like these, who needs adversaries? In Advances in Neural Information Processing Systems, 2018.
- Kingma and Ba [2015] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. International Conference on Learning Representations, 2015.
- Krizhevsky [2009] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
- Kurakin et al. [2017] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. International Conference on Learning Representations, 2017.
- Kurakin et al. [2018] Alexey Kurakin, Ian Goodfellow, Samy Bengio, Yinpeng Dong, Fangzhou Liao, Ming Liang, Tianyu Pang, Jun Zhu, Xiaolin Hu, Cihang Xie, et al. Adversarial attacks and defences competition. arXiv preprint arXiv:1804.00097, 2018.
- LeCun et al. [1998] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
- Li and Li [2017] Xin Li and Fuxin Li. Adversarial examples detection in deep networks with convolutional filter statistics. In Proceedings of the IEEE International Conference on Computer Vision, 2017.
- Madry et al. [2018] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. International Conference on Learning Representations, 2018.
- Meng and Chen [2017] Dongyu Meng and Hao Chen. MagNet: A two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017.
- Metzen et al. [2017] Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. On detecting adversarial perturbations. International Conference on Learning Representations, 2017.
- Michels et al. [2019] Felix Michels, Tobias Uelwer, Eric Upschulte, and Stefan Harmeling. On the vulnerability of capsule networks to adversarial attacks. arXiv preprint arXiv:1906.03612, 2019.
- Mu and Gilmer [2019] Norman Mu and Justin Gilmer. MNIST-C: A robustness benchmark for computer vision. In ICML 2019 Workshop on Uncertainty and Robustness in Deep Learning, 2019.
- Netzer et al. [2011] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
- Sabour et al. [2016] Sara Sabour, Yanshuai Cao, Fartash Faghri, and David J. Fleet. Adversarial manipulation of deep representations. International Conference on Learning Representations, 2016.
- Sabour et al. [2017] Sara Sabour, Nicholas Frosst, and Geoffrey E. Hinton. Dynamic routing between capsules. In Advances in Neural Information Processing Systems, 2017.
- Samangouei et al. [2018] Pouya Samangouei, Maya Kabkab, and Rama Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. International Conference on Learning Representations, 2018.
- Schott et al. [2018] Lukas Schott, Jonas Rauber, Wieland Brendel, and Matthias Bethge. Robust perception through analysis by synthesis. arXiv preprint arXiv:1805.09190, 2018.
- Song et al. [2018] Yang Song, Taesup Kim, Sebastian Nowozin, Stefano Ermon, and Nate Kushman. PixelDefend: Leveraging generative models to understand and defend against adversarial examples. International Conference on Learning Representations, 2018.
- Szegedy et al. [2014] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. International Conference on Learning Representations, 2014.
- Xiao et al. [2017] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
Appendix A Network Architectures
Figure 4 shows the architecture of the capsule network and the CNN-based reconstruction models used for experiments on the MNIST, Fashion MNIST, and SVHN datasets. MNIST and Fashion MNIST use exactly the same architectures, while we use larger models for SVHN. Note that the only difference between the CNN reconstruction (CNN+R) and the CNN conditional reconstruction (CNN+CR) models is the masking procedure applied to the input of the reconstruction network based on the predicted class: in CNN+CR, only the hidden units for the winning category are used to drive the reconstruction network. All three models have the same number of parameters.
Appendix B Test Models
The error rate of each test model used in the paper is presented in Table 3, where the error rate is the fraction of misclassified test images. We ensure that the models have similar performance.
Appendix C Implementation Details
For all the $\ell_\infty$-based adversarial examples, the $\ell_\infty$ norm of the perturbations is bounded by $\epsilon$, which is set to 0.3, 0.1, and 0.1 for the MNIST, Fashion MNIST, and SVHN datasets, respectively, following previous work (Madry et al., 2018; Song et al., 2018). In FGSM-based attacks, the step size is 0.05. In BIM-based (Kurakin et al., 2017) and PGD-based (Madry et al., 2018) attacks, the step size is the same for all the datasets, and the numbers of iterations are 1000, 500, and 200 for the MNIST, Fashion MNIST, and SVHN datasets, respectively. We choose a sufficiently large number of iterations to ensure the attacks have converged.
We use the publicly released code from Carlini and Wagner (2017b) to perform the CW attack on our models. The number of iterations is set to 1000 for all three datasets.
Appendix D White Box Standard Attacks
The results of four white box standard attacks on the three datasets are shown in Table 4.
[Table 4: Success and undetected rates of each network under targeted and untargeted white-box attacks on the MNIST, Fashion MNIST, and SVHN datasets.]
Appendix E Corruption Attack
The error rates and undetected rates of the three test models on the Corrupted MNIST dataset are shown in Table 5, where the error rate is the fraction of misclassified images.
E.1 Visualization of Corrupted MNIST Dataset
Appendix F Reconstructive Attacks
The results of Reconstructive FGSM, BIM and PGD on the three datasets are reported in Table 6.
[Table 6: Success and undetected rates of each network under targeted and untargeted reconstructive FGSM, BIM, and PGD attacks on the MNIST, Fashion MNIST, and SVHN datasets.]
Figure 7 plots the success rate and the undetected rate of the targeted reconstructive PGD attacks on the MNIST dataset as a function of the hyperparameter $\lambda$, which balances the importance between the success rate and the undetected rate.
F.2 Visual Coherence of the Reconstructive Attacks
Appendix G Black Box Attacks