DAFAR: Defending against Adversaries by Feedback-Autoencoder Reconstruction

03/11/2021
by Haowen Liu, et al.

Deep learning has shown impressive performance on challenging perceptual tasks and is widely used in software to provide intelligent services. However, researchers have found deep neural networks vulnerable to adversarial examples. Since then, many methods have been proposed to defend against adversarial inputs, but they are either attack-dependent or have been shown ineffective against new attacks. Moreover, most existing techniques have complicated structures or mechanisms that incur prohibitively high overhead or latency, making them impractical to deploy in real software. We propose DAFAR, a feedback framework that allows deep learning models to detect and purify adversarial examples with high effectiveness and universality, at low area and time overhead. DAFAR has a simple structure, containing a victim model, a plug-in feedback network, and a detector. The key idea is to feed the high-level features from the victim model's feature-extraction layers into the feedback network to reconstruct the input; this data stream forms a feedback autoencoder. For strong attacks, it transforms an imperceptible attack on the victim model into an obvious reconstruction-error attack on the feedback autoencoder, which is much easier to detect; for weak attacks, the reformation process destroys the structure of adversarial examples. Experiments on the MNIST and CIFAR-10 datasets show that DAFAR is effective against popular and arguably the most advanced attacks without losing performance on legitimate samples, and that its effectiveness and universality hold across attack methods and parameters.
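As a minimal sketch of the data flow the abstract describes, the PyTorch-style code below wires a victim classifier's feature-extraction layers into a plug-in decoder, forming the feedback autoencoder, and thresholds the per-sample reconstruction error as the detector. All names (`VictimNet`, `FeedbackDecoder`, `tau`) and the layer sizes are illustrative assumptions for an MNIST-shaped input, not the paper's actual implementation.

```python
# Illustrative sketch of a feedback-autoencoder defense (PyTorch).
# Names, layer sizes, and the threshold tau are assumptions, not the
# paper's published architecture.
import torch
import torch.nn as nn

class VictimNet(nn.Module):
    """Victim classifier whose feature-extraction layers feed the decoder."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(            # feature-extraction layers
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 28x28 -> 14x14
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(64 * 7 * 7, num_classes)

    def forward(self, x):
        h = self.features(x)                      # high-level features
        logits = self.classifier(h.flatten(1))
        return logits, h

class FeedbackDecoder(nn.Module):
    """Plug-in feedback network: reconstructs the input from the features."""
    def __init__(self):
        super().__init__()
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),  # 7x7 -> 14x14
            nn.ConvTranspose2d(32, 1, 2, stride=2), nn.Sigmoid(),# 14x14 -> 28x28
        )

    def forward(self, h):
        return self.decode(h)

def detect_and_purify(x, victim, decoder, tau=0.02):
    """Flag inputs whose reconstruction error exceeds tau (an assumed
    threshold, tuned on legitimate data); the reconstruction itself can
    serve as the purified input."""
    logits, h = victim(x)
    x_rec = decoder(h)                            # feedback reconstruction
    err = ((x - x_rec) ** 2).flatten(1).mean(1)   # per-sample MSE
    is_adversarial = err > tau                    # detector decision
    return is_adversarial, x_rec, logits
```

Under this assumed setup, a strong attack that flips the classifier's decision must distort the high-level features `h`, so the reconstruction drifts visibly from the input and the error crosses the threshold; a weak perturbation is largely washed out by the reconstruction, which is why the same path can also act as a purifier.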

