What You See is Not What the Network Infers: Detecting Adversarial Examples Based on Semantic Contradiction

01/24/2022
by   Yijun Yang, et al.

Adversarial examples (AEs) pose severe threats to the application of deep neural networks (DNNs) in safety-critical domains, e.g., autonomous driving. While there is a vast body of AE defense solutions, to the best of our knowledge, they all suffer from some weaknesses, e.g., defending against only a subset of AEs or causing a relatively high accuracy loss on legitimate inputs. Moreover, most existing solutions cannot defend against adaptive attacks, wherein attackers are knowledgeable about the defense mechanism and craft AEs accordingly. In this paper, we propose a novel AE detection framework based on the very nature of AEs, i.e., their semantic information is inconsistent with the discriminative features extracted by the target DNN model. Specifically, the proposed solution, namely ContraNet, models this contradiction by first feeding both the input and the inference result into a generator to obtain a synthetic output, and then comparing the synthetic output against the original input. For legitimate inputs that are correctly inferred, the synthetic output tries to reconstruct the input. For AEs, by contrast, instead of reconstructing the input, the synthetic output is created to conform to the wrong label whenever possible. Consequently, by measuring the distance between the input and the synthetic output with metric learning, we can differentiate AEs from legitimate inputs. We perform comprehensive evaluations under various AE attack scenarios, and experimental results show that ContraNet outperforms existing solutions by a large margin, especially under adaptive attacks. Moreover, our analysis shows that successful AEs that bypass ContraNet tend to have much-weakened adversarial semantics. We also show that ContraNet can be easily combined with adversarial training techniques to achieve further improved AE defense capabilities.
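As a rough illustration of the detection logic described in the abstract, the sketch below shows how an input could be checked against a synthesis conditioned on the model's own prediction. The names `classifier`, `generator`, `embed`, and the threshold `tau` are hypothetical placeholders, not the paper's actual API; this is a minimal sketch of the idea, assuming a conditional generator and a learned embedding space for the distance metric.

```python
import torch
import torch.nn.functional as F

def contranet_detect(x, classifier, generator, embed, tau):
    """Sketch of semantic-contradiction detection (hypothetical components).

    x          : batch of inputs, shape (N, C, H, W)
    classifier : the target DNN under protection
    generator  : conditional generator G(x, y) -> synthetic image
    embed      : embedding network defining the learned distance metric
    tau        : rejection threshold chosen on legitimate validation data
    """
    with torch.no_grad():
        # Label inferred by the target DNN (possibly wrong for an AE).
        y_pred = classifier(x).argmax(dim=1)
        # Synthesis conditioned on the inferred label.
        x_syn = generator(x, y_pred)
        # Distance in the learned metric space: small when the synthesis
        # reconstructs the input (legitimate, correctly inferred), large when
        # the synthesis follows the wrong label instead of the input's semantics.
        dist = F.pairwise_distance(embed(x).flatten(1), embed(x_syn).flatten(1))
    is_adversarial = dist > tau  # reject inputs whose synthesis contradicts them
    return y_pred, is_adversarial
```

In this reading, the defense never needs to know how the AE was crafted; it only checks whether the input's appearance and the model's inference tell the same semantic story.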


