Divide-and-Conquer Adversarial Detection

05/27/2019
by Xuwang Yin, et al.

The vulnerability of deep neural networks to adversarial examples has become a major concern for deploying these models in sensitive domains. Devising a definitive defense against such attacks has proven challenging, and methods that rely on detecting adversarial samples have been shown to be effective only when the attacker is oblivious to the detection mechanism, i.e., in non-adaptive attacks. In this paper, we propose an effective and practical method for detecting adaptive/dynamic adversaries. In short, we train adversary-robust auxiliary detectors to discriminate in-class natural examples from adversarially crafted out-of-class examples. To identify a potential adversary, we first obtain the estimated class of the input using the classification system, and then use the corresponding detector to verify whether the input is a natural example of that class or an adversarially manipulated one. Experimental results on the MNIST and CIFAR10 datasets show that our method can withstand adaptive PGD attacks. Furthermore, we demonstrate that with our novel training scheme our models learn significantly more robust representations than with ordinary adversarial training.
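To make the two-stage pipeline concrete, below is a minimal sketch in PyTorch of the classify-then-verify routing described above. It assumes a K-way classifier, one binary detector per class, and per-class acceptance thresholds; the names (`DetectionPipeline`, `thresholds`, the toy models) are hypothetical, and the abstract does not specify the actual detector architectures, their adversarial training procedure, or how thresholds are calibrated.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 10  # e.g., MNIST or CIFAR10


class DetectionPipeline(nn.Module):
    """Divide-and-conquer detection: one auxiliary detector per class.

    classifier : any K-way classifier returning logits.
    detectors  : K binary detectors; detector k scores how likely the
                 input is a natural example of class k, as opposed to
                 an adversarially crafted out-of-class example.
    thresholds : per-class acceptance thresholds (shape: [K]).
    """

    def __init__(self, classifier, detectors, thresholds):
        super().__init__()
        self.classifier = classifier
        self.detectors = nn.ModuleList(detectors)
        self.thresholds = thresholds

    def forward(self, x):
        # Step 1: obtain the estimated class from the classification system.
        pred = self.classifier(x).argmax(dim=1)
        # Step 2: route each input to the detector of its predicted class;
        # a low in-class score flags the input as adversarial.
        scores = torch.stack(
            [self.detectors[int(c)](xi.unsqueeze(0)).squeeze()
             for xi, c in zip(x, pred)]
        )
        is_adversarial = scores < self.thresholds[pred]
        return pred, is_adversarial


# Example wiring with toy linear models (purely illustrative):
classifier = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, NUM_CLASSES))
detectors = [nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 1))
             for _ in range(NUM_CLASSES)]
pipeline = DetectionPipeline(classifier, detectors,
                             thresholds=torch.zeros(NUM_CLASSES))
preds, flagged = pipeline(torch.randn(4, 1, 28, 28))
```

The divide-and-conquer aspect is that each detector only has to separate natural examples of its own class from adversarial impostors, rather than a single model handling all classes jointly; in the paper the detectors are themselves trained to be adversary-robust, which this toy wiring does not reproduce.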

