ExAD: An Ensemble Approach for Explanation-based Adversarial Detection

03/22/2021
by Raj Vardhan, et al.

Recent research has shown that Deep Neural Networks (DNNs) are vulnerable to adversarial examples that induce attacker-chosen misclassifications. Such risks impede the application of machine learning in security-sensitive domains. Several defense methods have been proposed to detect adversarial examples at test time or to make models more robust. However, while existing methods are quite effective under a black-box threat model, where the attacker is unaware of the defense, they are relatively ineffective under a white-box threat model, where the attacker has full knowledge of the defense. In this paper, we propose ExAD, a framework that detects adversarial examples using an ensemble of explanation techniques. Each explanation technique in ExAD produces an explanation map identifying how relevant each input variable is to the model's classification. For every class in a dataset, the system includes one detector network per explanation technique, trained to distinguish between normal and abnormal explanation maps for that class. At test time, if any detector for the predicted class flags an input's explanation map as abnormal, we consider the input an adversarial example. We evaluate our approach using six state-of-the-art adversarial attacks on three image datasets. Our extensive evaluation shows that the mechanism effectively detects these attacks under the black-box threat model with few false positives. Furthermore, our approach achieves promising results in limiting the success rate of white-box attacks.
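To make the test-time decision rule concrete, below is a minimal Python sketch of the check described in the abstract. The names `model`, `explainers`, and `detectors`, the `predict` interface, and the 0.5 threshold are all illustrative assumptions, not the authors' implementation.

import numpy as np

def is_adversarial(model, x, explainers, detectors, threshold=0.5):
    """Flag x as adversarial if ANY detector for the predicted class
    considers its explanation map abnormal (hypothetical sketch).

    Assumptions:
      - explainers: list of functions mapping (model, x) -> explanation map
      - detectors[c][i]: detector network for class c and explainer i,
        whose predict() returns the probability that a map is abnormal
    """
    # Predicted class of the (possibly adversarial) input
    c = int(np.argmax(model.predict(x[None, ...])[0]))
    for i, explain in enumerate(explainers):
        emap = explain(model, x)  # explanation map w.r.t. predicted class
        p_abnormal = float(detectors[c][i].predict(emap[None, ...])[0])
        if p_abnormal > threshold:
            return True  # a single abnormal map suffices to reject
    return False

The "any detector fires" rule reflects the ensemble design: a white-box attacker must simultaneously produce normal-looking explanation maps under every explanation technique, which is what the paper argues limits white-box attack success.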


Related research

- 07/09/2021: Learning to Detect Adversarial Examples Based on Class Scores
  Given the increasing threat of adversarial attacks on deep neural networ...
- 07/13/2020: A simple defense against adversarial attacks on heatmap explanations
  With machine learning models being used for more sensitive applications,...
- 05/27/2019: Divide-and-Conquer Adversarial Detection
  The vulnerabilities of deep neural networks against adversarial examples...
- 01/06/2018: Adversarial Perturbation Intensity Achieving Chosen Intra-Technique Transferability Level for Logistic Regression
  Machine Learning models have been shown to be vulnerable to adversarial ...
- 11/18/2018: Regularized adversarial examples for model interpretability
  As machine learning algorithms continue to improve, there is an increasi...
- 09/19/2023: Adversarial Attacks Against Uncertainty Quantification
  Machine-learning models can be fooled by adversarial examples, i.e., car...
- 04/18/2022: Sardino: Ultra-Fast Dynamic Ensemble for Secure Visual Sensing at Mobile Edge
  Adversarial example attack endangers the mobile edge systems such as veh...
