Unsupervised Detection of Adversarial Examples with Model Explanations

07/22/2021
by Gihyuk Ko, et al.

Deep Neural Networks (DNNs) have shown remarkable performance across a diverse range of machine learning applications. However, DNNs are widely known to be vulnerable to simple adversarial perturbations, which cause the model to misclassify inputs. In this paper, we propose a simple yet effective method for detecting adversarial examples using techniques developed to explain the model's behavior. Our key observation is that adding small, humanly imperceptible perturbations can lead to drastic changes in the model's explanations, resulting in unusual or irregular explanation patterns. Building on this insight, we propose unsupervised detection of adversarial examples using reconstructor networks trained only on model explanations of benign examples. Our evaluation on the MNIST handwritten digits dataset shows that our method can detect adversarial examples generated by state-of-the-art algorithms with high confidence. To the best of our knowledge, this work is the first to suggest an unsupervised defense method based on model explanations.
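The core idea in the abstract can be sketched in a few lines: train a reconstructor only on explanations of benign inputs, then flag any input whose explanation reconstructs poorly. The following is a minimal, hypothetical sketch, not the paper's implementation: synthetic vectors stand in for real model explanations (e.g. saliency maps), and a closed-form linear autoencoder (PCA via SVD) stands in for the paper's reconstructor networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for explanations (hypothetical, for illustration only):
# benign explanations lie on a low-dimensional structured subspace, while
# adversarial ones are irregular and do not share that structure.
basis = rng.normal(size=(4, 64))
benign_expl = rng.normal(size=(200, 4)) @ basis      # "regular" explanations
adv_expl = rng.normal(size=(20, 64)) * 2.0           # "irregular" explanations

# Unsupervised step: fit a linear autoencoder on BENIGN explanations only.
# Here this is PCA in closed form; the paper uses trained reconstructor nets.
mean = benign_expl.mean(axis=0)
_, _, Vt = np.linalg.svd(benign_expl - mean, full_matrices=False)
W = Vt[:4]                                           # top-4 principal directions

def recon_error(x):
    """Reconstruction error of an explanation under the benign-only model."""
    z = (x - mean) @ W.T                             # encode
    return np.linalg.norm((z @ W + mean) - x, axis=-1)  # decode + residual

# Detection threshold chosen from benign reconstruction errors alone,
# e.g. the 99th percentile (an assumed choice, not from the paper).
tau = np.percentile(recon_error(benign_expl), 99)

def is_adversarial(x):
    """Flag explanations that the benign reconstructor cannot reproduce."""
    return recon_error(x) > tau

print("benign flagged:", is_adversarial(benign_expl).mean())
print("adversarial flagged:", is_adversarial(adv_expl).mean())
```

On this toy data, benign explanations reconstruct almost exactly while the irregular ones incur large residuals, so thresholding on reconstruction error separates the two groups; the same decision rule applies when real explanation maps replace the synthetic vectors.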

