When Explainability Meets Adversarial Learning: Detecting Adversarial Examples using SHAP Signatures

09/08/2019
by Gil Fidel et al.

State-of-the-art deep neural networks (DNNs) are highly effective at solving many complex real-world problems. However, these models are vulnerable to adversarial perturbation attacks, and despite the plethora of research in this domain, adversaries still hold the upper hand in the cat-and-mouse game between adversarial example generation methods and detection and prevention methods. In this research, we present a novel detection method that uses Shapley Additive Explanations (SHAP) values computed for the internal layers of a DNN classifier to discriminate between normal and adversarial inputs. We evaluate our method by building an extensive dataset of adversarial examples over the popular CIFAR-10 and MNIST datasets and training a neural network-based detector to distinguish between normal and adversarial inputs. We evaluate our detector against adversarial examples generated by diverse state-of-the-art attacks and demonstrate its high detection accuracy and strong generalization to adversarial inputs generated with different attack methods.
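The detection pipeline described in the abstract can be sketched end-to-end. In the paper, per-input signatures come from SHAP values of the classifier's internal layers (e.g. via a deep-learning SHAP explainer), and the detector is a neural network; the sketch below substitutes synthetic signature vectors and a minimal logistic-regression detector, both assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for SHAP signatures of an internal DNN layer.
# In the paper's pipeline these would be attribution vectors computed
# for each input; here we draw synthetic vectors whose distributions
# differ between normal and adversarial inputs.
n, d = 500, 64
normal_sigs = rng.normal(0.0, 1.0, size=(n, d))
adv_sigs = rng.normal(0.5, 1.0, size=(n, d))

X = np.vstack([normal_sigs, adv_sigs])
y = np.concatenate([np.zeros(n), np.ones(n)])  # label 1 = adversarial

# Minimal logistic-regression detector trained by gradient descent,
# standing in for the paper's neural network-based detector.
w, b = np.zeros(d), 0.0
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(adversarial)
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

preds = (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5
accuracy = float(np.mean(preds == y))
```

Because the synthetic signature distributions are well separated, even this linear detector separates the two classes almost perfectly; the interesting question the paper addresses is whether real SHAP signatures exhibit such separability across attack methods.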


Related research

07/16/2019 · Latent Adversarial Defence with Boundary-guided Generation
Deep Neural Networks (DNNs) have recently achieved great success in many...

05/03/2023 · New Adversarial Image Detection Based on Sentiment Analysis
Deep Neural Networks (DNNs) are vulnerable to adversarial examples, whil...

11/22/2018 · Detecting Adversarial Perturbations Through Spatial Behavior in Activation Spaces
Neural network based classifiers are still prone to manipulation through...

09/15/2019 · Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors
Deep neural networks (DNNs) are notorious for their vulnerability to adv...

01/05/2021 · Noise Sensitivity-Based Energy Efficient and Robust Adversary Detection in Neural Networks
Neural networks have achieved remarkable performance in computer vision,...

01/08/2018 · Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality
Deep Neural Networks (DNNs) have recently been shown to be vulnerable ag...

03/11/2021 · DAFAR: Defending against Adversaries by Feedback-Autoencoder Reconstruction
Deep learning has shown impressive performance on challenging perceptual...
