Gradient Similarity: An Explainable Approach to Detect Adversarial Attacks against Deep Learning

06/27/2018
by Jasjeet Dhaliwal, et al.

Deep neural networks are susceptible to small-but-specific adversarial perturbations capable of deceiving the network. This vulnerability can lead to potentially harmful consequences in security-critical applications. To address this vulnerability, we propose a novel metric called Gradient Similarity that captures the influence of training data on test inputs. We show that Gradient Similarity behaves differently for normal and adversarial inputs, enabling us to detect a variety of adversarial attacks with a near-perfect ROC-AUC of 95-100%. Even white-box adversaries equipped with perfect knowledge of the system cannot easily bypass our detector. On the MNIST dataset, white-box attacks are either detected with a high ROC-AUC of 87-96%, or require very high distortion to bypass our detector.
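The abstract describes Gradient Similarity as a measure of how strongly the training data influences a given test input. The sketch below illustrates one plausible reading of that idea in PyTorch: comparing the parameter-space loss gradient at a test input against per-example training gradients via cosine similarity. The helper names, the use of cosine similarity, and the choice to label the test input with the model's own prediction are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a gradient-similarity score (assumed formulation:
# mean cosine similarity between the parameter-space loss gradient at a
# test input and per-example training gradients; the paper's exact
# definition may differ).
import torch
import torch.nn.functional as F

def loss_gradient(model, x, y):
    """Flattened gradient of the cross-entropy loss w.r.t. model parameters."""
    loss = F.cross_entropy(model(x), y)
    grads = torch.autograd.grad(
        loss, [p for p in model.parameters() if p.requires_grad]
    )
    return torch.cat([g.reshape(-1) for g in grads])

def gradient_similarity(model, x_test, train_pairs):
    """Mean cosine similarity between the test gradient and training gradients.

    x_test: a single input with batch dimension, shape (1, ...).
    train_pairs: iterable of (x_i, y_i) training examples (unbatched).
    """
    # The true label is unknown at test time, so use the model's prediction.
    with torch.no_grad():
        y_pred = model(x_test).argmax(dim=1)
    g_test = loss_gradient(model, x_test, y_pred)
    sims = [
        F.cosine_similarity(
            g_test,
            loss_gradient(model, x_i.unsqueeze(0), y_i.unsqueeze(0)),
            dim=0,
        )
        for x_i, y_i in train_pairs
    ]
    return torch.stack(sims).mean()
```

A detector could then threshold this score on held-out data: the abstract reports that the statistic behaves differently for normal and adversarial inputs, which is what yields the ROC-AUC figures quoted above.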


Related Research

02/08/2017 · Adversarial Attacks on Neural Network Policies
Machine learning classifiers are known to be vulnerable to inputs malici...

02/02/2023 · TransFool: An Adversarial Attack against Neural Machine Translation Models
Deep neural networks have been shown to be vulnerable to small perturbat...

03/27/2023 · EMShepherd: Detecting Adversarial Samples via Side-channel Leakage
Deep Neural Networks (DNN) are vulnerable to adversarial perturbations-s...

02/11/2020 · Robustness of Bayesian Neural Networks to Gradient-Based Attacks
Vulnerability to adversarial attacks is one of the principal hurdles to ...

03/09/2023 · NoiseCAM: Explainable AI for the Boundary Between Noise and Adversarial Attacks
Deep Learning (DL) and Deep Neural Networks (DNNs) are widely used in va...

10/16/2019 · A New Defense Against Adversarial Images: Turning a Weakness into a Strength
Natural images are virtually surrounded by low-density misclassified reg...

12/07/2018 · Deep-RBF Networks Revisited: Robust Classification with Rejection
One of the main drawbacks of deep neural networks, like many other class...
