Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors

09/15/2019
by   Gilad Cohen, et al.
Deep neural networks (DNNs) are notoriously vulnerable to adversarial attacks: small perturbations added to input images to mislead the network's prediction. Detecting adversarial examples is therefore a fundamental requirement for robust classification frameworks. In this work, we present a method for detecting such attacks that is suitable for any pre-trained neural network classifier. We use influence functions to measure the impact of every training sample on the validation set data. From the influence scores, we find the most supportive training samples for any given validation example. A k-nearest neighbor (k-NN) model fitted on the DNN's activation layers is then used to find the ranks of these supporting training samples. We observe that these samples are highly correlated with the nearest neighbors of normal inputs, while this correlation is much weaker for adversarial inputs. We train an adversarial detector using the k-NN ranks and distances and show that it successfully distinguishes adversarial examples, achieving state-of-the-art results on four attack methods with three datasets.
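
The k-NN rank and distance features described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the indices of the most supportive training samples (`helpful_idx`) have already been obtained from influence scores, it uses plain L2 distance on a single layer's activations, and the function name `knn_rank_features` and the toy data are hypothetical.

```python
import numpy as np

def knn_rank_features(train_acts, query_act, helpful_idx):
    """Rank and L2 distance of selected training samples relative to a query.

    train_acts:  (n_train, d) activations of the training set at one layer
    query_act:   (d,) activation of the input being tested
    helpful_idx: indices of the most supportive training samples
                 (assumed precomputed, e.g. from influence scores)
    """
    # Distance from the query to every training activation.
    dists = np.linalg.norm(train_acts - query_act, axis=1)
    # order[r] is the index of the r-th nearest training sample.
    order = np.argsort(dists, kind="stable")
    # Invert the permutation: rank_of[i] is the neighbor rank of sample i.
    rank_of = np.empty_like(order)
    rank_of[order] = np.arange(len(order))
    return rank_of[helpful_idx], dists[helpful_idx]

# Toy data standing in for real DNN activations (an assumption for illustration).
rng = np.random.default_rng(0)
train_acts = rng.normal(size=(200, 16))
helpful_idx = np.array([3, 17, 42])

# A query close to its supportive samples, mimicking a normal input.
near_query = train_acts[helpful_idx].mean(axis=0)
# A query shifted far from them, mimicking a weakly correlated input.
far_query = near_query + 10.0

near_ranks, near_dists = knn_rank_features(train_acts, near_query, helpful_idx)
far_ranks, far_dists = knn_rank_features(train_acts, far_query, helpful_idx)
print("near:", near_ranks, near_dists)
print("far: ", far_ranks, far_dists)
```

In a detector along these lines, the ranks and distances collected over several layers would be concatenated into a feature vector and passed to a binary classifier; the choice of classifier is left open here.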

research
07/09/2021

Learning to Detect Adversarial Examples Based on Class Scores

Given the increasing threat of adversarial attacks on deep neural networ...
research
09/08/2019

When Explainability Meets Adversarial Learning: Detecting Adversarial Examples using SHAP Signatures

State-of-the-art deep neural networks (DNNs) are highly effective in sol...
research
09/19/2023

What Learned Representations and Influence Functions Can Tell Us About Adversarial Examples

Adversarial examples, deliberately crafted using small perturbations to ...
research
11/22/2018

Detecting Adversarial Perturbations Through Spatial Behavior in Activation Spaces

Neural network based classifiers are still prone to manipulation through...
research
12/09/2020

KNN Classification with One-step Computation

KNN classification is a query triggered yet improvisational learning mod...
research
03/13/2018

Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning

Deep neural networks (DNNs) enable innovative applications of machine le...
research
10/04/2019

Requirements for Developing Robust Neural Networks

Validation accuracy is a necessary, but not sufficient, measure of a neu...
