Explaining Away Attacks Against Neural Networks

03/06/2020
by Sean Saito, et al.

We investigate the problem of identifying adversarial attacks on image-based neural networks. We present experimental results showing significant discrepancies between the explanations generated for a model's predictions on clean data and those generated on adversarial data. Building on this observation, we propose a framework that identifies whether a given input is adversarial based on the explanations produced by the model. Code for our experiments can be found here: https://github.com/seansaito/Explaining-Away-Attacks-Against-Neural-Networks.
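
The detection idea described above lends itself to a short illustration. The sketch below is not the authors' implementation; it simply computes a gradient-based saliency explanation for an input and passes it to a small binary detector network assumed to have been trained to separate explanations of clean inputs from explanations of adversarial ones. The saliency_map helper, the ExplanationDetector architecture, and the is_adversarial wrapper are all illustrative assumptions.

```python
# Minimal sketch of explanation-based adversarial detection (assumed design,
# not the paper's exact method): explain an input with vanilla gradients,
# then classify the explanation as "clean" or "adversarial".
import torch
import torch.nn as nn


def saliency_map(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Vanilla-gradient explanation: |d(top logit)/d(input)|, reduced over channels."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    top_class = logits.argmax(dim=1)
    score = logits.gather(1, top_class.unsqueeze(1)).sum()
    grad, = torch.autograd.grad(score, x)
    return grad.abs().amax(dim=1, keepdim=True)  # (N, 1, H, W)


class ExplanationDetector(nn.Module):
    """Small CNN that labels a saliency map as clean (0) or adversarial (1)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2),
        )

    def forward(self, saliency: torch.Tensor) -> torch.Tensor:
        return self.net(saliency)


def is_adversarial(model: nn.Module, detector: ExplanationDetector,
                   x: torch.Tensor) -> torch.Tensor:
    """Flag inputs whose explanations the detector classifies as adversarial."""
    with torch.enable_grad():  # gradients are needed even inside no_grad contexts
        expl = saliency_map(model, x)
    return detector(expl).argmax(dim=1).bool()
```

In practice the detector would be trained on explanations of known clean samples and of samples perturbed by standard attacks (e.g., FGSM or PGD), exploiting the discrepancy between the two kinds of explanations reported in the abstract.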

Related research

Resilience of Bayesian Layer-Wise Explanations under Adversarial Attacks (02/22/2021)
We consider the problem of the stability of saliency-based explanations ...

Adversarial Framing for Image and Video Classification (12/11/2018)
Neural networks are prone to adversarial attacks. In general, such attac...

DVS-Attacks: Adversarial Attacks on Dynamic Vision Sensors for Spiking Neural Networks (07/01/2021)
Spiking Neural Networks (SNNs), despite being energy-efficient when impl...

Denoising Diffusion Probabilistic Models as a Defense against Adversarial Attacks (01/17/2023)
Neural Networks are infamously sensitive to small perturbations in their...

SAGA: Spectral Adversarial Geometric Attack on 3D Meshes (11/24/2022)
A triangular mesh is one of the most popular 3D data representations. As...

Minimal Explanations for Neural Network Predictions (05/19/2022)
Explaining neural network predictions is known to be a challenging probl...

Recovering Localized Adversarial Attacks (10/21/2019)
Deep convolutional neural networks have achieved great successes over re...
