Attribution-driven Causal Analysis for Detection of Adversarial Examples

03/14/2019
by   Susmit Jha, et al.

Attribution methods have been developed to explain the decision of a machine learning model on a given input. We use the Integrated Gradients method to compute attributions and define the causal neighborhood of an input by incrementally masking its high-attribution features. We then study the robustness of machine learning models on benign and adversarial inputs in this neighborhood. Our study indicates that benign inputs are robust to the masking of high-attribution features, whereas adversarial inputs generated by state-of-the-art attack methods such as DeepFool, FGSM, CW, and PGD are not. Further, our study demonstrates that the concentration of high-attribution features responsible for the incorrect decision is more pronounced in physically realizable adversarial examples. This difference in the attributions of benign and adversarial inputs can be used to detect adversarial examples. Such a defense is independent of the training data and the attack method, and we demonstrate its effectiveness on digital and physically realizable perturbations.
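The masking procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy linear softmax classifier (so the input gradient has a closed form), an all-zero baseline, and masking by overwriting a feature with a fixed value. The function names `integrated_gradients` and `label_flips_under_masking`, and all parameters, are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def class_prob_grad(W, b, x, c):
    """Gradient of the class-c softmax probability w.r.t. x for logits W @ x + b."""
    p = softmax(W @ x + b)
    # d p_c / d z_j = p_c * (1[j == c] - p_j); chain rule through z = W x + b.
    dz = p[c] * (np.eye(len(p))[c] - p)
    return W.T @ dz


def integrated_gradients(W, b, x, c, baseline=None, steps=50):
    """Approximate Integrated Gradients along the straight path baseline -> x.

    Assumption: an all-zero baseline unless one is supplied.
    """
    if baseline is None:
        baseline = np.zeros_like(x)
    # Midpoint Riemann sum over the interpolation coefficient alpha in (0, 1).
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack(
        [class_prob_grad(W, b, baseline + a * (x - baseline), c) for a in alphas]
    )
    return (x - baseline) * grads.mean(axis=0)


def label_flips_under_masking(W, b, x, max_masked, mask_value=0.0):
    """Mask features in decreasing order of attribution, one at a time.

    Returns the number of masked features at which the predicted label first
    changes, or None if it never changes within max_masked features.
    """
    c = int(np.argmax(W @ x + b))
    attr = integrated_gradients(W, b, x, c)
    order = np.argsort(-attr)  # highest attribution first
    masked = x.copy()
    for k in range(1, max_masked + 1):
        masked[order[k - 1]] = mask_value
        if int(np.argmax(W @ masked + b)) != c:
            return k
    return None
```

Under these assumptions, a detector in the spirit of the abstract would flag an input whose label flips after only a few high-attribution features are masked. The cut-off for "a few" is a tunable assumption of this sketch and is not specified in the text above.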


