Foiling Explanations in Deep Neural Networks

11/27/2022
by Snir Vitrack Tamam, et al.

Deep neural networks (DNNs) have greatly impacted numerous fields over the past decade. Yet despite exhibiting superb performance on many problems, their black-box nature still poses a significant challenge with respect to explainability. Indeed, explainable artificial intelligence (XAI) is crucial in several fields, wherein the answer alone – sans a reasoning of how said answer was derived – is of little value. This paper uncovers a troubling property of explanation methods for image-based DNNs: by making small visual changes to the input image – hardly influencing the network's output – we demonstrate how explanations may be arbitrarily manipulated through the use of evolution strategies. Our novel algorithm, AttaXAI, a model-agnostic, adversarial attack on XAI algorithms, only requires access to the output logits of a classifier and to the explanation map; these weak assumptions render our approach highly useful where real-world models and data are concerned. We compare our method's performance on two benchmark datasets – CIFAR100 and ImageNet – using four different pretrained deep-learning models: VGG16-CIFAR100, VGG16-ImageNet, MobileNet-CIFAR100, and Inception-v3-ImageNet. We find that XAI methods can be manipulated without the use of gradients or other model internals. Our novel algorithm successfully manipulates an image in a manner imperceptible to the human eye, such that the XAI method outputs a specific explanation map. To our knowledge, this is the first such method in a black-box setting, and we believe it has significant value where explainability is desired, required, or legally mandatory.
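To make the attack setting concrete, the following is a minimal, hypothetical sketch (not the authors' AttaXAI implementation) of a black-box, evolution-strategy loop of the kind the abstract describes: it queries only a classifier's logits and its explanation map, and searches for a small perturbation whose explanation approaches an arbitrary target map while the prediction barely moves. The toy model, the query_logits/query_explanation helpers, the simple (1, lambda) ES, and all hyperparameters are illustrative assumptions.

```python
# Sketch of a gradient-free, query-only attack on an explanation map.
# Everything here (stub model, helper names, hyperparameters) is assumed
# for illustration; the real AttaXAI algorithm differs in its details.
import numpy as np

rng = np.random.default_rng(0)

# --- Toy black-box stubs standing in for a real classifier + XAI method ------
W = rng.normal(size=(10, 32 * 32 * 3))  # toy "classifier" weights

def query_logits(x):
    """Black-box query: image in [0,1]^(32x32x3) -> class logits."""
    return W @ x.ravel()

def query_explanation(x):
    """Black-box query: image -> saliency-style explanation map (HxW)."""
    sal = np.abs(W[np.argmax(query_logits(x))]).reshape(32, 32, 3).sum(-1)
    return sal / (sal.max() + 1e-12)

# --- Fitness: keep the prediction stable, pull the explanation to the target --
def fitness(x, x_orig, target_map, alpha=1.0, beta=1.0):
    pred_term = np.sum((query_logits(x) - query_logits(x_orig)) ** 2)
    expl_term = np.sum((query_explanation(x) - target_map) ** 2)
    return alpha * pred_term + beta * expl_term  # lower is better

# --- Simple (1, lambda) evolution strategy over a bounded perturbation --------
def attack(x_orig, target_map, sigma=0.01, eps=8 / 255, pop=32, iters=200):
    delta = np.zeros_like(x_orig)
    for _ in range(iters):
        noises = rng.normal(scale=sigma, size=(pop,) + x_orig.shape)
        candidates = np.clip(delta + noises, -eps, eps)  # keep the change imperceptible
        scores = [fitness(np.clip(x_orig + c, 0, 1), x_orig, target_map)
                  for c in candidates]
        delta = candidates[int(np.argmin(scores))]       # keep the best offspring
    return np.clip(x_orig + delta, 0, 1)

# Usage: drive the explanation of one image toward that of an unrelated image.
x_orig = rng.random((32, 32, 3))
target = query_explanation(rng.random((32, 32, 3)))
x_adv = attack(x_orig, target)
print("logit drift:", np.linalg.norm(query_logits(x_adv) - query_logits(x_orig)))
print("explanation gap:", np.linalg.norm(query_explanation(x_adv) - target))
```

The key design point this sketch illustrates is that the optimization needs no gradients or model internals: the fitness function is built entirely from two forward queries per candidate, which is why the weak access assumptions in the abstract suffice.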


research
08/18/2023

On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box

Attribution methods shed light on the explainability of data-driven appr...
research
08/05/2019

NeuroMask: Explaining Predictions of Deep Neural Networks through Mask Learning

Deep Neural Networks (DNNs) deliver state-of-the-art performance in many...
research
11/26/2017

An Introduction to Deep Visual Explanation

The practical impact of deep learning on complex supervised learning pro...
research
08/17/2022

An Evolutionary, Gradient-Free, Query-Efficient, Black-Box Algorithm for Generating Adversarial Instances in Deep Networks

Deep neural networks (DNNs) are sensitive to adversarial data in a varie...
research
06/19/2018

RISE: Randomized Input Sampling for Explanation of Black-box Models

Deep neural networks are increasingly being used to automate data analys...
research
12/18/2022

Bort: Towards Explainable Neural Networks with Bounded Orthogonal Constraint

Deep learning has revolutionized human society, yet the black-box nature...
research
05/17/2023

Explain Any Concept: Segment Anything Meets Concept-Based Explanation

EXplainable AI (XAI) is an essential topic to improve human understandin...
