Explanations can be manipulated and geometry is to blame

06/19/2019
by Ann-Kathrin Dombrowski, et al.

Explanation methods aim to make neural networks more trustworthy and interpretable. In this paper, we demonstrate a property of explanation methods that is disconcerting for both of these purposes: explanations can be manipulated arbitrarily by applying barely perceptible perturbations to the input that keep the network's output approximately constant. We establish theoretically that this phenomenon can be traced to certain geometric properties of neural networks, which allows us to derive an upper bound on the susceptibility of explanations to manipulation. Based on this result, we propose effective mechanisms to enhance the robustness of explanations.
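To make the manipulation concrete, below is a minimal PyTorch sketch (not the authors' implementation) of how such an attack can be set up for a gradient (saliency) explanation: a perturbed input is optimized so that its saliency map approaches an arbitrary target map, while a penalty term keeps the network's output close to the original prediction. The function names (`saliency`, `manipulate_explanation`), the loss weighting `gamma`, and the optimizer settings are illustrative assumptions, not values from the paper.

```python
import torch

def saliency(model, x):
    # Gradient (saliency) map of the top predicted class w.r.t. the input.
    if not x.requires_grad:
        x = x.detach().requires_grad_(True)
    logits = model(x)
    score = logits[0, logits[0].argmax()]
    # create_graph=True keeps the graph so the explanation itself is differentiable.
    grad, = torch.autograd.grad(score, x, create_graph=True)
    return grad

def manipulate_explanation(model, x, target_map, steps=500, lr=1e-3, gamma=1e3):
    # Optimize a perturbed input x_adv so that its saliency map approaches
    # target_map while the model output stays close to the original output.
    out_orig = model(x).detach()
    x_adv = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_adv], lr=lr)
    for _ in range(steps):
        expl = saliency(model, x_adv)
        loss_expl = ((expl - target_map) ** 2).mean()        # move explanation toward the target
        loss_out = ((model(x_adv) - out_orig) ** 2).mean()   # keep the output approximately constant
        loss = loss_expl + gamma * loss_out
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x_adv.detach()
```

Note that differentiating a gradient explanation with respect to the input involves second derivatives, which vanish almost everywhere for ReLU networks; in practice attacks of this kind typically compute their gradients on a smoothed surrogate of the network (e.g. replacing ReLU with softplus), a detail omitted from this sketch.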

Related research:
- Do Explanations Explain? Model Knows Best (03/04/2022)
- Robust Explanation Constraints for Neural Networks (12/16/2022)
- Assessing the Reliability of Visual Explanations of Deep Models with Adversarial Perturbations (04/22/2020)
- Fairwashing Explanations with Off-Manifold Detergent (07/20/2020)
- Distilling neural networks into skipgram-level decision lists (05/14/2020)
- Towards Robust Explanations for Deep Neural Networks (12/18/2020)
- Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance (04/13/2023)