The Manifold Hypothesis for Gradient-Based Explanations

06/15/2022
by Sebastian Bordt, et al.

When do gradient-based explanation algorithms provide meaningful explanations? We propose a necessary criterion: their feature attributions need to be aligned with the tangent space of the data manifold. To provide evidence for this hypothesis, we introduce a framework based on variational autoencoders that allows us to estimate and generate image manifolds. Through experiments on a range of datasets – MNIST, EMNIST, CIFAR10, X-ray pneumonia and Diabetic Retinopathy detection – we demonstrate that the more a feature attribution is aligned with the tangent space of the data, the more structured and explanatory it tends to be. In particular, the attributions provided by popular post-hoc methods such as Integrated Gradients, SmoothGrad and Input × Gradient tend to be more strongly aligned with the data manifold than the raw gradient. As a consequence, we suggest that explanation algorithms should actively strive to align their explanations with the data manifold. This can be achieved in part by adversarial training, which leads to better alignment across all datasets. Some form of adjustment to the model architecture or training algorithm is necessary, since we show that generalization of neural networks alone does not imply that model gradients align with the data manifold.
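The alignment criterion can be made concrete with a short sketch. The snippet below is an illustration under stated assumptions, not the authors' code: `decoder` and the latent point `z` are hypothetical stand-ins for a trained VAE decoder and an encoding of the input, and the tangent space at x = decoder(z) is approximated by the column span of the decoder's Jacobian.

```python
import torch

def tangent_space_alignment(grad, decoder, z):
    """Fraction of an attribution's norm that lies in the tangent
    space of the decoded manifold at latent point z (sketch only).

    The tangent space at x = decoder(z) is approximated by the
    column span of the decoder's Jacobian dx/dz.
    """
    # Jacobian of the decoder at z, flattened to (input_dim, latent_dim)
    J = torch.autograd.functional.jacobian(decoder, z)
    J = J.reshape(-1, z.numel())

    # Orthonormal basis Q of the tangent space via reduced QR
    Q, _ = torch.linalg.qr(J)

    # Project the flattened attribution onto the tangent space
    g = grad.reshape(-1)
    g_tangent = Q @ (Q.T @ g)

    # Value in [0, 1]: 1 if the attribution lies entirely in the
    # tangent space, 0 if it is orthogonal to it
    return (g_tangent.norm() / g.norm()).item()
```

With a measure of this kind, raw gradients can be compared against attributions such as Integrated Gradients or SmoothGrad at the same inputs, which is the type of comparison the paper reports.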


