Assessing the Reliability of Visual Explanations of Deep Models with Adversarial Perturbations

by   Dan Valle, et al.

Interest in complex deep neural networks for computer vision applications is increasing, and with it the need to improve the interpretability of these models. Recent explanation methods visualize the relevance of individual pixels of the input image, enabling direct interpretation of the input properties that lead to a specific output. These methods produce maps of pixel importance, which are commonly evaluated by visual inspection; the effectiveness of an explanation method is thus judged against human expectation rather than actual feature importance. In this work we therefore propose an objective measure of the reliability of explanations of deep models. Specifically, our approach measures the change in the network's output caused by perturbing the input image in an adversarial way. We compare widely known explanation methods under the proposed measure. Finally, we also propose a straightforward application of our approach to clean relevance maps, producing more interpretable maps without any loss of essential explanation (as quantified by our proposed measure).
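The core idea can be sketched in a few lines: a reliable relevance map should single out the pixels whose perturbation changes the model's output the most. The snippet below is a minimal illustration of that perturbation-and-measure loop, not the paper's exact adversarial procedure; the function name, the top-k selection, and the sign-based perturbation scheme are all assumptions made for the example.

```python
import numpy as np

def relevance_fidelity(model, image, relevance, k=16, eps=0.5):
    """Perturb the k most-relevant pixels and return the drop in the score.

    `model` maps a flat image array to a scalar class score. A faithful
    relevance map should yield a large drop; a map that marks unimportant
    pixels should yield a small one. This is an illustrative stand-in for
    the adversarial perturbation described in the abstract.
    """
    flat = image.ravel().astype(float).copy()
    rel = relevance.ravel()
    # Pick the k pixels the explanation considers most important.
    order = np.argsort(np.abs(rel))[::-1][:k]
    base = model(flat)
    # Push each selected pixel against the direction of its relevance
    # (a crude, fixed-step substitute for an adversarial attack).
    flat[order] -= eps * np.sign(rel[order])
    return base - model(flat)  # larger drop = more faithful map
```

As a sanity check, for a linear model the gradient-times-input relevance is exact, so it should score at least as well as a random map under this measure:

```python
rng = np.random.default_rng(0)
w = rng.normal(size=64)
x = np.ones(64)
model = lambda z: float(w @ z)
drop_good = relevance_fidelity(model, x, w * x)          # true relevance
drop_rand = relevance_fidelity(model, x, rng.normal(size=64))  # random map
assert drop_good >= drop_rand
```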


Explanations can be manipulated and geometry is to blame

Explanation methods aim to make neural networks more trustworthy and int...

Towards Ground Truth Evaluation of Visual Explanations

Several methods have been proposed to explain the decisions of neural ne...

Visual Explanation by Interpretation: Improving Visual Feedback Capabilities of Deep Neural Networks

Learning-based representations have become the de facto means to address ...

RES: A Robust Framework for Guiding Visual Explanation

Despite the fast progress of explanation techniques in modern Deep Neura...

Interpretable and Fine-Grained Visual Explanations for Convolutional Neural Networks

To verify and validate networks, it is essential to gain insight into th...

Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings

We study the effects of constrained optimization formulations and Frank-...

Cross-Model Consensus of Explanations and Beyond for Image Classification Models: An Empirical Study

Existing interpretation algorithms have found that, even deep models mak...