Assessing the Reliability of Visual Explanations of Deep Models with Adversarial Perturbations

04/22/2020
by Dan Valle, et al.

Interest in complex deep neural networks for computer vision applications keeps growing, which in turn raises the need to improve the interpretability of these models. Recent explanation methods present visualizations of the relevance of pixels from input images, enabling direct interpretation of the input properties that lead to a specific output. These methods produce maps of pixel importance, which are commonly evaluated by visual inspection; the effectiveness of an explanation method is thus judged against human expectation rather than actual feature importance. In this work we propose an objective measure to evaluate the reliability of explanations of deep models. Specifically, our approach is based on changes in the network's outcome resulting from the perturbation of input images in an adversarial way. We present a comparison between widely known explanation methods using the proposed approach. Finally, we also propose a straightforward application of our approach to clean relevance maps, creating more interpretable maps without any loss in essential explanation (as measured by our proposed metric).
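The abstract does not spell out the measure itself, but its core idea, perturbing the pixels an explanation marks as relevant and tracking the resulting change in the network's output, can be illustrated with a short sketch. The code below is a minimal, generic perturbation test written in PyTorch, not the paper's exact adversarial formulation; the function name `perturbation_score`, the top-fraction pixel selection, and the constant-baseline replacement are illustrative assumptions.

```python
# Hedged sketch: a generic perturbation-based check of a relevance map.
# The model, image, relevance map, and choice of perturbation are
# illustrative assumptions, not the paper's exact procedure.
import torch


def perturbation_score(model, image, relevance, target_class,
                       fraction=0.1, baseline_value=0.0):
    """Measure how much the target-class score drops when the pixels
    marked most relevant are replaced by a baseline value.

    image:     tensor of shape (C, H, W)
    relevance: tensor of shape (H, W), higher = more relevant
    """
    model.eval()
    with torch.no_grad():
        original = model(image.unsqueeze(0))[0, target_class].item()

        # Rank pixels by relevance and select the top fraction.
        k = max(1, int(fraction * relevance.numel()))
        flat = relevance.flatten()
        top_idx = torch.topk(flat, k).indices

        mask = torch.zeros_like(flat, dtype=torch.bool)
        mask[top_idx] = True
        mask = mask.view(relevance.shape)  # back to (H, W)

        # Replace the selected pixels in every channel with the baseline.
        perturbed = image.clone()
        perturbed[:, mask] = baseline_value

        perturbed_score = model(perturbed.unsqueeze(0))[0, target_class].item()

    # A reliable map should concentrate relevance on pixels whose removal
    # causes a large drop in the class score.
    return original - perturbed_score
```

Under this sketch, a larger score drop for the same fraction of perturbed pixels indicates that the relevance map points at genuinely influential input regions, which is the kind of model-grounded comparison the abstract argues for over visual inspection.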


Related research:

06/19/2019 · Explanations can be manipulated and geometry is to blame
Explanation methods aim to make neural networks more trustworthy and int...

03/16/2020 · Towards Ground Truth Evaluation of Visual Explanations
Several methods have been proposed to explain the decisions of neural ne...

12/18/2017 · Visual Explanation by Interpretation: Improving Visual Feedback Capabilities of Deep Neural Networks
Learning-based representations have become the defacto means to address ...

06/09/2023 · Overcoming Adversarial Attacks for Human-in-the-Loop Applications
Including human analysis has the potential to positively affect the robu...

08/07/2019 · Interpretable and Fine-Grained Visual Explanations for Convolutional Neural Networks
To verify and validate networks, it is essential to gain insight into th...

10/15/2021 · Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings
We study the effects of constrained optimization formulations and Frank-...

09/02/2021 · Cross-Model Consensus of Explanations and Beyond for Image Classification Models: An Empirical Study
Existing interpretation algorithms have found that, even deep models mak...
