Building Reliable Explanations of Unreliable Neural Networks: Locally Smoothing Perspective of Model Interpretation

03/26/2021
by   Dohun Lim, et al.

We present a novel method for reliably explaining the predictions of neural networks. We consider an explanation reliable if it identifies the input features relevant to the model output by taking into account both the input and its neighboring data points. Our method builds on the assumption of a smooth landscape of the loss function of the model prediction: locally consistent loss and gradient profiles. A theoretical analysis established in this study suggests that such locally smooth model explanations can be learned from a batch of noisy copies of the input with L1 regularization on the saliency map. Extensive experiments support this analysis, showing that the proposed saliency maps recover the original classes of adversarial examples crafted against both naturally and adversarially trained models, significantly outperforming previous methods. We further demonstrate that this strong performance stems from the method's ability to identify input features that are truly relevant to the model output for both the input and its neighboring data points, fulfilling the requirements of a reliable explanation.
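The abstract describes learning a saliency map from a batch of noisy copies of the input under L1 regularization. A minimal PyTorch sketch of that general idea follows; the masked-logit objective, the sigmoid squashing of the mask, and the hyperparameters (n_samples, sigma, lambda_l1, lr, steps) are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def learn_saliency_map(model, x, target_class, n_samples=32, sigma=0.1,
                       lambda_l1=1e-3, lr=0.05, steps=200):
    """Sketch: learn a saliency mask for input x (shape [1, C, H, W]) from noisy copies of x.

    Illustrative only; the objective and hyperparameters are assumptions,
    not the paper's exact formulation.
    """
    model.eval()
    mask = torch.zeros_like(x, requires_grad=True)   # saliency map parameters
    optimizer = torch.optim.Adam([mask], lr=lr)

    for _ in range(steps):
        # Batch of noisy copies of the input: samples from the local neighborhood of x
        noise = sigma * torch.randn(n_samples, *x.shape[1:])
        noisy_batch = x.expand(n_samples, -1, -1, -1) + noise

        # Keep only the features selected by the (sigmoid-squashed) saliency map
        masked_batch = noisy_batch * torch.sigmoid(mask)

        logits = model(masked_batch)
        # Preserve the target-class score on the masked, perturbed inputs,
        # while the L1 term (sum of the nonnegative mask) keeps the map sparse
        loss = -logits[:, target_class].mean() + lambda_l1 * torch.sigmoid(mask).sum()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return torch.sigmoid(mask).detach()

# Example usage (hypothetical model and input):
# saliency = learn_saliency_map(model, image.unsqueeze(0), target_class=predicted_label)
```

In this sketch the Gaussian noise supplies the neighboring data points, while the L1 term keeps the learned map sparse; the loss actually used in the paper may differ.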
