
Boundary Attributions Provide Normal (Vector) Explanations

Recent work on explaining Deep Neural Networks (DNNs) focuses on attributing the model's output scores to input features. However, for classification problems a more fundamental question is how much each feature contributes to the model's decision to classify an input instance into a specific class. Our first contribution is Boundary Attribution (BA), a new explanation method that addresses this question. BA leverages an understanding of the geometry of activation regions: it computes (and aggregates) the normal vectors of the local decision boundaries around the target input. Our second contribution is a set of analytical results connecting the adversarial robustness of a network to the quality of its gradient-based explanations. Specifically, we prove two theorems for ReLU networks: on randomized-smoothed or robustly trained networks, BA is much closer to non-boundary attribution methods than it is on standard networks. These results encourage users to improve model robustness in order to obtain high-quality explanations. Finally, we evaluate the proposed methods on ImageNet and show that BAs produce more concentrated and sharper visualizations than non-boundary ones. We further demonstrate that our method also reduces the sensitivity of attributions to the baseline input when one is required.





1 Introduction

Existing approaches [4, 11, 16, 26, 30, 36, 37, 38, 40, 41, 44] to explaining Deep Neural Networks (DNNs) are motivated by the question (Q1): how can the model's output score be faithfully attributed to input features? These tools are appropriate for tasks like regression and generation. However, in a classification task, besides the output score of the model, we are also interested in another question (Q2): how much does each feature contribute to the model's decision to classify an input instance into a specific class?

Figure 1: Left: the classification result of a toy network with one hidden layer for two-dimensional input. Decision boundaries are edges of the black and white regions. Right: a zoom-in view of one instance.
Figure 2: Visualizations of Integrated Gradient and the proposed improvement of it, Boundary-based Integrated Gradient, which is sharper, more concentrated and less noisy.

Efforts to answer Q1 lead to explanation tools that capture the model's output change within a local perturbation set of input features [52]. In other words, they explain the model by exploring the local geometry of the model's output in the input space around the point of interest. An answer to Q2, on the other hand, needs to focus on the important features that the model uses to separate the evaluated input from other classes, not just the output score of the current class. An example of a toy binary classifier is shown in Fig. 1. Whereas answers to Q1 focus on the target and its surroundings (pointed to by the black arrow), they do not directly answer Q2: how is this input placed in the white half-space?

In this paper, we demonstrate that an answer to Q2 is related to the decision boundaries, and their normal vectors, that the classifier learns in the input space (pointed to by the white arrow in Fig. 1). These normal vectors correspond to the importance of the features that the model uses to create different regions for each class. Leveraging decision boundaries in explaining the classifier not only returns sharp and concentrated explanations that one may otherwise only observe in a robust network [10, 15, 20] (see Fig. 2), but also provides formal connections between gradient-based explanations and the adversarial robustness of DNNs. We summarize our contributions as follows:

  • We introduce boundary attributions, a new approach to explaining both linear and non-linear classifiers, e.g., DNNs, that leverages an understanding of activation regions and decision boundaries.

  • We provide a set of analytical results relating boundary attributions to the adversarial robustness of DNNs in Theorems 1 and 2. An implication of our analysis is that the expense of improving model robustness pays off in the efficiency of using non-boundary attributions to approximate boundary attributions when explaining DNNs.

  • We empirically demonstrate that boundary-based Integrated Gradient (BIG) produces more accurate explanations, in the sense that its overlap with object bounding boxes is substantially higher than that of existing methods.

  • Our empirical results further show that BIG mitigates the unnecessary sensitivity to the baseline input in Integrated Gradient.

The rest of our paper is organized as follows. We introduce notation and preliminaries about attribution methods in Sec. 2. We analyze how to explain linear and non-linear classifiers with explanation tuples and boundary attributions in Sec. 3. Empirical evaluations of the proposed methods against baseline attribution methods are included in Sec. 4. We discuss the sensitivity of attribution methods to baseline images in Sec. 5 and related work in Sec. 6, and finally conclude in Sec. 7.

2 Background

We begin by introducing notation in Sec. 2.1, and in Sec. 2.2 the prior work on feature attributions that we build on later in the paper.

2.1 Notation

Throughout the paper we use italicized symbols to denote scalar quantities and bold-face to denote vectors. We consider neural networks $f$ with ReLU activations prior to the top layer, and a softmax activation at the top. The predicted label for a given input $\mathbf{x}$ is given by $F(\mathbf{x}) = \arg\max_c f_c(\mathbf{x})$, where $F(\mathbf{x})$ is the predicted label and $f_c(\mathbf{x})$ is the output on class $c$. As the softmax layer does not change the ranking of neurons in the top layer, we will assume that $f_c(\mathbf{x})$ denotes the pre-softmax score. Unless otherwise noted, we use $\|\mathbf{x}\|$ to denote the $\ell_2$ norm of $\mathbf{x}$, and write $B(\mathbf{x}, \epsilon)$ for the neighborhood centered at $\mathbf{x}$ with radius $\epsilon$.

2.2 Feature Attribution

Feature attribution methods are widely used to explain the predictions made by DNNs by assigning each input feature an importance score for the network's output. Conventionally, scores with greater magnitude indicate that the corresponding feature was more relevant to the predicted outcome. We denote the feature attribution for an input $\mathbf{x}$ under a model $f$ by $g_f(\mathbf{x})$; when $f$ is clear from the context, we simply write $g(\mathbf{x})$. While there is an extensive and growing literature on attribution methods, we focus closely on the popular gradient-based methods shown in Defs. 1-3.

Definition 1 (Saliency Map (SM) [38])

The Saliency Map is given by $\mathrm{SM}(\mathbf{x}) = \nabla_{\mathbf{x}} f_c(\mathbf{x})$, where $c = F(\mathbf{x})$.

Definition 2 (Integrated Gradient (IG) [44])

Given a baseline input $\mathbf{x}_b$, the Integrated Gradient is given by $\mathrm{IG}(\mathbf{x}; \mathbf{x}_b) = (\mathbf{x} - \mathbf{x}_b) \odot \int_0^1 \nabla f_c(\mathbf{x}_b + t(\mathbf{x} - \mathbf{x}_b))\, dt$.

Definition 3 (Smooth Gradient (SG) [40])

Given a zero-centered Gaussian distribution $\mathcal{N}(0, \sigma^2 I)$ with standard deviation $\sigma$, the Smooth Gradient is given by $\mathrm{SG}(\mathbf{x}) = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, \sigma^2 I)}\left[\nabla f_c(\mathbf{x} + \epsilon)\right]$.

As we show in Section 3.2, these methods satisfy axioms that relate to the local linearity of ReLU networks, and in the case of randomized smoothing [9], their robustness to input perturbations. We further discuss these methods relative to others in Sec. 6.
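The three gradient-based definitions above can be sketched on a toy linear model, where the class-score gradient is available in closed form. This is a minimal NumPy sketch with illustrative names (not the paper's code); for a linear model the IG integrand is constant, so IG's completeness axiom (attributions sum to the score difference) is easy to check.

```python
import numpy as np

# Toy linear "network": class scores s(x) = W @ x + b, so the gradient of
# the class-c score with respect to x is simply W[c].
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # 3 classes, 4 input features
b = rng.normal(size=3)

def scores(x):
    return W @ x + b

def grad(x, c):
    return W[c]               # analytic gradient for the linear model

def saliency_map(x, c):
    """Def. 1: SM(x) = gradient of the class-c score at x."""
    return grad(x, c)

def integrated_gradient(x, x_base, c, steps=50):
    """Def. 2: (x - x_base) * average gradient along the straight path."""
    ts = (np.arange(steps) + 0.5) / steps            # midpoint rule
    g = np.mean([grad(x_base + t * (x - x_base), c) for t in ts], axis=0)
    return (x - x_base) * g

def smooth_gradient(x, c, sigma=0.1, n=100):
    """Def. 3: expected gradient under zero-centered Gaussian noise."""
    noise = rng.normal(scale=sigma, size=(n, x.size))
    return np.mean([grad(x + e, c) for e in noise], axis=0)

x = rng.normal(size=4)
x_base = np.zeros(4)
c = int(np.argmax(scores(x)))

ig = integrated_gradient(x, x_base, c)
# Completeness axiom: IG attributions sum to the score difference.
print(np.allclose(ig.sum(), scores(x)[c] - scores(x_base)[c]))
```

For a linear model the three methods return the same vector up to scale, which foreshadows the discussion in Sec. 3.1.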

3 Boundary-Based Explanations

In this section, we introduce boundary-based explanation methods by first illustrating and motivating their use on simple linear models (Sec 3.1), and then showing how to generalize the approach to non-linear piecewise classifiers (Sec 3.2).

3.1 Linear Models

Figure 3: Different classifiers that partition the space into regions associated with apple or banana. (a) A linear classifier. (b) A deep network with ReLU activations. Solid lines correspond to decision boundaries while dashed lines correspond to facets of activation regions. (c) Saliency map of the target instance may be normal to the closest decision boundary (right) or normal to the prolongation of other local boundaries (left).

Consider a binary classification model $f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$ that predicts the label $\mathrm{sign}(f(\mathbf{x}))$ (we assume no point makes $f(\mathbf{x}) = 0$). When viewed in the feature space, the set $\{\mathbf{x} : f(\mathbf{x}) = 0\}$ is a hyperplane $\mathcal{B}$ that separates the input space into two open half-spaces (see Fig. 3(a)). To assign attributions for predictions made by $f$, SM, SG, and the integral part of IG (see Sec. 2.2) all return a vector characterized by $\mathbf{w}$ [3], which is normal to the hyperplane $\mathcal{B}$. In other words, these methods all measure the importance of features by characterizing the model's decision boundary, and are equivalent up to the scale and position of $\mathbf{w}$.

However, note that these attributions fail to distinguish an input from its neighbors, as $\mathbf{w}$ is identical for every point, and in particular for points on either side of the boundary. For points on one side, $\mathbf{w}$ points towards a direction of increasing "confidence" in the model's prediction, measured in terms of distance from the boundary $\mathcal{B}$; for points on the other side it is the opposite. Furthermore, the attributions themselves are invariant to the model's prediction confidence, which is likely useful in constructing explanations. Therefore, Definition 4 proposes augmenting $\mathbf{w}$ with a measure of the distance between $\mathbf{x}$ and the decision boundary (easily calculated by projection as $|f(\mathbf{x})| / \|\mathbf{w}\|$), and standardizing the interpretation of the normal vector by ensuring that it always points towards the half-space containing $\mathbf{x}$.

Definition 4 (Explanation Tuple for Linear Models)

Given an input $\mathbf{x}$ and a decision boundary $\mathcal{B}$ parameterized by a linear model $f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$, the explanation of the classification is a tuple $(\mathbf{n}, d)$, where $\mathbf{n}$ is a unit vector normal to $\mathcal{B}$ pointing to the half-space containing $\mathbf{x}$, and $d = |f(\mathbf{x})| / \|\mathbf{w}\|$. We refer to $(\mathbf{n}, d)$ as the explanation tuple for $\mathbf{x}$.
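Def. 4 can be sketched directly for a linear model. The following is a minimal, illustrative implementation (names are ours, not the paper's): the unit normal is oriented toward the input's half-space, and the distance is obtained by projection.

```python
import numpy as np

# Sketch of Def. 4 for a linear binary classifier f(x) = w @ x + b.
def explanation_tuple(w, b, x):
    f = w @ x + b
    n = np.sign(f) * w / np.linalg.norm(w)   # unit normal pointing to x's half-space
    d = abs(f) / np.linalg.norm(w)           # distance from x to the hyperplane
    return n, d

w, b = np.array([3.0, 4.0]), -1.0
x = np.array([2.0, 1.0])
n, d = explanation_tuple(w, b, x)
# Moving d units against n lands exactly on the decision boundary f = 0.
boundary_point = x - d * n
print(np.isclose(w @ boundary_point + b, 0.0))
```

The final check makes the geometric reading of $(\mathbf{n}, d)$ concrete: $\mathbf{x} - d\,\mathbf{n}$ is the projection of $\mathbf{x}$ onto the hyperplane.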

3.2 Piecewise-Linear Models

In this section we extend Def. 4 to non-linear boundaries, and in particular the piecewise-linear decision surfaces corresponding to ReLU DNNs. We begin by reviewing the locally-linear geometry of these networks, and then show how Def. 4 can be extended in ways that mirror the SM and IG methods from Defs. 1 and 2.

Local Linearity. For any neuron $u$ in a network $f$, we say the status of the neuron is ON for an input $\mathbf{x}$ if its pre-activation $u(\mathbf{x}) > 0$, and OFF otherwise. We can associate an activation pattern denoting the status of each neuron with any point $\mathbf{x}$ in the feature space, and a half-space of the input space with each activation constraint. Thus, for any point $\mathbf{x}$, the intersection of the half-spaces corresponding to its activation pattern defines a polytope $P$ (see Fig. 3(b)), and within $P$ the network is a linear function such that $\forall \mathbf{x} \in P,\ f(\mathbf{x}) = W_P \mathbf{x} + \mathbf{b}_P$, where the parameters $W_P$ and $\mathbf{b}_P$ can be computed by back-propagation [17]. Each polytope facet (dashed lines in Fig. 3(b)) corresponds to a boundary that flips the status of one neuron. Like the activation constraints, decision boundaries are piecewise linear, because each decision boundary corresponds to a constraint $f_i(\mathbf{x}) = f_j(\mathbf{x})$ for two classes $i$ and $j$ [17, 21].
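The local linearity of ReLU networks can be verified numerically. The sketch below (a tiny illustrative network, not from the paper) builds the local linear map by masking the OFF neurons and confirms it agrees with the network inside the polytope:

```python
import numpy as np

# A tiny two-layer ReLU network. Within the activation polytope of x the
# network is exactly linear: f(x) = W_P @ x + b_P, where W_P and b_P are
# obtained by masking the OFF neurons.
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(2, 5)), rng.normal(size=2)

def forward(x):
    return W2 @ np.maximum(W1 @ x + b1, 0) + b2

def local_linear_map(x):
    pattern = (W1 @ x + b1 > 0).astype(float)  # activation pattern at x
    D = np.diag(pattern)                       # zeroes out OFF neurons
    return W2 @ D @ W1, W2 @ D @ b1 + b2, pattern

x = rng.normal(size=3)
W_P, b_P, pattern = local_linear_map(x)
print(np.allclose(forward(x), W_P @ x + b_P))   # exact within the polytope

# A perturbation small enough to preserve the activation pattern stays in
# the same polytope, so the same linear map still applies.
u = rng.normal(size=3); u /= np.linalg.norm(u)
eps = np.min(np.abs(W1 @ x + b1)) / (10 * np.linalg.norm(W1))
x2 = x + eps * u
print(np.allclose(forward(x2), W_P @ x2 + b_P))
```

The perturbation radius is chosen below the smallest pre-activation margin, which guarantees the activation pattern cannot flip.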

Boundary-based Explanations. The main challenge in explaining a piecewise-linear classifier is that, for a given input, there are many possible decision boundaries that could be associated with an explanation tuple as in Def. 4. Without criteria for selecting a particular boundary, an explanation is not unique, and we may even find contradictory explanations if two boundaries have opposing normal vectors. Def. 5 expands on Def. 4 by fixing a particular decision boundary, so that the normal vector associated with $\mathbf{x}$ is unique.

Definition 5 (Boundary-based Explanation Tuple)

Given an input $\mathbf{x}$, a Boundary-based Explanation Tuple is defined as $(\mathcal{B}, \mathbf{n}, d)$, where $\mathcal{B}$ is a decision boundary, $\mathbf{n}$ is a unit vector normal to $\mathcal{B}$ pointing to the polytope containing $\mathbf{x}$, and $d$ is the distance from $\mathbf{x}$ to $\mathcal{B}$.

Because the configuration of the piecewise-linear boundaries is defined largely by the training data, it makes intuitive sense to expect that boundaries nearer to a point account for features that are more relevant to its classification. Nearby boundaries also depict the counterfactual behavior of the classifier in the neighborhood of $\mathbf{x}$, as their normal vectors characterize the directions leading to nearby regions assigned different labels. This naturally leads to a Boundary-based Saliency Map (Def. 6), where the boundary is chosen to maximize proximity to the input $\mathbf{x}$.

Definition 6 (Boundary-based Saliency Map (BSM))

Given a network $f$ and an input $\mathbf{x}$, we define the Boundary-based Saliency Map as $\mathrm{BSM}(\mathbf{x}) = \nabla f_c(\mathbf{x}')$, where $c = F(\mathbf{x})$ and $\mathbf{x}' = \arg\min_{\mathbf{z}} \|\mathbf{z} - \mathbf{x}\|$ subject to $F(\mathbf{z}) \neq F(\mathbf{x})$, i.e., $\mathbf{x}'$ is the closest adversarial example to $\mathbf{x}$.

Given Def. 6, it is straightforward to verify that for a target instance $\mathbf{x}$, $\mathrm{BSM}(\mathbf{x})$ yields a Boundary-based Explanation Tuple $(\mathcal{B}, \mathbf{n}, d)$, where $\mathcal{B}$ is the closest decision boundary to $\mathbf{x}$ and $d = \|\mathbf{x} - \mathbf{x}'\|$.
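A BSM-style boundary normal can be sketched on a hand-crafted two-neuron ReLU toy whose boundary location is known analytically. The bisection below stands in for the adversarial-example search used on real networks; all weights and names are illustrative.

```python
import numpy as np

# Hand-crafted toy: class-score margin m(x) = relu(x1+x2) - relu(x1-x2) - 0.3.
# Along the segment from (1, 1) to (1, -1) the boundary sits at (1, 0.15).
W1, b1 = np.array([[1., 1.], [1., -1.]]), np.zeros(2)

def margin(x):
    h = np.maximum(W1 @ x + b1, 0)
    return h[0] - h[1] - 0.3

def grad_margin(x):                  # analytic gradient inside x's polytope
    D = np.diag((W1 @ x + b1 > 0).astype(float))
    return np.array([1., -1.]) @ D @ W1

x, y = np.array([1.0, 1.0]), np.array([1.0, -1.0])  # opposite predicted classes

# Bisection along the segment [x, y] to locate the sign change of margin().
lo, hi = x, y
for _ in range(60):
    mid = (lo + hi) / 2
    if np.sign(margin(mid)) == np.sign(margin(x)):
        lo = mid
    else:
        hi = mid
x_b = (lo + hi) / 2                  # approximate boundary point
bsm = grad_margin(x_b)               # BSM: the gradient taken at the boundary
print(x_b, bsm)
```

The recovered normal is the gradient of the local linear piece at the boundary point, matching the tuple $(\mathcal{B}, \mathbf{n}, d)$ reading above.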

BSM and SM.

To implement the BSM explanation, one can adapt a procedure for constructing saliency maps by computing the gradient with respect to the nearest adversarial example $\mathbf{x}'$ [31, 45] rather than the original point $\mathbf{x}$. Others have noted [17] that the local gradient is not always normal to the closest decision boundary (see the left plot of Fig. 3(c)): when the closest decision boundary is not inside the activation polytope containing $\mathbf{x}$, the gradient may instead be normal to the linear extension of some other piecewise boundary. Similarly, the projection distance may not be the actual distance to the closest decision boundary, but the distance to the prolongation of another boundary. This means that a standard saliency map may not return a valid boundary-based explanation tuple, and our experiments in Section 4 demonstrate that this is typically the case.

Incorporating More Boundaries.

The main limitation of using BSM as a local explanation is obvious: the closest decision boundary only captures one segment of the decision surface. Even for a toy network, there are usually multiple decision boundaries in the vicinity of an on-manifold point (e.g., the jagged boundaries in Fig. 1). To incorporate the influence of other decision boundaries in the neighborhood of , we consider aggregating the normal vectors of a set of decision boundaries. Taking inspiration from IG, Def. 7 proposes a Boundary-based Integrated Gradient as follows:

Definition 7 (Boundary-based Integrated Gradient (BIG))

Given a network $f$, the Integrated Gradient $\mathrm{IG}$, and an input $\mathbf{x}$, we define the Boundary-based Integrated Gradient as $\mathrm{BIG}(\mathbf{x}) = \mathrm{IG}(\mathbf{x}; \mathbf{x}')$, where $\mathbf{x}'$ is the nearest adversarial example to $\mathbf{x}$, i.e., $\mathbf{x}' = \arg\min_{\mathbf{z}} \|\mathbf{z} - \mathbf{x}\|$ subject to $F(\mathbf{z}) \neq F(\mathbf{x})$.

BIG explores a linear path from the boundary point to the target point. Because points on this path are likely to traverse different activation polytopes, the gradients of the intermediate points used to compute $\mathrm{BIG}(\mathbf{x})$ are the normals of the linear extensions of their local boundaries. As the input gradient is identical within a polytope $P$, the aggregation computed by BIG sums each gradient along the path, weighted by the length of the path segment intersecting $P$. This yields the normal vector used in the explanation tuple given by BIG; the distance component is computed as in BSM.
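As a sketch, BIG is IG computed with the nearest boundary point as the baseline. On a hand-crafted two-neuron ReLU toy with a known boundary point (illustrative weights and names, not the paper's code), IG's completeness axiom implies the attributions sum to the margin at the input, since the margin vanishes at the boundary:

```python
import numpy as np

# Toy margin m(x) = relu(x1+x2) - relu(x1-x2) - 0.3, with a boundary point
# at (1, 0.15) on the segment toward x = (1, 1).
W1, b1 = np.array([[1., 1.], [1., -1.]]), np.zeros(2)

def margin(x):                        # class-0 score minus class-1 score
    h = np.maximum(W1 @ x + b1, 0)
    return h[0] - h[1] - 0.3

def grad_margin(x):
    D = np.diag((W1 @ x + b1 > 0).astype(float))
    return np.array([1., -1.]) @ D @ W1

def big(x, x_boundary, steps=200):
    """IG along the straight path from the boundary point to x."""
    ts = (np.arange(steps) + 0.5) / steps
    g = np.mean([grad_margin(x_boundary + t * (x - x_boundary)) for t in ts],
                axis=0)
    return (x - x_boundary) * g

x = np.array([1.0, 1.0])
x_b = np.array([1.0, 0.15])          # boundary point: margin(x_b) == 0
attr = big(x, x_b)
# Completeness: the attributions sum to margin(x) - margin(x_b) = margin(x).
print(np.isclose(attr.sum(), margin(x) - margin(x_b)))
```

Here the path stays in one polytope, so the weighted sum reduces to a single normal; on real networks the path crosses many polytopes and BIG aggregates their normals.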

Finding nearby boundaries. Finding the closest decision boundary is closely related to the problem of certifying local robustness [17, 21, 24, 25, 28, 46, 50], which is known to be NP-hard for piecewise-linear models [39]. Therefore, to approximate the point on the closest decision boundary, we leverage techniques for generating adversarial examples, e.g., PGD [29] and CW [5], and return the closest one found within a reasonable time budget. Our experiments in Section 4 show that this approximation yields good results in practice.

3.3 Characterization of Boundary Attribution

In this section, we examine the relationship between the boundary explanation methods described in the previous section, and also explore some intriguing connections to model robustness. The proofs of all theorems are given in Appendix A.

Connection to SG. Wang et al. [49] showed that the Smooth Gradient (SG) for a network $f$ is equivalent to the standard saliency map of a smoothed variant [9] of the original model. We build on this relationship, examining how BSM differs from SM on a smoothed model. Dombrowski et al. [13] noted that models with softplus activations ($\mathrm{softplus}_\beta(x) = \frac{1}{\beta}\log(1 + e^{\beta x})$) approximate such smoothing, giving an exact correspondence for single-layer networks. Combining these insights, we arrive at Theorem 1, which suggests that BSM resembles SM on smoothed models.
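The informal bridge between SG, randomized smoothing, and softplus can be checked numerically for a single ReLU: the expected gradient under Gaussian noise (what SG computes) equals the Gaussian CDF $\Phi(x/\sigma)$, a smooth sigmoid-shaped curve of the same kind as a softplus unit's gradient. A Monte-Carlo sketch (illustrative, not the paper's code):

```python
import numpy as np
from math import erf, sqrt

# Expected ReLU gradient under Gaussian input noise vs. its closed form.
rng = np.random.default_rng(0)
sigma, n = 0.5, 200_000

def smoothed_relu_grad(x):
    """Monte-Carlo estimate of E[ relu'(x + eps) ], eps ~ N(0, sigma^2)."""
    eps = rng.normal(scale=sigma, size=n)
    return np.mean((x + eps) > 0)

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

for x in (-0.5, 0.0, 0.5):
    print(f"x={x:+.1f}  MC={smoothed_relu_grad(x):.3f}  "
          f"Phi(x/sigma)={phi(x / sigma):.3f}")
```

Larger $\sigma$ flattens the curve, which is the mechanism by which smoothing pulls the gradient field (and hence SM) toward the smoother boundary geometry.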

Theorem 1

Let $f$ be a one-layer network, and write $\tilde{f}$ for its randomized-smoothing counterpart. Suppose $\mathbf{x}'$ is the closest adversarial example to $\mathbf{x}$. Then the difference between the BSM and the SM of $\tilde{f}$ at $\mathbf{x}$ is bounded by a quantity that shrinks as the amount of smoothing grows; the precise statement and constants are given in Appendix A.

Although Theorem 1 is restricted to one-layer networks, it provides several insights. First, when randomized smoothing is used, BSM and SM yield more similar results, so BSM attributions resemble those obtained on robust models. Second, because SM on a smoothed model is equivalent to SG on the non-smoothed one, SG is likely a better choice than SM whenever boundary methods are too expensive.

Attribution Robustness. Recent work has also proposed using the Lipschitz constant of an attribution method to characterize its robustness to adversarial examples, i.e., the difference between the attribution vector of the target input and those of its neighbors within a small ball [49] (see Def. 9 in the Appendix). This naturally leads to the following statement.

Theorem 2

Suppose $f$ has a $\lambda$-robust Saliency Map; then $\|\mathrm{SM}(\mathbf{x}) - \mathrm{BSM}(\mathbf{x})\| \leq \lambda \|\mathbf{x} - \mathbf{x}'\|$, where $\mathbf{x}'$ is the closest adversarial example to $\mathbf{x}$, provided $\mathbf{x}' \in B(\mathbf{x}, \epsilon)$.

Theorem 2 provides another insight: for networks trained to have robust attributions [8, 49], SM is a good approximation to BSM. As prior work has demonstrated a close correspondence between robust prediction and robust attribution [49], this result suggests that robust training may enable less expensive explanations that closely resemble BSM, by relying on SM to approximate it.

4 Evaluation

In this section, we first evaluate the "accuracy" of the proposed boundary attributions in terms of their alignment with ground-truth bounding boxes on ImageNet 2012, using ResNet50 models. We find that BIG significantly outperforms all baseline methods (Fig. 4(b)) on multiple quantitative metrics, while providing visually sharper and more concentrated visualizations (Sec. 4.3). We also validate the theorems presented in Sec. 3.3, demonstrating that the difference between boundary attributions and prior methods on an adversarially-trained ResNet50 decreases by an order of magnitude (Sec. 4.2). This shows that in practice, robust training objectives allow the more desirable boundary attributions to be approximated efficiently using existing gradient-based methods.

4.1 Localization with ground truth

In the absence of more general ground-truth quality metrics for explanations, we may assume that good explanations localize features that are relevant to the label assigned to the input. In an image classification task where ground-truth bounding boxes are given, we consider features within the bounding box as more relevant to the label assigned to the image. The metrics used for our evaluation are: 1) Localization (Loc.) [7] evaluates the intersection of the bounding box with the pixels receiving positive attributions; 2) Energy Game (EG) [47] instead computes the portion of attribution scores that falls within the bounding box. While these two metrics are common in the literature, we propose the following additional metrics: 3) Positive Percentage (PP) evaluates the sum of positive attribution scores over the total (absolute value of) attribution scores within the bounding box; 4) Concentration (Con.) evaluates the sum of distances between the "mass" center of the attributions and each pixel with a positive attribution score within the bounding box. Higher Loc., EG, and PP, and lower Con., are better. We provide formal details of these metrics in Appendix B.1.
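These four metrics can be sketched from their descriptions above. The exact formulas are in the paper's Appendix B.1, so treat the readings below (e.g., Loc. as an intersection-over-union of positive pixels with the box) as plausible illustrative implementations rather than the authors' definitions:

```python
import numpy as np

# Illustrative localization metrics over a 2-D attribution map `attr` and a
# boolean bounding-box mask `box` of the same shape.
def loc(attr, box):            # IoU of positive-attribution pixels with the box
    pos = attr > 0
    return np.sum(pos & box) / np.sum(pos | box)

def energy_game(attr, box):    # share of positive attribution mass in the box
    p = np.clip(attr, 0, None)
    return p[box].sum() / p.sum()

def positive_percentage(attr, box):
    inside = attr[box]
    return np.clip(inside, 0, None).sum() / np.abs(inside).sum()

def concentration(attr, box):  # distances from the attribution mass center
    p = np.clip(attr, 0, None)
    ys, xs = np.nonzero(p)
    cy = np.average(ys, weights=p[ys, xs])
    cx = np.average(xs, weights=p[ys, xs])
    yy, xx = np.nonzero((p > 0) & box)
    return np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2).sum()

# A toy 4x4 attribution map whose positive mass lies entirely in the box.
attr = np.zeros((4, 4)); attr[1:3, 1:3] = 1.0; attr[0, 0] = -0.5
box = np.zeros((4, 4), dtype=bool); box[1:3, 1:3] = True
print(loc(attr, box), energy_game(attr, box), positive_percentage(attr, box))
```

On this toy map all three score-based metrics are maximal (1.0), while Con. shrinks as the positive mass tightens around its center.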

Experimental setup. We compare BSM and BIG against their non-boundary versions to demonstrate that boundary attributions are a better fit for explaining non-linear classifiers; comparisons with additional methods appear in Appendix B.4. We consider 2000 correctly-classified images from ImageNette [19], a subset of ImageNet [34], with bounding-box area less than 80% of the source image. When comparing against IG, we use the default "black" baseline consisting of all zeros [44]. Implementation details of the boundary approximation, and the hyperparameters used in our experiments, are included in the Appendix.


Figure 4: (a) SM and BSM on all metrics. (b) IG and BIG on all metrics. An arrow pointing up indicates that higher scores are better, and similarly when it points down. Results for Con. are scaled by 1e-5 to fit the plot. (c) Distances (log scale) between SM and BSM, and between IG and BIG, on a standard ResNet50 and a robustly trained ResNet50.
Figure 5: Distances (log scale) between SG and BSG for different standard deviations of the Gaussian noise. Results are computed on ResNet50; compare with the first column in Fig. 4(c).

Results. The results for BSM and BIG are shown in Fig. 4(a) and 4(b); visualizations of these methods can be found in Fig. 6 and are discussed in detail in Sec. 4.3. We make the following observations about the boxplots: 1) SM and BSM do not differ significantly on any metric for a standard model. We believe the reason is that a single normal vector of the closest decision boundary does not provide enough information to describe how the model places the target instance into the region, as there are numerous piecewise-linear boundaries in the vicinity that are relevant; we examine these differences further in Sec. 4.2. 2) BIG is significantly better than IG on all metrics. This result shows that an adversarial example on the closest decision boundary serves as a better baseline than the popular "black" baseline. Our results also show that BIG outperforms all methods in Fig. 4(a), which validates that, when more boundaries exist in a local neighborhood, concentrating on the closest one (as BSM does) is not the most effective way to capture the aggregate shape of complex local boundaries. By incorporating more boundaries into the attribution, BIG provides effective, localized explanations for the predictions of piecewise-linear classifiers.

Figure 6: Visualizations of Saliency Map (SM), Boundary-based Saliency Map (BSM), Integrated Gradient (IG), and Boundary-based Integrated Gradient (BIG) for examples classified by a standard ResNet50. A check denotes that the prediction is Top-1 correct. Red points are positive attributions. Bounding boxes are drawn in red. All images are sampled from ImageNette.
Figure 7: Visualizations of the same examples from Fig. 6, with all attributions instead computed on a robust ResNet50. A check superscripted with 5 denotes that the prediction is not Top-1 but still Top-5 correct. A cross denotes that the prediction is not Top-5 correct. Red points are positive attributions.

4.2 Robust Classifiers

This section measures the difference between boundary attributions and their non-boundary versions on robust classifiers, as described in Sec. 3.3. Visualizations of the proposed methods on the robust ResNet50 can be found in Fig. 7. We first validate Theorem 1, which suggests that SM and BSM are more similar on models with randomized smoothing, which are known to be robust [9]. To obtain meaningful explanations on smoothed models, which are implemented by evaluating the model on a large set of noised inputs, we assume that the random seed is known to the adversarial-example generator, and search for perturbations that point towards a boundary on as many of the noised inputs simultaneously as possible. We perform the boundary search on a subset of 500 images, as this computation is significantly more expensive than the previous experiments. Instead of directly computing SM and BSM on the smoothed classifier, we utilize the connection between randomized smoothing and SG (see Theorem 4 in Appendix A), and compare the difference between SG on the clean inputs and SG on their adversarial examples (referred to as BSG). Details of the experimental setup are given in Appendix B.3, and the results are shown in Fig. 5. Notably, the trend of the log difference against the standard deviation of the Gaussian noise validates that the qualitative meaning of Theorem 1 holds even for large networks. Next, we validate Theorem 2, which states that models with robust attributions yield boundary attributions that are more similar to non-boundary ones. Methods proposed in the literature [8, 49] for improving attribution robustness usually require even more resources than PGD training [29], sometimes requiring expensive second-order methods [8], and pretrained ImageNet weights produced by these methods are not currently available.
However, even though models with PGD training do not have state-of-the-art robust attributions, they are still significantly more robust than standard models [14], and the corresponding pre-trained weights are publicly available [14]. We measure the differences between SM and BSM, and between IG and BIG, in Fig. 4(c) on 1682 images (we exclude those that are not correctly predicted by the robust ResNet50). The results show that for the robust ResNet50, SM and IG are remarkably close to BSM and BIG, validating the claim that boundary attributions can be approximated in practice with standard attribution methods on robust models.

Summary. Theorems 1 and 2, and the corresponding empirical results given in this section, aim to highlight the following observation. With a “standard” (non-robust) model, training is less costly, but when effective explanations are needed, more resources are required to compute boundary-based attributions. However, if more resources are devoted to training robust models, effectively identical explanations can be obtained using much less costly standard gradient-based methods.

4.3 Visualization

Visualizations of the proposed methods and comparisons with SM and IG are shown in Fig. 6 and 7. We make the following observations. First, BSM is more similar to SM on a robust model (Fig. 7) as discussed in Sec. 4.2. Second, BIG provides high-quality visualizations even on “standard” non-robust models (Fig. 6). The visualizations have significantly less noise, and focus on the relevant features more sharply than attributions given by all other compared methods. Notably, BIG successfully localized importance on a very small region in Fig. 6, apparently containing a parachute, missed by all other methods. Finally, BIG on robust models (Fig. 7) provides some insights about why certain instances are top-5, but not top-1, correct. For example, BIG in the 7th “dog” instance shows that the model focuses primarily on the subject’s legs. In this case, it may be that the visual features on this region of the image are sufficient to distinguish it as containing some breed of dog, but are insufficient to distinguish between several related breeds.

5 Discussion: Baseline Sensitivity

Figure 8: Comparisons of IG with black and white baselines with BIG. Predictions are shown in the first column.

BIG is motivated by incorporating the local decision boundaries that BSM alone cannot capture, while IG is motivated by solving the gradient-vanishing problem of SM in an axiomatic way [44]. A natural consequence, however, is that BIG frees users from baseline selection when explaining non-linear classifiers. Empirical evidence has shown that IG is sensitive to the baseline input [42]; we therefore compare BIG with IG under different baselines, white or black images, with an example in Fig. 8. For the first two images, when the baseline has the opposite color of the dog, more pixels on the dog receive non-zero attribution scores, whereas the background receives more attribution when the baseline has the same color as the dog. This is because (see Def. 2) a greater difference between an input feature and the baseline feature can lead to a higher attribution score. The third example raises a question for readers comparing different baselines in IG: is the network using the white dog or the black dog to predict Labrador retriever? We demonstrate that such conflicts, caused by IG's sensitivity to the baseline selection, can be resolved by BIG: BIG shows that the black dog in the last row is more important for predicting Labrador retriever, a conclusion further validated by our counterfactual experiment in Appendix C. Overall, this discussion highlights that BIG is significantly better than IG at avoiding unnecessary sensitivity to baseline selection.
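The mechanism behind IG's baseline sensitivity can be made concrete: any feature equal to its baseline value receives exactly zero attribution, because IG scales the path-averaged gradient by $(\mathbf{x} - \mathbf{x}_b)$. A toy linear score over a two-pixel "image" (illustrative names, not the paper's code) shows black and white baselines zeroing out opposite pixels:

```python
import numpy as np

# Toy linear score with both pixels equally important.
w = np.array([1.0, 1.0])

def grad(x):
    return w                          # constant gradient of the linear score

def ig(x, x_base, steps=50):
    ts = (np.arange(steps) + 0.5) / steps
    g = np.mean([grad(x_base + t * (x - x_base)) for t in ts], axis=0)
    return (x - x_base) * g

x = np.array([0.0, 1.0])             # one black pixel, one white pixel
ig_black = ig(x, np.zeros(2))        # black baseline: black pixel gets 0
ig_white = ig(x, np.ones(2))         # white baseline: white pixel gets 0
print(ig_black, ig_white)
```

Both pixels matter equally to the score, yet each baseline silences the pixel matching it; BIG sidesteps this by letting the model's own boundary determine the baseline point.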

6 Related Work

In this paper we propose a new set of explanation methods, boundary attributions, as a step beyond the score attributions in the literature [4, 11, 16, 26, 30, 36, 37, 38, 40, 41, 44]. Among these methods, SM, IG, and SG have been proved to satisfy many axioms [43, 52, 49], to be invariant to network architecture, and to be sensitive to the network's trainable weights [1]. We argue that a score attribution method is a good fit when the actual output score is of interest, e.g., in regression, while a boundary attribution is instead built to understand how an instance is placed on one side of the decision boundaries, and is therefore a better fit for classification tasks. In evaluating the proposed methods, we choose metrics related to bounding boxes over other metrics because, for classification, we are interested in whether the network associates relevant features with the label, whereas other metrics [1, 2, 35, 48, 52], e.g., infidelity [52], mainly evaluate whether output scores are faithfully attributed to each feature. Our idea of incorporating boundaries into explanations may generalize to other score attribution methods, e.g., Distributional Influence [26] and DeepLIFT [37]. The idea of using boundaries in explanations has also been explored by T-CAV [22], where a linear decision boundary is learned for the internal activations and associated with their proposed notion of a concept. In this work, we consider decision boundaries and adversarial examples in the entire input space, whereas some other work focuses on counterfactual examples on the data manifold [6, 12, 18].

7 Conclusion

In summary, we rethink the question an explanation should answer for a classification task: what are the important features the classifier uses to place the input on a specific side of the decision boundary? We find that the answer relates to the normal vectors of decision boundaries in the input's neighborhood, and propose BSM and BIG as boundary attribution approaches. Empirical evaluations on state-of-the-art classifiers validate that our approaches provide more concentrated, sharper, and more accurate explanations than existing approaches. Our idea of leveraging boundaries to explain classifiers connects explanations with adversarial robustness, and we hope it encourages the community to improve model robustness in the pursuit of better explanations.


This work was developed with the support of NSF grant CNS-1704845 as well as by DARPA and the Air Force Research Laboratory under agreement number FA8750-15-2-0277. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views, opinions, and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of DARPA, the Air Force Research Laboratory, the National Science Foundation, or the U.S. Government.


  • [1] J. Adebayo, J. Gilmer, M. Muelly, I. Goodfellow, M. Hardt, and B. Kim (2018) Sanity checks for saliency maps. In Advances in Neural Information Processing Systems, Cited by: §B.1, §6.
  • [2] M. Ancona, E. Ceolini, C. Öztireli, and M. Gross (2017) Towards better understanding of gradient-based attribution methods for deep neural networks. External Links: 1711.06104 Cited by: §B.1, §6.
  • [3] M. Ancona, E. Ceolini, C. Öztireli, and M. Gross (2018) Towards better understanding of gradient-based attribution methods for deep neural networks. In International Conference on Learning Representations, Cited by: §3.1.
  • [4] A. Binder, G. Montavon, S. Lapuschkin, K. Müller, and W. Samek (2016) Layer-wise relevance propagation for neural networks with local renormalization layers. In International Conference on Artificial Neural Networks, pp. 63–71. Cited by: §1, §6.
  • [5] N. Carlini and D. Wagner (2017) Towards evaluating the robustness of neural networks. 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. Cited by: §3.2.
  • [6] C. Chang, E. Creager, A. Goldenberg, and D. Duvenaud (2019) Explaining image classifiers by counterfactual generation. In ICLR, Cited by: §6.
  • [7] A. Chattopadhyay, A. Sarkar, P. Howlader, and V. N. Balasubramanian (2017) Grad-cam++: generalized gradient-based visual explanations for deep convolutional networks. arXiv preprint arXiv:1710.11063. Cited by: §B.1, §4.1.
  • [8] J. Chen, X. Wu, V. Rastogi, Y. Liang, and S. Jha (2019) Robust attribution regularization. In Advances in Neural Information Processing Systems, Cited by: §3.3, §4.2.
  • [9] J. M. Cohen, E. Rosenfeld, and J. Z. Kolter (2019) Certified adversarial robustness via randomized smoothing. In ICML, Cited by: §2.2, §3.3, §4.2, Definition 8, Theorem 3.
  • [10] F. Croce, M. Andriushchenko, and M. Hein (2019) Provable robustness of relu networks via maximization of linear regions. AISTATS 2019. Cited by: §1.
  • [11] K. Dhamdhere, M. Sundararajan, and Q. Yan (2019) How important is a neuron. In International Conference on Learning Representations, External Links: Link Cited by: §1, §6.
  • [12] A. Dhurandhar, P. Chen, R. Luss, C. Tu, P. Ting, K. Shanmugam, and P. Das (2018) Explanations based on the missing: towards contrastive explanations with pertinent negatives. In NeurIPS, Cited by: §6.
  • [13] A. Dombrowski, M. Alber, C. J. Anders, M. Ackermann, K. Müller, and P. Kessel (2019) Explanations can be manipulated and geometry is to blame. In NeurIPS, Cited by: §A.1, §3.3, Theorem 5, Theorem 6.
  • [14] L. Engstrom, A. Ilyas, H. Salman, S. Santurkar, and D. Tsipras (2019) Robustness (python library). External Links: Link Cited by: §4.2.
  • [15] C. Etmann, S. Lunz, P. Maass, and C. Schoenlieb (2019) On the connection between adversarial robustness and saliency map interpretability. In Proceedings of the 36th International Conference on Machine Learning. Cited by: §1.
  • [16] R. C. Fong and A. Vedaldi (2017) Interpretable explanations of black boxes by meaningful perturbation. In 2017 IEEE International Conference on Computer Vision (ICCV), pp. 3449–3457. External Links: Document Cited by: §1, §6.
  • [17] A. Fromherz, K. Leino, M. Fredrikson, B. Parno, and C. Păsăreanu (2021) Fast geometric projections for local robustness certification. In International Conference on Learning Representations (ICLR), Cited by: §3.2, §3.2, §3.2.
  • [18] Y. Goyal, Z. Wu, J. Ernst, D. Batra, D. Parikh, and S. Lee (2019) Counterfactual visual explanations. In Proceedings of the 36th International Conference on Machine Learning, pp. 2376–2384. Cited by: §6.
  • [19] J. Howard Imagenette. External Links: Link Cited by: §4.1.
  • [20] A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry (2019) Adversarial examples are not bugs, they are features. In Advances in Neural Information Processing Systems, Cited by: §1.
  • [21] M. Jordan, J. Lewis, and A. Dimakis (2019) Provable certificates for adversarial examples: fitting a ball in the union of polytopes. In NeurIPS, Cited by: §3.2, §3.2.
  • [22] B. Kim, M. Wattenberg, J. Gilmer, C. J. Cai, J. Wexler, F. Viégas, and R. Sayres (2018) Interpretability beyond feature attribution: quantitative testing with concept activation vectors (tcav). In ICML, Cited by: §6.
  • [23] N. Kokhlikyan, V. Miglani, M. Martin, E. Wang, B. Alsallakh, J. Reynolds, A. Melnikov, N. Kliushkina, C. Araya, S. Yan, and O. Reblitz-Richardson (2020) Captum: a unified and generic model interpretability library for pytorch. External Links: 2009.07896 Cited by: §B.2.
  • [24] J. Z. Kolter and E. Wong (2018) Provable defenses against adversarial examples via the convex outer adversarial polytope. In ICML, Cited by: §3.2.
  • [25] S. Lee, J. Lee, and S. Park (2020) Lipschitz-certifiable training with a tight outer bound. Advances in Neural Information Processing Systems 33. Cited by: §3.2.
  • [26] K. Leino, S. Sen, A. Datta, M. Fredrikson, and L. Li (2018) Influence-directed explanations for deep convolutional networks. In 2018 IEEE International Test Conference (ITC), pp. 1–8. Cited by: §1, §6.
  • [27] K. Leino, R. Shih, M. Fredrikson, J. She, Z. Wang, C. Lu, S. Sen, D. Gopinath, and , Anupam (2021) Truera/trulens: trulens. Zenodo. External Links: Document, Link Cited by: §B.2.
  • [28] K. Leino, Z. Wang, and M. Fredrikson (2021) Globally-robust neural networks. External Links: 2102.08452 Cited by: §3.2.
  • [29] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2018) Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, Cited by: §3.2, §4.2.
  • [30] G. Montavon, S. Bach, A. Binder, W. Samek, and K. Müller (2015) Explaining nonlinear classification decisions with deep taylor decomposition. External Links: 1512.02479 Cited by: §1, §6.
  • [31] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami (2016) The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387. Cited by: §3.2.
  • [32] J. Rauber, W. Brendel, and M. Bethge (2017) Foolbox: a python toolbox to benchmark the robustness of machine learning models. In Reliable Machine Learning in the Wild Workshop, 34th International Conference on Machine Learning, External Links: Link Cited by: §B.2.
  • [33] J. Rauber, R. Zimmermann, M. Bethge, and W. Brendel (2020) Foolbox native: fast adversarial attacks to benchmark the robustness of machine learning models in pytorch, tensorflow, and jax. Journal of Open Source Software 5 (53), pp. 2607. External Links: Document, Link Cited by: §B.2.
  • [34] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei (2015) ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115 (3), pp. 211–252. External Links: Document Cited by: §4.1.
  • [35] W. Samek, A. Binder, G. Montavon, S. Lapuschkin, and K. Müller (2016) Evaluating the visualization of what a deep neural network has learned. IEEE transactions on neural networks and learning systems 28 (11), pp. 2660–2673. Cited by: §B.1, §6.
  • [36] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra (2017) Grad-cam: visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pp. 618–626. Cited by: §1, §6.
  • [37] A. Shrikumar, P. Greenside, and A. Kundaje (2017) Learning important features through propagating activation differences. In International Conference on Machine Learning, pp. 3145–3153. Cited by: §B.4, §1, §6.
  • [38] K. Simonyan, A. Vedaldi, and A. Zisserman (2013) Deep inside convolutional networks: visualising image classification models and saliency maps. External Links: 1312.6034 Cited by: §1, §6, Definition 1.
  • [39] A. Sinha, H. Namkoong, R. Volpi, and J. Duchi (2020) Certifying some distributional robustness with principled adversarial training. External Links: 1710.10571 Cited by: §3.2.
  • [40] D. Smilkov, N. Thorat, B. Kim, F. Viégas, and M. Wattenberg (2017) SmoothGrad: removing noise by adding noise. External Links: 1706.03825 Cited by: §1, §6, Definition 3.
  • [41] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller (2014) Striving for simplicity: the all convolutional net. External Links: 1412.6806 Cited by: §1, §6.
  • [42] P. Sturmfels, S. Lundberg, and S. Lee (2020) Visualizing the impact of feature attribution baselines. Distill. External Links: Document Cited by: §5.
  • [43] M. Sundararajan and A. Najmi (2020) The many shapley values for model explanation. External Links: 1908.08474 Cited by: §6.
  • [44] M. Sundararajan, A. Taly, and Q. Yan (2017) Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 3319–3328. Cited by: §1, §4.1, §5, §6, Definition 2.
  • [45] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2014) Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), External Links: Link Cited by: §3.2.
  • [46] V. Tjeng, K. Y. Xiao, and R. Tedrake (2019) Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations, Cited by: §3.2.
  • [47] H. Wang, Z. Wang, M. Du, F. Yang, Z. Zhang, S. Ding, P. Mardziel, and X. Hu (2020) Score-cam: score-weighted visual explanations for convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 24–25. Cited by: §B.1, §4.1.
  • [48] Z. Wang, P. Mardziel, A. Datta, and M. Fredrikson (2020) Interpreting interpretations: organizing attribution methods by criteria. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 10–11. Cited by: §B.1, §6.
  • [49] Z. Wang, H. Wang, S. Ramkumar, P. Mardziel, M. Fredrikson, and A. Datta (2020) Smoothed geometry for robust attribution. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33, pp. 13623–13634. Cited by: §3.3, §3.3, §3.3, §4.2, §6, Definition 9, Theorem 4.
  • [50] T. Weng, H. Zhang, H. Chen, Z. Song, C. Hsieh, D. Boning, I. Dhillon, and L. Daniel (2018) Towards fast computation of certified robustness for relu networks. In ICML, Cited by: §3.2.
  • [51] G. Yang, T. Duan, E. J. Hu, H. Salman, I. P. Razenshteyn, and J. Li (2020) Randomized smoothing of all shapes and sizes. ArXiv abs/2002.08118. Cited by: §A.1.
  • [52] C. Yeh, C. Hsieh, A. S. Suggala, D. I. Inouye, and P. Ravikumar (2019) On the (in) fidelity and sensitivity for explanations. In Advances in Neural Information Processing Systems, Cited by: §B.1, §1, §6.

Appendix A Theorems and Proofs

a.1 Proof of Theorem 1

Theorem 1 Let be a one-layer network and, when using randomized smoothing, we write . Let be the SM for and suppose , where is the closest adversarial example; then the following statement holds: where .

Before we start the proof, we first introduce Randomized Smoothing and the theorem that certifies its robustness.

Definition 8 (Randomized Smoothing [9])

Suppose $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$; the smoothed classifier $g$ is defined as

$$g(x) = \arg\max_{c} \; \Pr_{\epsilon}\left[\, f(x+\epsilon) = c \,\right].$$

Theorem 3 (Theorem 1 from Cohen et al. [9])

Suppose $f$ and $g$ are defined as in Def. 8. For a target instance $x$, suppose the probability of the top class $c_A$ is lower-bounded by $\underline{p_A}$ and the probability of the runner-up class is upper-bounded by $\overline{p_B}$; then $g(x+\delta) = c_A$ whenever $\|\delta\|_2 < R$, where

$$R = \frac{\sigma}{2}\left( \Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B}) \right)$$

and $\Phi$ is the c.d.f. of the standard Gaussian.
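As a concrete illustration of this certificate, the radius above is easy to compute with scipy (a minimal sketch; the function name is ours, not from the paper's code):

```python
from scipy.stats import norm

def certified_radius(p_a_lower, p_b_upper, sigma):
    """Certified L2 radius from Cohen et al. [9]:
    R = sigma/2 * (Phi^{-1}(p_A lower bound) - Phi^{-1}(p_B upper bound))."""
    return 0.5 * sigma * (norm.ppf(p_a_lower) - norm.ppf(p_b_upper))
```

A larger noise level sigma or a larger margin between the class probabilities yields a larger certified radius, which is what lets the proof below lower-bound the distance to the closest decision boundary.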

Next, we introduce a theorem that connects Randomized Smoothing with Smoothed Gradient.

Theorem 4 (Proposition 1 from Wang et al. [49])

Suppose a model $g$ satisfies $g(x) = \mathbb{E}_{\epsilon \sim \mathcal{N}(0,\sigma^2 I)}\left[ f(x+\epsilon) \right]$. For a Smoothed Gradient of $f$, we have

$$\nabla g(x) = \left( \nabla f * \mathcal{N}(0, \sigma^2 I) \right)(x),$$

where $*$ denotes the convolution operation.

Finally, we introduce two theorems that connect Smoothed Gradient with softplus networks.

Theorem 5 (Theorem 1 from Dombrowski et al. [13])

Suppose is a feed-forward network with softplus- activation and if , then



is the weight for layer and is the geodesic distance (which in our case is just the distance).

Theorem 6 (Theorem 2 from Dombrowski et al. [13])

Denote a one-layer ReLU network as $f$ and a one-layer softplus network as $h_\beta$; then the following statement holds:

$$\nabla_x h_\beta(x) = \mathbb{E}_{\epsilon \sim p_\beta}\left[ \nabla_x f(x+\epsilon) \right], \qquad p_\beta(\epsilon) = \frac{\beta}{2 + 2\cosh(\beta\epsilon)}.$$

We now begin our proof for Theorem 1.


Given a one-layer ReLU network that takes an input and outputs the logit score for the class of interest, WLOG we assume is the -th column of the complete weight matrix , where is the class of interest. With Theorem 6 we know that


Dombrowski et al. [13] point out that the random distribution closely resembles a normal distribution with a standard deviation


Therefore, we have the following relation


The LHS of the above equation is Smoothed Gradient, or equivalently, it is the Saliency Map of a smoothed classifier due to Theorem 4. Eqs. 7 and 9 show that we can analyze randomized smoothing with the tool of an intermediate softplus network .


and we denote . Now, applying Theorem 5, we have


where is the closest adversarial example. Because is certified to be robust within the neighborhood , the closest decision boundary is at least distance away from the evaluated point . We then have


Now let us substitute and with using Eqs. 8 and 3; we arrive at




We use instead of because we approximate randomized smoothing with the similar distribution . In fact, a more rigorous proof exists when considering noise distributions other than the Gaussian; we refer interested readers to Yang et al. [51].

a.2 Proof of Theorem 2

Theorem 2 Suppose has a -robust Saliency Map; then where if and .

We first introduce the definition of attribution robustness as follows:

Definition 9 (Attribution Robustness [49])

An attribution method is -locally robust at the evaluated point if .

The proof of Theorem 2 is then a direct application of Def. 9. If the closest boundary point is within the neighborhood where has a -locally robust Saliency Map , then by definition .

Appendix B Experiments

b.1 Metrics with Bounding Boxes

We use the following additional notation in this section. Let , and be the set of indices of all pixels, the set of indices of pixels with positive attributions, and the set of indices of pixels inside the bounding box, respectively, for a target attribution map . We denote the cardinality of a set as .

Localization (Loc.)

[7] evaluates the intersection between the bounding box area and the pixels with positive attributions.

Definition 10 (Localization)

For a given attribution map , the localization score (Loc.) is defined as


Energy Game (EG)

[47] instead computes the portion of attribution scores falling within the bounding box.

Definition 11 (Energy Game)

For a given attribution map , the energy game EG is defined as


Positive Percentage (PP)

evaluates the sum of positive attribution scores over the total (absolute values of) attribution scores within the bounding box.

Definition 12 (Positive Percentage)

Let be the set of indices of all pixels with negative attribution scores; for a given attribution map , the positive percentage PP is defined as


Concentration (Con.)

evaluates the sum of the distances, weighted by the attribution “mass”, between the “mass” center of the attributions and each pixel with a positive attribution score within the bounding box. Notice that the computation of and can be done with scipy.ndimage.center_of_mass.

Definition 13 (Concentration)

For a given attribution map , the concentration Con. is defined as follows


where are the coordinates of the pixel and


Besides metrics related to bounding boxes, there are other metrics in the literature used to evaluate attribution methods [1, 2, 35, 48, 52]. We focus on metrics that use the provided bounding boxes because we believe they offer a clear distinction between likely relevant features and irrelevant ones, and therefore evaluate how well we answer the question defined in Sec. 1.
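To make the metrics above concrete, the Energy Game and the “mass” center used by Con. might be computed as follows (our own sketch with hypothetical function names; `attr` is a 2-D attribution map and `box_mask` a boolean mask of the bounding box):

```python
import numpy as np
from scipy import ndimage

def energy_game(attr, box_mask):
    """EG: portion of positive attribution mass that falls inside the box."""
    pos = np.clip(attr, 0.0, None)
    return pos[box_mask].sum() / (pos.sum() + 1e-12)

def mass_center(attr):
    """Center of 'mass' of the positive attributions (used by Con.)."""
    return ndimage.center_of_mass(np.clip(attr, 0.0, None))
```

Loc. and PP follow the same pattern: build the relevant index sets as boolean masks and take ratios of masked sums.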

Figure 9: Comparisons on more attribution methods. BIG: Boundary-based Integrated Gradient. BSM: Boundary-based Saliency Map. SM: Saliency Map. GI: Saliency Map × Input. SG: Smooth Gradient. IG: Integrated Gradient. The results are computed from ResNet50 with standard training.

b.2 Setup Detail for Sec. 4.1

Pipeline Avg Distance Success Rate Time
Standard ResNet50
PGDs 0.55 71.2% 1.64s
    + CW 0.42 71.2% 2.29s
Robust ResNet50 ()
PGDs 2.19 50.0% 1.64s
    + CW 1.88 50.0% 2.29s
Table 1: Pipeline: the methods used for boundary search. Avg Distance: the average distance between the input and the boundary. Success Rate: the percentage of instances for which the pipeline returns an adversarial example. Time: per-instance time with a batch size of 64.
Figure 10: Full results of Fig. 8 in Sec. 5. For the third, fourth and fifth examples, we compute the attribution scores towards the prediction of the third example, Labrador retriever. IG with black or white baselines shows that the masked areas contribute substantially to the prediction, while BIG “accurately” locates the features relevant to the network’s prediction.

Boundary Search

Our boundary search uses a pipeline of PGDs and CW, where PGDs denotes repeating the PGD attack with a series of epsilons until a closest adversarial example is found. Adversarial examples returned by PGDs are further compared with those from CW, and the closer ones are returned. The average distances of the found adversarial examples, success rates of the attacks, and computation times are included in the first two rows of Table 1. If an adversarial example is not found, the pipeline returns the point from the last iteration of the first method (PGDs in our case). For PGDs, we use the following list of : . For each in the list, we run at most 40 iterations with a step size equal to . For CW, we set and run 100 iterations with a step size of 1e-3. All attacks are conducted with FoolBox [33, 32]. The result of the boundary search is shown in Table 1. All computations are done using a Titan RTX GPU accelerator with 24 GB of memory.
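The epsilon-schedule logic of PGDs, and the rule of keeping the closer of the PGDs and CW candidates, can be sketched on a toy linear model as follows (our own illustration, not the actual FoolBox-based implementation):

```python
import numpy as np

def pgds_linear(x, w, b, eps_list, steps=40):
    """PGD repeated over an increasing epsilon schedule against a toy linear
    'network' f(x) = w.x + b; the prediction flips when sign(f) flips.
    Returns the first adversarial example found, or None."""
    orig_sign = np.sign(w @ x + b)
    for eps in eps_list:
        step = 2.5 * eps / steps
        x_adv = x.astype(float).copy()
        for _ in range(steps):
            # for a linear model the input gradient is simply w
            x_adv = x_adv - orig_sign * step * w / np.linalg.norm(w)
            delta = x_adv - x
            norm = np.linalg.norm(delta)
            if norm > eps:  # project back onto the L2 ball of radius eps
                x_adv = x + delta * eps / norm
            if np.sign(w @ x_adv + b) != orig_sign:
                return x_adv
    return None

def closer_candidate(x, *candidates):
    """Keep the successful candidate closest to x (the rule used to compare
    PGDs' and CW's outputs); None if no attack succeeded."""
    valid = [c for c in candidates if c is not None]
    return min(valid, key=lambda c: np.linalg.norm(c - x)) if valid else None
```

On a real network, the gradient of the logit margin replaces `w`, and the candidates come from FoolBox's L2 PGD and CW attacks.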


All attributions are implemented with Captum [23] and visualized with Trulens [27]. For BIG and IG, we use 10 intermediate points between the baseline and the input, and the interpolation method is set to riemann_trapezoid. To visualize the attribution map, we use the HeatmapVisualizer with blur=10, normalization_type="signed_max" and default values for other keyword arguments from Trulens.
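The path integration underlying IG and BIG can be sketched in plain numpy (our own illustration; `grad_fn` stands in for the model's input gradient, and passing the closest boundary point as the baseline yields BIG):

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=10):
    """IG with a trapezoid rule over the straight path from baseline to x.
    Passing the closest boundary point as `baseline` yields BIG."""
    alphas = np.linspace(0.0, 1.0, steps + 1)
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    # trapezoid rule: the two endpoints get half weight
    avg_grad = (grads[0] / 2 + grads[1:-1].sum(axis=0) + grads[-1] / 2) / steps
    return (x - baseline) * avg_grad
```

For a linear model the result reduces to (x − baseline) elementwise-multiplied with the weight vector, so the attributions sum exactly to the score difference between the input and the baseline (completeness).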

b.3 Setup Detail for Sec. 4.2

Setup for Fig. 5

To generate adversarial examples for the smoothed classifier of ResNet50 with randomized smoothing, we need to back-propagate through the noise. The noise sampler is usually not accessible to an attacker who wants to fool a model protected by randomized smoothing; however, our goal in this section is not to reproduce a practical attack, but to find a point on the boundary. We therefore do the noise sampling prior to running the PGD attack, and we use the same noise across all instances. The steps are listed as follows:

  1. We use numpy.random.randn as the sampler for Gaussian noise with its random seed set to 2020. We use 50 random noises per instance.

  2. In PGD attack, we aggregate the gradients of all 50 random inputs before we take a regular step to update the input.

  3. We set and we run at most 40 iterations with a step size of .

  4. The early-stopping criterion for the PGD loop is that fewer than 10% of all randomized points retain the original prediction.

  5. When computing Smooth Gradient for the original points or for the adversarial points, we use the same random noise that we generated to approximate the smoothed classifier.
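Step 2 above, aggregating gradients over the shared noise before each PGD update, might look like the following sketch (the names and the noise scale are our own placeholders):

```python
import numpy as np

rng = np.random.RandomState(2020)   # fixed seed, as in step 1
noises = rng.randn(50, 4) * 0.25    # 50 shared Gaussian samples; sigma = 0.25 is a placeholder

def smoothed_gradient(grad_fn, x):
    """Average the input gradient over all randomized copies of x,
    approximating the gradient of the smoothed classifier (step 2)."""
    return np.mean([grad_fn(x + n) for n in noises], axis=0)
```

Because the same `noises` array is reused for the original and the adversarial points (step 5), the attack and the Smooth Gradient computation see the same approximation of the smoothed classifier.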

Setup for Fig. 3(c)

We use the pipeline of PGDs+CW to find adversarial examples for a robustly-trained ResNet50. Given that it is robust, we anticipate that the boundary is farther away than in a standard model; therefore we make the following changes compared to the setup discussed for the standard model. We change the list of for PGDs to and the step size to 1e-1 for each . For CW, we change to , the step size to 5e-3, and we run at most 100 iterations. The result of the boundary search is shown in Table 1. All computations are done using a Titan RTX GPU accelerator with 24 GB of memory.

b.4 Comparison with Other Attributions

Besides Integrated Gradients (IG) and Saliency Map, we compare BIG and BSM with other attribution methods. We restrict our comparisons to methods that use gradients or an approximation of gradients. We therefore include GI (Saliency Map multiplied element-wise with the input), SG (Smoothed Gradient), and DeepLIFT [37]. We use ResNet50 and 2000 images from ImageNette with bounding boxes; the results are shown in Fig. 9. BIG remains the better method compared to these additional baseline attributions (GI, SG and DeepLIFT).

Appendix C Counterfactual Analysis in the Baseline Selection

The discussion in Sec. 5 shows an example where there are two dogs in the image. IG with a black baseline indicates that the body of the white dog is also useful to the model for predicting its label, and that the black dog is a mix: part of the black dog has positive attributions while the rest contributes negatively to the prediction. However, our proposed method BIG clearly shows that the most important part is the black dog, followed by the white dog. To validate whether the model is actually using the white dog, we manually remove either the black dog or the white dog from the image and check whether the model retains its prediction. The result is shown in Fig. 10. Clearly, when the black dog is removed, the model changes its prediction from Labrador retriever to English foxhound, whereas removing the white dog does not change the prediction. This result helps convince the reader that, in this case, BIG is more reliable than IG with a black baseline as a faithful explanation of the classification result for this instance.