Discriminative Attribution from Counterfactuals

09/28/2021
by Nils Eckstein, et al.

We present a method for neural network interpretability that combines feature attribution with counterfactual explanations to generate attribution maps highlighting the most discriminative features between pairs of classes. We show that this method can be used to evaluate feature attribution methods quantitatively and objectively, thus preventing potential observer bias. We evaluate the proposed method on three diverse datasets, including a challenging artificial dataset and real-world biological data. We show quantitatively and qualitatively that the highlighted features are substantially more discriminative than those extracted using conventional attribution methods, and we argue that this type of explanation is better suited to understanding fine-grained class differences as learned by a deep neural network.
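To make the idea of attributing a class difference to input features concrete, here is a minimal sketch on a toy linear classifier. It is an illustration under stated assumptions, not the authors' implementation: the weights, the hand-crafted counterfactual `x_cf`, and the helper names `logits` and `discriminative_attribution` are all hypothetical, and the attribution rule used (gradient of the logit gap times the difference between the real input and its counterfactual) is one simple baseline-style choice.

```python
def logits(weights, x):
    """Class scores of a linear classifier: one dot product per class."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def discriminative_attribution(weights, x_real, x_cf, cls_real, cls_cf):
    """Attribute the change in the (cls_real - cls_cf) logit gap to each feature.

    For a linear model the gradient of the gap is the weight difference,
    so gradient * (x_real - x_cf) sums exactly to the change in the gap.
    """
    w_diff = [a - b for a, b in zip(weights[cls_real], weights[cls_cf])]
    return [w * (r - c) for w, r, c in zip(w_diff, x_real, x_cf)]


# Hypothetical toy setup: 2 classes, 3 features.
WEIGHTS = [[1.0, 0.0, 2.0],
           [0.0, 1.0, -1.0]]
x_real = [1.0, 0.5, 2.0]   # classified as class 0
x_cf = [1.0, 0.5, -1.0]    # assumed counterfactual, classified as class 1

attr = discriminative_attribution(WEIGHTS, x_real, x_cf, cls_real=0, cls_cf=1)

real_scores = logits(WEIGHTS, x_real)
cf_scores = logits(WEIGHTS, x_cf)
gap_change = (real_scores[0] - real_scores[1]) - (cf_scores[0] - cf_scores[1])

print(attr)        # only the feature that differs receives attribution
print(gap_change)  # equals sum(attr) for a linear model
```

Only the third feature differs between the real input and the counterfactual, so it receives all of the attribution. For a deep network the exact sum property no longer holds for plain gradients, which is one reason an objective, quantitative evaluation of attribution maps is useful.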


