Explaining Neural Networks via Perturbing Important Learned Features

11/25/2019
by   Ashkan Khakzar, et al.
54

Attributing the output of a neural network to the contribution of given input elements is one way of shedding light on the black box nature of neural networks. We propose a novel input feature attribution method that finds an input perturbation that maximally changes the output neuron by exclusively perturbing important hidden neurons (i.e. learned features) on the path to output neuron. Given an input, this is achieved by 1) pruning unimportant neurons, and subsequently 2) finding a local input perturbation that maximizes the output in the pruned network. Since our method considers the importance of hidden neurons (high-level features), it inherently considers interdependencies between multiple input elements, which is vital for input feature attribution. We propose PruneGrad, an efficient gradient-based solution for the pruning and perturbation steps of our method. The efficacy of our method is evaluated by quantitatively benchmarking against other attribution methods using 1) sanity checks, 2) pixel perturbation, and 3) Remove and Retrain (ROAR). Our results show that while most of the existing attribution methods are prone to fail or get mediocre results in at least one benchmark, our proposed method achieves state of the art results in all three benchmarks. The results are further supported by comparative visual evaluation.

READ FULL TEXT

page 5

page 6

page 8

page 11

page 12

page 13

page 14

page 17

research
03/31/2022

Improving Adversarial Transferability via Neuron Attribution-Based Attacks

Deep neural networks (DNNs) are known to be vulnerable to adversarial ex...
research
03/31/2021

Neural Response Interpretation through the Lens of Critical Pathways

Is critical input information encoded in specific sparse pathways within...
research
04/10/2017

Learning Important Features Through Propagating Activation Differences

The purported "black box"' nature of neural networks is a barrier to ado...
research
07/08/2022

SInGE: Sparsity via Integrated Gradients Estimation of Neuron Relevance

The leap in performance in state-of-the-art computer vision methods is a...
research
05/30/2018

How Important Is a Neuron?

The problem of attributing a deep network's prediction to its input/base...
research
10/04/2021

Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information

One principal approach for illuminating a black-box neural network is fe...
research
05/01/2020

A Comprehensive Study on Visual Explanations for Spatio-temporal Networks

Identifying and visualizing regions that are significant for a given dee...

Please sign up or login with your details

Forgot password? Click here to reset