
Influence-Directed Explanations for Deep Convolutional Networks

by Klas Leino, et al.

We study the problem of explaining a rich class of behavioral properties of deep neural networks. Distinctively, our influence-directed explanations approach this problem by peering inside the network to identify neurons with high influence on the property and distribution of interest using an axiomatically justified influence measure, and then providing an interpretation for the concepts these neurons represent. We evaluate our approach by training convolutional neural networks on MNIST, ImageNet, Pubfig, and Diabetic Retinopathy datasets. Our evaluation demonstrates that influence-directed explanations (1) identify influential concepts that generalize across instances, (2) help extract the essence of what the network learned about a class, (3) isolate individual features the network uses to make decisions and distinguish related instances, and (4) assist in understanding misclassifications.
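The abstract does not spell out the influence measure, so the following is only a rough sketch under stated assumptions: it treats a neuron's influence as the gradient of a scalar quantity of interest with respect to that neuron's activation, averaged over a set of inputs standing in for the distribution of interest. The toy weights `W1` and `W2`, the `tanh` quantity, and all function names are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical toy model: input x -> hidden activations z -> scalar quantity q.
W1 = [[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]]  # 3 hidden neurons, 2 inputs
W2 = [2.0, -1.0, 0.1]                        # weights from hidden neurons to q


def hidden(x):
    """ReLU activations of the internal layer."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]


def quantity_grad(z):
    """Gradient of q(z) = tanh(W2 . z) with respect to each hidden activation."""
    s = sum(w * zi for w, zi in zip(W2, z))
    slope = 1.0 - math.tanh(s) ** 2
    return [slope * w for w in W2]


def neuron_influence(inputs):
    """Average the gradient over the inputs (assumed stand-in for the distribution)."""
    total = [0.0] * len(W2)
    for x in inputs:
        g = quantity_grad(hidden(x))
        total = [t + gi for t, gi in zip(total, g)]
    return [t / len(inputs) for t in total]


def rank_neurons(influence):
    """Indices of hidden neurons sorted by absolute influence, largest first."""
    return sorted(range(len(influence)), key=lambda j: abs(influence[j]), reverse=True)


samples = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.4]]
influence = neuron_influence(samples)
ranking = rank_neurons(influence)
print(ranking)
```

Changing `samples` to instances of a single class, or changing the quantity to a difference between two class scores, would be one way to vary the distribution and property of interest that the abstract mentions.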




Understanding Individual Decisions of CNNs via Contrastive Backpropagation

A number of backpropagation-based approaches such as DeConvNets, vanilla...

Concept-based Explanations for Out-Of-Distribution Detectors

Out-of-distribution (OOD) detection plays a crucial role in ensuring the...

Causal Learning and Explanation of Deep Neural Networks via Autoencoded Activations

Deep neural networks are complex and opaque. As they enter application i...

Compositional Explanations of Neurons

We describe a procedure for explaining neurons in deep representations b...

Deephys: Deep Electrophysiology, Debugging Neural Networks under Distribution Shifts

Deep Neural Networks (DNNs) often fail in out-of-distribution scenarios....

GAN-based Generation and Automatic Selection of Explanations for Neural Networks

One way to interpret trained deep neural networks (DNNs) is by inspectin...