
Minimal Explanations for Neural Network Predictions

by Ouns El Harzli, et al.

Explaining neural network predictions is known to be a challenging problem. In this paper, we propose a novel approach which can be effectively exploited, either in isolation or in combination with other methods, to enhance the interpretability of neural model predictions. For a given input to a trained neural model, our aim is to compute a smallest set of input features so that the model prediction changes when these features are disregarded by setting them to an uninformative baseline value. While computing such minimal explanations is computationally intractable in general for fully-connected neural networks, we show that the problem becomes solvable in polynomial time by a greedy algorithm under mild assumptions on the network's activation functions. We then show that our tractability result extends seamlessly to more advanced neural architectures such as convolutional and graph neural networks. We conduct experiments to showcase the capability of our method for identifying the input features that are essential to the model's prediction.
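The greedy idea described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's actual algorithm: it assumes the model exposes a real-valued score with a decision threshold, greedily masks the feature that moves the score most toward the boundary until the prediction flips, then prunes so the returned set is subset-minimal. The paper's polynomial-time guarantee relies on assumptions about the network's activation functions that this sketch does not check.

```python
import numpy as np

def greedy_minimal_explanation(score, x, baseline, threshold=0.0):
    """Greedily find a small set of feature indices whose replacement
    by the baseline flips the prediction sign(score(x) - threshold).

    Hypothetical sketch of a greedy minimal-explanation search;
    `score` maps a 1-D feature vector to a real-valued model output.
    """
    orig_label = score(x) > threshold
    masked = x.astype(float).copy()
    chosen = []
    remaining = list(range(len(x)))

    # Phase 1: mask features one at a time, always picking the one
    # whose removal pushes the score furthest toward the boundary,
    # until the predicted label changes.
    while (score(masked) > threshold) == orig_label and remaining:
        def gain(i):
            trial = masked.copy()
            trial[i] = baseline[i]
            s = score(trial)
            return (s - threshold) if orig_label else (threshold - s)
        best = min(remaining, key=gain)
        masked[best] = baseline[best]
        chosen.append(best)
        remaining.remove(best)

    # Phase 2: pruning pass. Restore any feature that is not needed
    # to keep the prediction flipped, so the result is subset-minimal.
    for i in list(chosen):
        trial = masked.copy()
        trial[i] = x[i]
        if (score(trial) > threshold) != orig_label:
            masked[i] = x[i]
            chosen.remove(i)
    return chosen
```

For a toy linear "network" `score(v) = w · v` with `w = [3, 1, 0.5]`, input `[1, 1, 1]`, zero baseline, and threshold 2, the procedure masks only the first feature, since setting it to 0 already drops the score below the threshold.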
