Towards Explanation of DNN-based Prediction with Guided Feature Inversion

by Mengnan Du, et al.

While deep neural networks (DNNs) have become an effective computational tool, their predictions are often criticized for a lack of interpretability, which is essential in many real-world applications such as health informatics. Existing attempts based on local interpretation aim to identify the features that contribute most to a DNN's prediction by monitoring the neighborhood of a given input. They usually ignore the intermediate layers of the DNN, which might contain rich information for interpretation. To bridge this gap, this paper proposes a guided feature inversion framework that takes advantage of deep architectures for effective interpretation. The proposed framework not only determines the contribution of each input feature but also provides insights into the decision-making process of DNN models. By further interacting with the neuron of the target category at the output layer of the DNN, we enforce the interpretation result to be class-discriminative. We apply the proposed interpretation model to different CNN architectures to explain predictions on image data, and conduct extensive experiments on the ImageNet and PASCAL VOC07 datasets. The results demonstrate the effectiveness of the proposed framework in providing class-discriminative interpretations for DNN-based predictions.
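The abstract describes the method only at a high level. As a loose illustration of the general pattern it names (optimize an input mask so that the masked input reproduces an intermediate-layer representation, while keeping the target-class output high), here is a minimal NumPy sketch on a hypothetical two-layer toy network. It is not the paper's formulation: the toy network, the zero baseline for masked-out features, the three loss terms, the weights `lam` and `beta`, and the names `objective` and `invert` are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def objective(a, x, W1, W2, target, lam=0.1, beta=0.5):
    """Loss and gradient w.r.t. mask logits `a` (simplified illustration, not the authors' method).

    Hypothetical toy network: features h = relu(W1 @ x), logits z = W2 @ h.
    The mask m = sigmoid(a) keeps part of the input (masked-out features become 0):
      ||h(m*x) - h(x)||^2   masked input should reproduce the intermediate features
      + lam * mean(m)       prefer a small mask
      - beta * z[target]    keep the target-class logit high
    """
    m = sigmoid(a)
    h0 = np.maximum(W1 @ x, 0.0)          # intermediate features of the original input
    u = m * x                             # masked input
    pre = W1 @ u
    h = np.maximum(pre, 0.0)              # intermediate features of the masked input
    z = W2 @ h                            # output logits of the masked input
    diff = h - h0
    loss = diff @ diff + lam * m.mean() - beta * z[target]

    # Backpropagate by hand through the toy network.
    dh = 2.0 * diff - beta * W2[target]   # dL/dh
    dpre = dh * (pre > 0)                 # ReLU derivative
    du = W1.T @ dpre                      # dL/du
    dm = du * x + lam / m.size            # dL/dm (feature term + sparsity term)
    da = dm * m * (1.0 - m)               # chain rule through the sigmoid
    return loss, da


def invert(x, W1, W2, target, steps=200, lr=0.01, **kw):
    """Plain gradient descent on the mask logits; returns the mask and the loss history."""
    a = np.zeros_like(x)                  # start from m = 0.5 everywhere
    history = []
    for _ in range(steps):
        loss, grad = objective(a, x, W1, W2, target, **kw)
        history.append(loss)
        a -= lr * grad
    return sigmoid(a), history


# Usage on random toy weights (8 input features, 6 hidden units, 3 classes).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(6, 8))
W2 = rng.normal(size=(3, 6))
x = rng.normal(size=8)
mask, history = invert(x, W1, W2, target=1)
print(mask.round(2), history[0], history[-1])
```

Gradients are written out by hand only to keep the sketch dependency-free; with a real CNN one would rely on an autodiff framework, and the mask would typically be spatial rather than per-feature.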



