1 Introduction
Deep neural networks have produced widespread and compelling advances in a variety of machine learning tasks such as object recognition, image segmentation, anomaly detection, machine translation, and synthesis. However, these advances have often been accompanied by a significant reduction in interpretability, or the ability to visualize the flow of information being extracted at various layers of abstraction. In contrast to traditional rule-based learning methods (which search for specific, semantically handcrafted features or patterns), deep networks often produce decisions that are hard to decipher or justify for a given test data sample, even though their aggregate generalizability measured with respect to a holdout test dataset is excellent. Unpacking this “black-box” nature of deep networks has been identified as a key issue by several recent works Springenberg et al. (2014); Selvaraju et al. (2016); Shrikumar et al. (2017); Sundararajan et al. (2016).

In this work, we focus on the task of object detection in images. Broadly, algorithms for interpreting the action of deep networks for this task can be grouped as follows.

Class-discriminative approaches, such as Class Activation Mappings (CAM) Zhou et al. (2016), or its gradient-based variant Selvaraju et al. (2016), produce a support in the original image domain that approximately corresponds to a given object class detected in that image. However, such methods are coarse and only produce low-resolution visualizations, and as such cannot be directly applied to very high resolution images.

On the other hand, pixel-space gradient-based methods such as deconvolution networks Zeiler and Fergus (2014) and guided backpropagation Springenberg et al. (2014) produce fine-grained features in a given image. However, gradient-based methods either raise significant computational efficiency concerns, or are susceptible to saturation phenomena due to vanishing/exploding gradients, or both. In Shrikumar et al. (2017), this issue is alleviated by using a second reference input to stabilize the estimates. However, choosing this reference image is qualitative and can be challenging.

Finally, model-agnostic approaches such as LIME Ribeiro et al. (2016) are theoretically sound and can be applied for interpreting deep convolutional networks, but involve solving challenging optimization problems.

In this short paper, we outline a systematic framework for visualizing information flow in deep convolutional networks that resolves both the computational efficiency and the numerical robustness issues described above. We present several preliminary numerical results that support the benefits of our framework over existing methods.
At a high level, our approach is based on a novel forward-backward scheme which operates as follows. Consider a trained deep convolutional network model and a given test image for which our model is able to identify the existence of a given target class. Then, our algorithm outputs a support (i.e., a subset of pixel locations) corresponding to the class predicted by our model, in a manner similar to pixel-space gradient methods. However, in contrast with gradient-based approaches, our algorithm leverages not only the backward (class) information flow from the output layer(s) to the input, but also the forward (image) information extracted at various layers of abstraction. See Figure 1.
More specifically, our method has the following distinguishing characteristics:

We propose a mathematically principled approach to achieve “backward information flow” within a deep convolutional network, leveraging the ideas proposed in the deconvolutional networks approach of Zeiler and Fergus (2014). However, that approach is computationally very expensive since it requires solving a sparse recovery problem for each layer of the network, which limits the depth of network to which it is applicable. In contrast, our approach only needs simple application of matrix adjoints and (elementwise) nonlinearities for each convolutional layer and is easy to implement on very deep networks.

We propose a systematic way of using the forward information to guide the backward traversal. In particular, we use the forward information to extract a support within a given layer of representation that best corresponds to a specific feature map. We achieve this using a novel masking scheme which transparently combines both forward and backward information flows through the network.

As opposed to gradient-based schemes (such as Selvaraju et al. (2016); Springenberg et al. (2014)) that aggregate the information from all feature maps, our algorithm produces binary support estimates layer by layer. In that sense, our method avoids the numerical stability and robustness issues that may arise from the well-known problem of exploding/vanishing gradients and that can affect interpretability. In particular, in contrast with Shrikumar et al. (2017), we remove the need for any separate reference image, and our method only involves making two passes through the network for a given image.

We present preliminary numerical evidence supporting our method, and demonstrate advantages over gradient-based methods such as guided backpropagation Springenberg et al. (2014).

2 Proposed Approach: Forward-Backward Interpretability
We now describe our scheme for visualizing a convolutional neural network. We term our method Forward-Backward Interpretability (or FBI for short). Given a test image, the goal is to identify important regions that explain the prediction of the learned network. To do this, we propagate the class-probed information back to the image pixel space through the complete network, using the guidance of learned model weights as well as the forward activations of each neuron in the network.
Our approach shares several similarities with the deconvolutional networks (DeconvNet) approach introduced in Zeiler and Fergus (2014). However, instead of reconstructing lower layer feature maps from higher layer activations as in DeconvNet, we merely try to identify important regions (supports) preserved in the forward activations in each layer from the backward (class-specific) information flow.
Suppose that we have already trained the network to an optimal state. In the forward pass, an image is presented to the network, and the activations in the entire network are computed. To explain the classification, we consider the class indicator vector, whose entry is one for the predicted class and zero otherwise, and backpropagate this information to the input space. We use this vector as the input of the backward pass to approximately “invert” each layer, while iteratively filtering these inverses using the forward activations. The process is repeated until the input layer is reached.

Dense layers. Each fully connected layer produces an activation from its input, and the softmax activation is applied at the final layer. Our goal is to traverse each of these layers backwards. To achieve this, we define the “adjoint” of each operation (the term “adjoint” is only loosely defined here due to the nonlinearities involved). The adjoint of the softmax layer is defined pointwise. The adjoint of the ReLU activation function is the ReLU itself. Overall, the “adjoint” of each fully connected layer is composed from these adjoint operations together with the adjoint of the layer's weight matrix.

Forward-Backward masking. The backward information flow is now filtered using the forward activations. Specifically, we only keep entries of the backward signal whose entrywise product with the respective entries of the forward activation is above some threshold parameter. All other entries are set to zero. This enables us to identify a candidate support corresponding to an interpretable feature in the input of a given layer.
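The masking rule described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `forward_backward_mask`, its argument names, and the toy values are assumptions made for the example, not notation from the paper.

```python
import numpy as np

def forward_backward_mask(backward, forward, threshold):
    """Keep backward entries whose product with the forward activation exceeds threshold.

    Hypothetical helper: returns the masked backward signal and the boolean support.
    """
    backward = np.asarray(backward, dtype=float)
    forward = np.asarray(forward, dtype=float)
    # Entrywise product of backward and forward signals, compared to the threshold.
    keep = backward * forward > threshold
    # Entries outside the support are set to zero.
    return np.where(keep, backward, 0.0), keep

# Toy values chosen for illustration only.
forward = np.array([0.0, 2.0, 5.0, 1.0])
backward = np.array([3.0, 4.0, 0.5, -2.0])
masked, support = forward_backward_mask(backward, forward, threshold=2.0)
print(masked)   # entries 1 and 2 survive; the rest are zeroed
print(support)
```

Note that an entry with a large backward value is still discarded when its forward activation is zero, which is the sense in which the forward pass guides the backward traversal.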
Contributing feature maps. Among the many backward feature maps at the top convolutional layer, we keep only the top-ranked subset of them in the backward information flow. The contribution of each map is determined as the total activation of the entire map. In this way, the features irrelevant to the probed class are removed.
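As a rough illustration of this selection step, the sketch below ranks feature maps by their summed activation and zeroes all but the top-ranked ones. The helper name `keep_top_feature_maps`, the `fraction` parameter, and the `(channels, height, width)` layout are assumptions made for the example.

```python
import numpy as np

def keep_top_feature_maps(maps, fraction):
    """Zero all but the top `fraction` of feature maps, ranked by total activation.

    Hypothetical helper; `maps` is assumed to have shape (channels, height, width).
    """
    maps = np.asarray(maps, dtype=float)
    n_keep = max(1, int(round(fraction * maps.shape[0])))
    # Contribution of each map: the sum of all its entries.
    scores = maps.sum(axis=(1, 2))
    top = np.argsort(scores)[::-1][:n_keep]
    out = np.zeros_like(maps)
    out[top] = maps[top]
    return out, np.sort(top)

# Four constant 2x2 maps with total activations 4, 12, 0, and 8.
maps = np.stack([np.full((2, 2), v) for v in (1.0, 3.0, 0.0, 2.0)])
filtered, kept = keep_top_feature_maps(maps, fraction=0.5)
print(kept)  # indices of the retained maps
```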
Unpooling. To perform an adjoint of the max pooling layer, we reshape the pooled map obtained from the backward pass and replicate (copy) its values across the domain of the max operator. Then, we evaluate the replicated map entrywise. This is similar to the analogous unpooling operation in DeconvNet; however, that approach only copies each pooled value to a single location via switches that are stored in memory. In contrast, our new scheme keeps the backward feature maps after unpooling from being overly sparse, and retains enough spatial information about the interpretation. We note that the replication step is suitable for any downsampling filter of size $2 \times 2$ and stride 2 (wherein the receptive fields are non-overlapping). For other filter sizes and stride lengths, the replicated values are averaged over the overlapping locations.
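The replication step for the non-overlapping case can be sketched as follows. The function name `replicate_unpool` and the `size` parameter are assumptions for illustration; the sketch covers only the copy of each pooled value across its window, not the entrywise evaluation that follows it, and not the averaging needed for overlapping windows.

```python
import numpy as np

def replicate_unpool(pooled, size=2):
    """Copy each pooled value across its size-by-size window (stride equal to size).

    Hypothetical helper for the non-overlapping case only.
    """
    pooled = np.asarray(pooled, dtype=float)
    # Repeat every row and every column `size` times.
    return np.repeat(np.repeat(pooled, size, axis=0), size, axis=1)

pooled = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
unpooled = replicate_unpool(pooled)
print(unpooled.shape)  # each spatial dimension doubles
print(unpooled)
```

In contrast with switch-based unpooling, every location in a window receives the pooled value here, so the unpooled map is dense rather than having a single nonzero entry per window.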
Deconvolution. The deconvolution step is similar to DeconvNet, where we compute the adjoint by convolving the backward activation with the flipped filter weights of a corresponding filter.
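To make the adjoint relation concrete, the sketch below implements a single-channel forward layer as a "valid" cross-correlation (the operation deep learning libraries usually call convolution) and its adjoint as a zero-padded pass with the flipped filter. The function names, the single-channel setting, and the unit stride are simplifying assumptions for the example. The final lines numerically check the defining property of an adjoint, namely that the two inner products agree.

```python
import numpy as np

def correlate_valid(x, w):
    """Forward layer: 'valid' 2D cross-correlation of image x with filter w."""
    kh, kw = w.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def adjoint_flipped(z, w):
    """Adjoint of correlate_valid: zero-pad z, then correlate with the flipped filter."""
    kh, kw = w.shape
    padded = np.pad(z, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    return correlate_valid(padded, w[::-1, ::-1])

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))   # forward input
w = rng.standard_normal((3, 3))   # filter weights
z = rng.standard_normal((3, 3))   # backward activation, same shape as the forward output

lhs = np.sum(correlate_valid(x, w) * z)   # <A x, z>
rhs = np.sum(x * adjoint_flipped(z, w))   # <x, A^T z>
print(np.isclose(lhs, rhs))
```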
As we traverse backwards through the network using the above operations, we successively retain a subset of pixel indices (or support) at the input of each layer that plausibly corresponds to the “interpretable” portion of the given image. In the end, we display the locations of these indices together with the values produced by the adjoint. The selectivity achieved by successive masking means that we always obtain a fairly sparse support in our final estimate; the sparsity can be controlled via an appropriate choice of the threshold parameter.
3 Results
We visualize the interpretations provided by the proposed algorithm in Table 1. We use the VGG16 model pretrained on the ImageNet dataset Chollet et al. (2015). The visualizations are generated for the top-1 prediction of each image. The choice of top filters for computing the inverse is an important factor in the interpretation obtained. If the fraction of filters retained is very small (around 10% of the filters), the interpretations lose many important features; when it is too high (close to 100%), the interpretations are too noisy. We have found that using 100% of the filters, with no pointwise thresholding based on the forward function value, produces an output that is close to that of the DeconvNet algorithm. Thus, by using the forward function and masking the computed inverse as explained above, we achieve better output than both the DeconvNet and the guided backpropagation algorithms.

For the experiments shown in Table 1, we use the top 50% of filters to propagate the inverse of each layer. We also use a threshold value of 10.0 for the pointwise mask between the forward and backward values. We compare our method with the guided backpropagation algorithm, and observe that the resulting visualizations have less noise.
4 Conclusions
In this work, we introduce a novel Forward-Backward approach for visualizing the interpretations that correspond to a particular class. However, the choice of filters for computing the interpretations becomes a hyperparameter that must be tuned to ensure that the interpretations are good. We also observe that the filters which contribute to one particular class might have activations similar to those pertaining to some other class. Hence, decoupling the activations of these filters when choosing which filters to use for computing the inverse, so that class discriminativeness is maintained, is of much interest to the authors.
References
Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
Selvaraju et al. [2016] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. See https://arxiv.org/abs/1610.02391v3, 2016.
Shrikumar et al. [2017] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learning important features through propagating activation differences. arXiv preprint arXiv:1704.02685, 2017.
Sundararajan et al. [2016] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Gradients of counterfactuals. arXiv preprint arXiv:1611.02639, 2016.
Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2921–2929, 2016.
Zeiler and Fergus [2014] Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In European Conference on Computer Vision (ECCV), pages 818–833. Springer, 2014.
Ribeiro et al. [2016] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
Chollet et al. [2015] François Chollet et al. Keras. https://github.com/fchollet/keras, 2015.