In recent years, there has been a lot of focus on explaining deep neural networks[19, 15, 6, 3, 31]. Explainability is important for humans to trust the deep learning model, especially in crucial decision-making scenarios. In the computer vision domain, one of the most important explanation techniques is the heatmap approach [28, 23, 21, 29], which focuses on generating heatmaps that highlight parts of the input image that are most important to the decision of the deep networks on a particular classification target.
Some heatmap approaches achieve good visual quality for human understanding, such as several one-step backpropagation-based visualizations including Guided Backpropagation (GBP) and the deconvolutional network (DeconvNet). These approaches utilize the gradient, or variants of the gradient, and backpropagate them back to the input image in order to decide which pixels are more relevant to the change of the deep network prediction. However, whether these pixels actually correlate with the decision-making of the network is not clear. Prior work proves that GBP and DeconvNet are essentially doing (partial) image recovery, and thus generate more human-interpretable visualizations that highlight object boundaries, which do not necessarily represent what the model has truly learned.
An issue with these one-step approaches is that they only reflect infinitesimal changes of the prediction of a deep network. In the highly nonlinear function estimated by the deep network, such infinitesimal changes are not necessarily reflective of changes large enough to alter the decision of the deep network.
Recent work proposed evaluation metrics based on masking the image with heatmaps and verifying whether the masking will indeed change deep network predictions. Ideally, if the highlighted regions for a category are removed from the image, the deep network should no longer predict that category. This is measured by the deletion metric. Conversely, the network should be able to predict a category using only the regions highlighted by the heatmap, which is measured by the insertion metric (Fig. 1).
If these are the goals of a heatmap, a natural idea would be to directly optimize them. The mask approach generates heatmaps by solving an optimization problem that aims to find the smallest, smoothest area which maximally decreases the output of a neural network, directly optimizing the deletion metric. It can generate very good heatmaps, but usually takes a long time to converge, and sometimes the optimization gets stuck in a bad local optimum due to the strong nonconvexity of the solution space.
Another approach, integrated gradients, claims that any change in the output can be reflected in its heatmap. The basic idea is to explicitly find an image with the lowest prediction score (a completely grey image or a highly blurred image usually would not be predicted as any category by a deep network), and then integrate the gradients along the entire straight line between the grey/blurred image and the original image to generate a heatmap. However, the heatmaps generated by integrated gradients are normally diffuse, and thus difficult for humans to understand (Fig. 1).
In this paper, we propose a novel visualization approach, I-GOS (Integrated-Gradients Optimized Saliency), which utilizes the integrated gradients to improve the mask optimization approach. The idea is that the direction provided by the integrated gradients may lead better towards the global optimum than the normal gradient, which tends to lead to local optima. Hence, we replace the gradient in mask optimization with the integrated gradients. Due to the high cost of computing the integrated gradients, we employ a line-search based gradient-projection method to maximally utilize each computation of the integrated gradients. Our approach generates better heatmaps (Fig. 1) and requires less computation time than the original mask optimization, as line search is more efficient at finding appropriate step sizes, allowing significantly fewer iterations to be used. We highlight our contributions as follows:
We developed a novel heatmap visualization approach I-GOS, which optimizes a mask using the integrated gradients as descent steps.
Through regularization and perturbation, we avoid generating adversarial masks at higher resolutions, enabling more detailed heatmaps that better correlate with the decision-making of the model.
Extensive evaluations show that the proposed approach performs better than the state-of-the-art approaches, especially in the insertion and deletion metrics.
2 Related Work
There are several different types of visualization techniques for generating heatmaps for a deep network. We classify them into one-step backpropagation-based approaches [28, 23, 25, 22, 26, 2, 29, 21] and perturbation-based approaches, e.g., [30, 4, 8, 18].
The basic idea of one-step backpropagation-based visualizations is to backpropagate the output of a deep neural network back to the input space using the gradient or its variants. DeconvNet, Saliency Maps (using the gradient), and GBP are similar approaches, differing in how they handle the ReLU layer. LRP and DeepLIFT compute the contributions of each input feature to the prediction. Excitation BP passes top-down signals downwards in the network hierarchy via a probabilistic Winner-Take-All process. GradCAM uses the gradients of a target concept, flowing only into the final convolutional layer, to produce a coarse localization map. Recent work analyzes various backpropagation-based methods and provides a unified view to explore the connections among them.
Perturbation-based methods first perturb parts of the input, and then run a forward pass to see which parts are most important to preserve the final decision. The earliest approach utilized a grey patch to occlude part of the image. This approach is direct but very slow, usually taking hours for a single image. An improvement is to introduce a mask and solve for the optimal mask as an optimization problem [4, 8]. One line of work develops a trainable masking model that can produce the masks in a single forward pass; however, it is difficult to train such a model, and different models need to be trained for different networks. Another line directly solves the optimization and finds the mask iteratively. Instead of occluding only one patch of the image, RISE generates thousands of randomized input masks simultaneously and averages them weighted by their output scores. However, it consumes significant time and GPU memory.
Another seemingly related but different domain is the saliency map from human fixation. Fixation prediction [12, 13] aims to identify the fixation points that human viewers would focus on at first glance of a given image. When predicting eye fixations, the algorithm is "guessing" the regions humans are looking at, while our goal is to explain what the deep model focuses on in a given image to make decisions. Deep models may use completely different mechanisms to classify than humans, hence human fixations should not be used to train or evaluate heatmap models.
3 Model Formulation
3.1 Gradient and Mask Optimization
The gradient and its variants are often utilized in visualization tools to demonstrate the importance of each dimension of the input. The motivation comes from a linearization of the model. Suppose a black-box deep network predicts a score $f_c(I)$ on class $c$ for an input image $I$. Assume $f_c$ is smooth at the current image $I$; then a local approximation can be obtained using the first-order Taylor expansion:

$f_c(I') \approx f_c(I) + \nabla f_c(I)^\top (I' - I). \quad (1)$
The gradient $\nabla f_c(I)$ is indicative of the local change that can be made to $f_c(I)$ if a small perturbation is added to $I$, and hence can be visualized as an indication of salient image regions to provide a local explanation for image $I$. In some approaches, the heatmap is computed by multiplying the gradient element-wise with the input itself, i.e., $\nabla f_c(I) \odot I$, to improve the sharpness of heatmaps.
However, the gradient only illustrates the infinitesimal change of the function at $I$, which is not necessarily indicative of the salient regions that lead to a significant change in $f_c(I)$, especially when the function is highly nonlinear. What we would expect is that the heatmap indicates the areas that would really change the classification result significantly. A perturbation-based approach has been proposed which introduces a mask $M$ as the heatmap to perturb the input $I$. The mask $M$ is optimized by solving the following objective function:

$\min_M F(M) = f_c(\Phi(I, M)) + g(M), \quad \Phi(I, M) = I \odot M + I_0 \odot (1 - M), \quad (2)$

where $g(M) = \lambda_1 \|1 - M\|_1 + \lambda_2 \, \mathrm{TV}(M)$.
In (2), $M$ is a matrix with the same shape as the input image $I$ whose elements all lie in $[0, 1]$; $I_0$ is a baseline image with the same shape as $I$, which should have a low score on the class $c$, $f_c(I_0) \approx \min_{I'} f_c(I')$, and in practice is either a constant image, random noise, or a highly blurred version of $I$. This optimization seeks to find a deletion mask that significantly decreases the output score $f_c(\Phi(I, M))$ under the regularization of $g(M)$. $g(M)$ contains two regularization terms, the first on the magnitude of $1 - M$, and the second a total-variation (TV) norm that makes $M$ more piecewise-smooth.
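As a rough illustration of objective (2), the following NumPy sketch implements the perturbation operator $\Phi$ and the regularized objective. Here `score_fn` stands in for the black-box class score $f_c$, and the $\lambda$ values are illustrative placeholders, not the paper's settings.

```python
import numpy as np

def tv_norm(m):
    """Total-variation term of the mask, encouraging piecewise smoothness."""
    return np.sum(np.abs(np.diff(m, axis=0))) + np.sum(np.abs(np.diff(m, axis=1)))

def perturb(image, baseline, m):
    """Phi(I, M) = I * M + I0 * (1 - M): M = 1 keeps the original pixel,
    M = 0 replaces it with the baseline pixel."""
    return image * m + baseline * (1.0 - m)

def mask_objective(score_fn, image, baseline, m, lam1=0.01, lam2=0.2):
    """F(M) = f_c(Phi(I, M)) + lam1 * ||1 - M||_1 + lam2 * TV(M).
    lam1 and lam2 are illustrative values only."""
    return (score_fn(perturb(image, baseline, m))
            + lam1 * np.sum(np.abs(1.0 - m))
            + lam2 * tv_norm(m))

# Sanity check with a toy "network": the class score is the mean intensity.
score_fn = lambda x: float(x.mean())
image = np.ones((4, 4))
baseline = np.zeros((4, 4))
print(perturb(image, baseline, np.ones((4, 4))).sum())   # → 16.0
```

With this toy score, the all-zero mask (full deletion) attains a lower objective than the all-one mask, since it removes the score entirely at a modest $\ell_1$ cost.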
Although this approach of optimizing a mask performs significantly better than the gradient method, there exist inevitable drawbacks when using a traditional first-order algorithm to solve the optimization. First, it is slow, usually taking hundreds of iterations to obtain the heatmap for each image. Second, since the model is highly nonlinear in most cases, optimizing (2) may only achieve a local optimum, with no guarantee that it indicates the right direction for a significant change related to the output class. Fig. 1 and Fig. 3 show some heatmaps generated by the mask approach.
3.2 Integrated Gradients
Note that the problem of finding the mask is not a conventional non-convex optimization problem. For $f_c$, we (approximately) know the global minimum (or at least a reasonably small value) of $f_c$ at a baseline image $I_0$, which corresponds to the mask $M = 0$. The integrated gradients approach considers the straight-line path from the baseline $I_0$ to the input $I$. Instead of evaluating the gradient only at the provided input, the integrated gradients are obtained by accumulating all the gradients along the path:

$IG_i(I) = (I_i - I_{0,i}) \times \int_0^1 \frac{\partial f_c(I_0 + \alpha (I - I_0))}{\partial I_i} \, d\alpha, \quad (3)$
where $IG_i(I)$ is the integrated gradient of $f_c$ at $I$ for the $i$-th pixel. It has been proven that integrated gradients satisfy an axiom called completeness: the integrated gradients for all pixels add up to the difference between the outputs of $f_c$ at the input $I$ and at the baseline $I_0$, if $f_c$ is differentiable almost everywhere:

$\sum_i IG_i(I) = f_c(I) - f_c(I_0), \quad (4)$
where the summation is over all pixels in $I$. The completeness axiom shows that if the baseline has a near-zero score, the integrated gradients sum to (approximately) the prediction $f_c(I)$ for the input $I$, which means all changes in $f_c$ are reflected in the integrated gradients.
In practice, the integral in (3) is approximated via a summation: we sum the gradients at points occurring at sufficiently small intervals along the straight-line path from the baseline $I_0$ to the input $I$:

$IG_i(I) \approx (I_i - I_{0,i}) \times \frac{1}{S} \sum_{s=1}^{S} \frac{\partial f_c(I_0 + \frac{s}{S} (I - I_0))}{\partial I_i}, \quad (5)$
where $S$ is a constant, the number of steps along the path.
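To make the approximation in (5) concrete, here is a minimal NumPy sketch on a toy function whose gradient is known in closed form; the paper applies this to a deep network's class score, so the toy $f$ and step count here are purely illustrative.

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=20):
    """Riemann-sum approximation of Eq. (5): average the gradients at
    `steps` points along the straight line from the baseline to x,
    then scale element-wise by (x - baseline)."""
    avg_grad = np.zeros_like(x)
    for s in range(1, steps + 1):
        avg_grad += grad_f(baseline + (s / steps) * (x - baseline))
    avg_grad /= steps
    return (x - baseline) * avg_grad

# Toy function f(x) = sum(x_i^2) with gradient 2x; the zero baseline is
# its global minimum, so f(baseline) = 0.
grad_f = lambda x: 2.0 * x
x = np.array([1.0, -2.0, 3.0])
ig = integrated_gradients(grad_f, x, np.zeros_like(x), steps=1000)
# Completeness (Eq. 4): attributions sum to f(x) - f(baseline) = 14.
print(round(float(ig.sum()), 2))   # → 14.01
```

The small residual (14.01 vs. 14) is the discretization error of the right-endpoint Riemann sum, which shrinks as $S$ grows.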
Integrated gradients have some nice theoretical properties and perform better than gradient-based approaches. However, the heatmap generated by the integrated gradients is still diffuse. We speculate that the reason may be that changes to some pixels in $I$ are not very important to $f_c$, or that their contributions cancel each other out. Fig. 1 and Fig. 3 show some heatmaps generated by the integrated gradients approach, where a grey zero image is utilized as the baseline. We can see that the integrated gradients contain many false positives wherever the pixels have a large magnitude of $I_i - I_{0,i}$ (either the white or the black pixels).
3.3 Integrated Gradients Optimized Heatmaps
We believe the above two approaches can be combined into a better heatmap approach. The integrated gradients naturally provide a better direction than the gradient in that they point more directly to the global optimum of part of the objective function. One can view the convex regularization $g(M)$ as the Lagrangian of a constrained optimization with constraints $\|1 - M\|_1 \le B_1$ and $\mathrm{TV}(M) \le B_2$, $B_1$ and $B_2$ being positive constants, and hence consider the optimization problem (2) to be a constrained minimization of $f_c(\Phi(I, M))$. In this case, we know the unconstrained optimum in $M$ (the baseline, reached at $M = 0$) is outside the constraint region. We speculate that an optimization algorithm may be better than gradient descent if it directly attempts to move towards this unconstrained optimum.
To illustrate this, Fig. 2 shows a 2D optimization with a starting point, a local optimum, and a baseline. The area within the black dashed line is the constraint region, which is determined by the constraint function and the boundary of $M$. A first-order algorithm will follow the gradient descent direction (the purple line) to the local optimum, while the integrated gradients computed along the path from the starting point to the baseline may enable the optimization to reach a better area within the constraint region. We can see that the integrated gradients with an appropriate baseline have a global view of the space and may generate a better descent direction. In practice, the baseline does not need to be the global optimum: a good baseline near the global optimum can still improve over the local optimum reached by gradient descent.
Hence, we utilize the integrated gradients as a substitute for the gradient of the partial objective $f_c(\Phi(I, M))$ in optimization (2), and introduce a new visualization method called Integrated-Gradients Optimized Saliency (I-GOS). For the regularization terms in optimization (2), we still compute the partial (sub)gradient with respect to $M$:

$\nabla g(M) = \lambda_1 \nabla \|1 - M\|_1 + \lambda_2 \nabla \mathrm{TV}(M). \quad (6)$
The total (sub)gradient of the optimization for $M$ at each step is the combination of the integrated gradients for $f_c(\Phi(I, M))$ and the (sub)gradients of the regularization terms $g(M)$:

$TG(M) = IG(M) + \nabla g(M), \quad (7)$

where $IG(M)$ denotes the integrated gradients of the partial objective $f_c(\Phi(I, M))$ evaluated at the current mask.
Note that this is no longer a conventional optimization problem, since it combines different types of gradients: the integrated gradients indicate a direction for the partial objective $f_c(\Phi(I, M))$, while the gradients of $g(M)$ regularize this direction and prevent it from being diffuse.
3.4 Computing the step size
Since the time complexity of computing $IG(M)$ is high, we utilize a backtracking line search method and revise the Armijo condition to help compute an appropriate step size for the total gradient in formula (7). The Armijo condition tries to find a step size $\alpha$ such that:

$F(M_k + \alpha d_k) \le F(M_k) + \sigma \alpha \nabla F(M_k)^\top d_k, \quad (8)$
where $d_k$ is the descent direction; $\alpha$ is the step size; $\sigma$ is a parameter in $(0, 1)$; and $\nabla F(M_k)$ is the gradient of $F$ at the point $M_k$.
The descent direction for our algorithm is set to be the negative of the total gradient, $d_k = -TG(M_k)$. However, since $TG(M_k)$ contains the integrated gradients, it is uncertain whether $\nabla F(M_k)^\top d_k$ is negative or not. Hence, we replace $\nabla F(M_k)^\top d_k$ with $-\|d_k\|^2$ and obtain a revised Armijo condition as follows:

$F(M_k + \alpha d_k) \le F(M_k) - \sigma \alpha \|d_k\|^2. \quad (9)$
The detailed backtracking line search works as follows:
Initialization: set the values of the parameter $\sigma$, a decay $\beta \in (0, 1)$, an upper bound $\alpha_u$ and a lower bound $\alpha_l$ for the step size; let $\alpha = \alpha_u$;
Iteration: while condition (9) is not satisfied and $\alpha \ge \alpha_l$, let $\alpha = \beta \alpha$;
Output: if $\alpha < \alpha_l$, the step size for $d_k$ equals the lower bound $\alpha_l$; else, the step size equals the current $\alpha$.
A projection step is needed in each iteration because the mask $M$ is bounded by the closed interval $[0, 1]$. Since $d_k$ contains an integrated gradient, a large upper bound $\alpha_u$ and a small $\sigma$ are needed in order to obtain a large step that satisfies condition (9), similar to satisfying the Goldstein conditions for convergence in a conventional Armijo-Goldstein line search.
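The projected backtracking search can be sketched as follows; the constants ($\alpha_u$, $\alpha_l$, $\beta$, $\sigma$) are illustrative placeholders rather than the paper's settings, $F$ is a toy objective, and evaluating condition (9) at the projected point is our own simplification.

```python
import numpy as np

def project(m):
    """Projection keeping the mask inside the box [0, 1]."""
    return np.clip(m, 0.0, 1.0)

def backtracking_line_search(F, m, d, alpha_u=10.0, alpha_l=1e-4,
                             beta=0.5, sigma=1e-4):
    """Shrink alpha from alpha_u by the decay beta until the revised
    Armijo condition (9),
        F(P(m + alpha * d)) <= F(m) - sigma * alpha * ||d||^2,
    holds, or alpha falls below alpha_l (then return alpha_l)."""
    f0 = F(m)
    d_norm2 = float(np.sum(d * d))
    alpha = alpha_u
    while alpha >= alpha_l:
        if F(project(m + alpha * d)) <= f0 - sigma * alpha * d_norm2:
            return alpha
        alpha *= beta
    return alpha_l

# Toy objective: F(m) = ||m - 0.3||^2, with d the negative gradient.
F = lambda m: float(np.sum((m - 0.3) ** 2))
m = np.full(4, 0.9)
d = -2.0 * (m - 0.3)
alpha = backtracking_line_search(F, m, d)
m_new = project(m + alpha * d)
print(F(m_new) < F(m))   # → True
```

Starting from the large $\alpha_u$ and decaying lets the search accept the biggest step that still decreases the objective, which is what makes few iterations suffice.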
Note that we cannot prove convergence of the algorithm in non-convex optimization. However, the integrated gradient reduces to a scaling of the conventional gradient for a quadratic function (see supplementary material). In practice, it converges much faster than the original mask approach, and we have never observed it diverging, although in some cases we do note that even with this approach the optimization stops at a local optimum. With the line search, we usually only need to run the iteration for a small number of steps. Intuitively, the irrelevant parts of the integrated gradients are gradually suppressed by the regularization function $g(M)$, and only the parts that truly correlate with the output scores remain in the final heatmap.
3.5 Avoiding adversarial examples
Since the mask optimization (2) is similar to adversarial optimization [27, 9] except for the TV term, a concern is whether the solution is merely an adversarial perturbation of the original image rather than an explanation of the relevant information. An adversarial mask essentially drives the image off the natural image manifold, hence prior work utilizes a blurred version of the original image as the baseline to avoid creating strong adversarial gradients off the image manifold. We follow this practice and also use a blurred image as the baseline. The total-variation constraint also counters adversarial masks by making the mask piecewise-smooth. We additionally employ other methods to avoid finding an adversarial perturbation:
When computing the integrated gradients using formula (5), we add different random noise to the masked input at each point along the straight-line path.
We set the resolution of the mask $M$ to be smaller than that of the input image $I$, upsample it before perturbing the input, and rewrite formula (2) as:

$\min_M f_c(\Phi(I, \mathrm{up}(M))) + g(M), \quad (10)$
where $\mathrm{up}(M)$ upsamples $M$ to the original resolution.
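The low-resolution mask and noisy perturbation of Sec. 3.5 can be sketched as follows; nearest-neighbour upsampling and the noise scale are our own illustrative choices, as the section does not pin down either.

```python
import numpy as np

def upsample(mask, out_shape):
    """Nearest-neighbour upsampling of a low-resolution mask to the image
    resolution (a stand-in for the paper's up() operator, whose exact
    interpolation we do not assume)."""
    reps = (out_shape[0] // mask.shape[0], out_shape[1] // mask.shape[1])
    return np.kron(mask, np.ones(reps))

def perturb_upsampled(image, baseline, mask, noise_std=0.01, seed=0):
    """Phi(I, up(M)) with small random noise added to the input;
    noise_std is an illustrative value."""
    rng = np.random.default_rng(seed)
    m = upsample(mask, image.shape)
    noisy = image + noise_std * rng.standard_normal(image.shape)
    return noisy * m + baseline * (1.0 - m)

mask = np.array([[1.0, 0.0], [0.0, 1.0]])   # 2x2 low-resolution mask
image = np.ones((4, 4))
baseline = np.zeros((4, 4))
out = perturb_upsampled(image, baseline, mask)
print(out.shape)   # → (4, 4)
```

Because each low-resolution mask value covers a whole block of pixels, a single adversarial pixel pattern cannot be expressed, which is the point of lowering the mask resolution.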
4 Experiments
4.1 Evaluation Metrics and Parameter Settings
Although much recent work focuses on explainable machine learning, there is still no consensus on how to measure the explainability of a machine learning model. For heatmaps, several evaluation metrics exist, e.g., the pointing game, which measures the ability of a heatmap to focus on the ground-truth object bounding box. However, such localization ability only represents a human's understanding of the objects in the images, rather than the deep model's perspective on how to classify them. There is plenty of evidence that deep learning sometimes uses background context for object classification, which would invalidate pointing game evaluations. Prior work also reveals that some backpropagation-based visualizations such as GBP and DeconvNet are essentially doing (partial) image recovery, and thus generate more human-interpretable visualizations that may score high on the pointing game but do not correlate with network results. Hence, pointing game metrics may not be that convincing. Despite such deficiencies, we still treat it as a human-understandable metric to evaluate the performance of our approach. Following prior work, if the most salient pixel lies inside the human-annotated bounding box of an object, it is counted as a hit. The pointing game accuracy equals $\#\mathrm{Hits} / (\#\mathrm{Hits} + \#\mathrm{Misses})$, averaged over all categories.
We follow prior work and adopt deletion and insertion as better metrics to evaluate the performance of the heatmaps generated by different approaches. The intuition behind the deletion metric is that removing the pixels most relevant to a class will cause the original class score to drop sharply. Utilizing only the deletion metric is not satisfactory, since adversarial images can also achieve a good deletion performance. Thus, the insertion metric is needed as a complement. The intuition behind the insertion metric is that keeping only the most relevant pixels should retain the original score as much as possible, which eliminates interference from adversarial attacks. The insertion metric would not score adversarial examples highly (see examples in the supplementary material), since to achieve a good insertion score, the deep model needs to make a confident, consistent prediction using only a small part of the image. In general, deletion and insertion form a fair metric pair for evaluating different visualization approaches.
In detail, for the deletion metric, we remove a number of pixels (dependent on the resolution of the mask) at each step from the original image, in decreasing order of heatmap values, until no pixel is left, and replace the removed ones with the pixels at the same locations in a highly blurred version of the original image. The deletion score is the area under the curve (AUC) of the classification scores (after softmax) over all steps. For the insertion metric, we proceed inversely: we replace pixels of the highly blurred image with the ones from the original image, based on the values of the heatmap, until no pixel is left. The insertion score is likewise the AUC of the classification scores over all the partially recovered images. In the experiments, we generate heatmaps at several different resolutions, and we compute the deletion and insertion scores based on heatmaps at their original resolutions before upsampling.
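The deletion metric described above can be sketched as follows, with a stand-in scoring function in place of the softmax class score; the step granularity and toy model are illustrative only.

```python
import numpy as np

def deletion_scores(score_fn, image, baseline, heatmap, steps=8):
    """Replace pixels with the baseline in decreasing heatmap order and
    record the classifier score after each step; the deletion metric is
    the area under this curve (lower is better)."""
    order = np.argsort(-heatmap.ravel())          # most salient first
    img, base = image.ravel().copy(), baseline.ravel()
    per_step = int(np.ceil(img.size / steps))
    scores = [score_fn(img.reshape(image.shape))]
    for s in range(steps):
        idx = order[s * per_step:(s + 1) * per_step]
        img[idx] = base[idx]
        scores.append(score_fn(img.reshape(image.shape)))
    return np.array(scores)

def auc(scores):
    """Area under the curve, with the x-axis normalized to [0, 1]."""
    return float(np.trapz(scores, dx=1.0 / (len(scores) - 1)))

# Toy model: the "class score" is the image mean, and the heatmap is the
# image itself, so deleting salient pixels first drops the score fastest.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
baseline = np.zeros((8, 8))
score_fn = lambda x: float(x.mean())
curve = deletion_scores(score_fn, image, baseline, image)
print(curve[-1])   # → 0.0 (everything replaced by the zero baseline)
```

The insertion score is computed by the same procedure run in reverse, starting from the baseline and restoring original pixels in decreasing heatmap order.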
VGG19 and ResNet50 are tested as the base models. For the deletion and insertion tasks, we utilize the pretrained VGG19 and ResNet50 networks from the PyTorch model zoo to test randomly selected images from the validation set of ImageNet. For the pointing game, we utilize two pretrained VGG16 models from prior work to test randomly selected images from the validation set of MSCOCO, and randomly selected images from the test set of VOC07, respectively. The parameters $\sigma$ (Eq. 9), $\lambda_1$, and $\lambda_2$ (Eq. 10) were fixed across all experiments under the same heatmap resolution. We downloaded and ran the code for most baselines, implementing the remainder ourselves. All baselines were tuned to their best performance. For all experiments we used the same pre-/post-processing with the same baseline image $I_0$. In the RISE paper, the authors used a less blurred image for insertion and a grey image for deletion; here we used a more blurred image for both insertion and deletion, hence the insertion and deletion scores for RISE are a bit different in our paper compared with theirs.
4.2 Results and Discussions
Deletion and Insertion: Tables 1 and 2 show comparative evaluations of I-GOS against other state-of-the-art approaches in terms of both the deletion and insertion metrics on the ImageNet dataset, using VGG19 and ResNet50 as the base model, respectively. Fig. 3 shows some example heatmap comparisons between the approaches. From Tables 1 and 2 we observe that our proposed approach I-GOS performs better than Excitation BP and Mask in both deletion and insertion scores for heatmaps at all resolutions. RISE and Integrated Gradients can only generate heatmaps at a single resolution, and GradCAM can only generate one low-resolution heatmap each on VGG19 and on ResNet50; our approach also beats RISE, Integrated Gradients, and GradCAM in both deletion and insertion scores on heatmaps of the same resolutions. Although Integrated Gradients has some good theoretical properties, it obtains the worst insertion score among all the approaches, which indicates that it indeed contains many diffuse pixels uncorrelated with the classification, as in the Cucumber and Oboe examples in Fig. 3. Excitation BP is better than the other one-step backpropagation-based approaches, but during the experiments we find that it sometimes fires on the border or corner of the image instead of the contents, or on irrelevant parts of the image, as argued in prior work; thus, it performs the worst on the deletion task. RISE also suffers on the deletion score, perhaps because of the randomness of the masks it generates. Fig. 5 shows some visual comparisons between our approach and Excitation BP. Our approach performs better than GradCAM for VGG19, and only slightly better for ResNet50; the reason is that for the low-resolution ResNet50 heatmap it is difficult to increase the insertion score or decrease the deletion score further, since the heatmap contains very few pixels. For RISE, we followed the original paper in choosing the number of random samples for VGG and for ResNet. Fig. 4 shows some visual comparisons between our approach, GradCAM, and RISE. From Fig. 4 we observe that GradCAM also sometimes fires on the image border, corner, or irrelevant parts of the image (cocker spaniel in Fig. 4), which results in bad deletion and insertion scores, and that the randomness of the masks indeed limits the performance of RISE (Jellyfish in Fig. 4).
Ablation Studies: We show the results of ablation studies in Table 3. From Table 3 we observe that without the TV term, insertion scores suffer significantly while deletion scores do not change much, indicating that the TV term is important for avoiding adversarial masks. The random noise introduced in Sec. 3.5 is very useful when the resolution of the mask is high (e.g., 224×224): from Fig. 6 we observe that I-GOS with noise achieves a much better insertion score than without noise at the same insertion ratio. When the resolution is low (e.g., 28×28), the noise is not as important, since the low resolution already avoids adversarial examples. When we utilize a fixed step size (Table 3), both deletion and insertion scores become worse, showing the utility of the line search.
Running Time: In Table 4, we summarize the average running time of Mask, RISE, GradCAM, Integrated Gradients, and I-GOS on the ImageNet dataset using ResNet50 as the base model. Each approach uses a single Nvidia 1080Ti GPU, with the maximum number of iterations fixed for I-GOS and Mask, and the number of random input samples fixed for RISE. Combining Table 2 and Table 4, we observe that our approach takes less time to generate better results than Mask and RISE; in particular, only a small number of iterations is needed for I-GOS to converge. The average running times for the backpropagation-based methods are all less than one second; however, our approach achieves much better performance than these approaches, especially at higher resolutions. To the best of our knowledge, I-GOS is the fastest among the perturbation-based methods, as well as the one with the best performance in the deletion and insertion metrics among all heatmap approaches.
Failure Case: Fig. 7 shows one failure case, where the insertion score did not increase until the end. Our observation is that optimization-based methods such as I-GOS usually do not work well when the deep model's prediction confidence is very low, or when the deep model makes a wrong prediction.
Pointing Game: Table 5 shows comparative evaluations of I-GOS against other state-of-the-art approaches in terms of mean accuracy in the pointing game on MSCOCO and PASCAL, respectively. Here we utilize the same pretrained models as prior work, and hence list the pointing game accuracies reported there for all approaches except Mask and our I-GOS. From Table 5 we observe that I-GOS beats all the other compared approaches except RISE, and it improves significantly over Mask. During the experiments we notice that some object labels for MSCOCO and PASCAL in the pointing game have very small output scores under the pretrained VGG16 models, which greatly affects the optimization for both Mask and I-GOS; RISE does not seem to suffer from this. We believe RISE may be good at the pointing game, but its randomness generally leads to a mask that is too diffuse, which significantly hurts its deletion and insertion scores (Tables 1 and 2), while our approach generates a much more concise heatmap.
5 Conclusion
In this paper, we proposed a novel visualization approach, I-GOS, which utilizes integrated gradients to optimize for a heatmap. We showed that the integrated gradients provide a better direction than the gradient when a good baseline is known for part of the objective of the optimization. The heatmaps generated by the proposed approach are human-understandable and more correlated with the decision-making of the model. Extensive experiments were conducted on three benchmark datasets with four pretrained deep neural networks, showing that I-GOS advances state-of-the-art deletion and insertion scores at all heatmap resolutions.
References
-  M. Ancona, E. Ceolini, A. C. Öztireli, and M. H. Gross. A unified view of gradient-based attribution methods for deep neural networks. CoRR, abs/1711.06104, 2017.
-  S. Bach, A. Binder, G. Montavon, F. Klauschen, K. Müller, and W. Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS One, 10, 7 2015.
-  D. Bau, B. Zhou, A. Khosla, A. Oliva, and A. Torralba. Network dissection: Quantifying interpretability of deep visual representations. In Computer Vision and Pattern Recognition, 2017.
-  P. Dabkowski and Y. Gal. Real time image saliency for black box classifiers. In NIPS, 2017.
-  Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li. Boosting adversarial attacks with momentum. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
-  E. Elenberg, A. G. Dimakis, M. Feldman, and A. Karbasi. Streaming weak submodularity: Interpreting neural networks on the fly. In Advances in Neural Information Processing Systems, pages 4044–4054, 2017.
-  M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html.
-  R. C. Fong and A. Vedaldi. Interpretable explanations of black boxes by meaningful perturbation. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 3449–3457, Oct 2017.
-  I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc., 2014.
-  K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
-  S. Johnson and T. Subha. A study on eye fixation prediction and salient object detection in supervised saliency. Materials Today: Proceedings, 4(2, Part B):4169 – 4181, 2017. International Conference on Computing, Communication, Nanophotonics, Nanoscience, Nanomaterials and Nanotechnology.
-  S. S. S. Kruthiventi, V. Gudisa, J. H. Dholakiya, and R. Venkatesh Babu. Saliency unified: A deep architecture for simultaneous eye fixation prediction and salient object segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
-  M. Kummerer, T. S. A. Wallis, L. A. Gatys, and M. Bethge. Understanding low- and high-level contributions to fixation prediction. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
-  T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft coco: Common objects in context. In D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, editors, Computer Vision – ECCV 2014, pages 740–755, Cham, 2014. Springer International Publishing.
-  S. M. Lundberg and S.-I. Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765–4774, 2017.
-  W. Nie, Y. Zhang, and A. Patel. A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations. ArXiv e-prints, May 2018.
-  J. Nocedal and S. Wright. Numerical Optimization. Springer Series in Operations Research and Financial Engineering. Springer New York, 2000.
-  V. Petsiuk, A. Das, and K. Saenko. RISE: Randomized Input Sampling for Explanation of Black-box Models. ArXiv e-prints, June 2018.
-  M. T. Ribeiro, S. Singh, and C. Guestrin. Why should i trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
-  O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 2015.
-  R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 618–626, 2017.
-  A. Shrikumar, P. Greenside, A. Shcherbina, and A. Kundaje. Not just a black box: Learning important features through propagating activation differences. CoRR, abs/1605.01713, 2016.
-  K. Simonyan, A. Vedaldi, and A. Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. ICLR Workshop, 2014.
-  K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations, 2015.
-  J. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller. Striving for simplicity: The all convolutional net. In ICLR Workshop, 2015.
-  M. Sundararajan, A. Taly, and Q. Yan. Axiomatic attribution for deep networks. In D. Precup and Y. W. Teh, editors, Proceedings of the 34th International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
-  C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.
-  M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, editors, Computer Vision – ECCV 2014, pages 818–833, Cham, 2014. Springer International Publishing.
-  J. Zhang, Z. Lin, J. Brandt, X. Shen, and S. Sclaroff. Top-down neural attention by excitation backprop. In European Conference on Computer Vision, pages 543–559. Springer, 2016.
-  B. Zhou, A. Khosla, À. Lapedriza, A. Oliva, and A. Torralba. Object detectors emerge in deep scene cnns. CoRR, abs/1412.6856, 2014.
-  B. Zhou, Y. Sun, D. Bau, and A. Torralba. Interpretable basis decomposition for visual explanation. In The European Conference on Computer Vision (ECCV), September 2018.
I. Properties of the Integrated Gradient in Quadratic Functions
For a quadratic function, the integrated gradients reduce to a scaling of the conventional gradient if the baseline is the optimum.
Given a quadratic function $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^\top A \mathbf{x} + \mathbf{b}^\top \mathbf{x} + c$, we have its conventional gradient as $\nabla f(\mathbf{x}) = A\mathbf{x} + \mathbf{b}$. Considering a straight-line path from the current point $\mathbf{x}$ to the baseline $\mathbf{x}_0$, for a point $\mathbf{x}_0 + \alpha(\mathbf{x} - \mathbf{x}_0)$, $\alpha \in [0, 1]$, along the path, we have:
$$\nabla f\big(\mathbf{x}_0 + \alpha(\mathbf{x} - \mathbf{x}_0)\big) = A\mathbf{x}_0 + \alpha A(\mathbf{x} - \mathbf{x}_0) + \mathbf{b}.$$
Thus, we obtain the integrated gradient along the straight-line path as:
$$g(\mathbf{x}) = \int_0^1 \nabla f\big(\mathbf{x}_0 + \alpha(\mathbf{x} - \mathbf{x}_0)\big)\, d\alpha = A\mathbf{x}_0 + \frac{1}{2}A(\mathbf{x} - \mathbf{x}_0) + \mathbf{b} = \frac{1}{2}A(\mathbf{x} + \mathbf{x}_0) + \mathbf{b}.$$
When the baseline $\mathbf{x}_0$ is the optimum of the quadratic function, $\nabla f(\mathbf{x}_0) = A\mathbf{x}_0 + \mathbf{b} = \mathbf{0}$, and then
$$g(\mathbf{x}) = \frac{1}{2}A(\mathbf{x} + \mathbf{x}_0) + \mathbf{b} = \frac{1}{2}(A\mathbf{x} + \mathbf{b}) = \frac{1}{2}\nabla f(\mathbf{x}).$$
Hence, the integrated gradients reduce to a scaling (by $\frac{1}{2}$) of the conventional gradient.
In this case, the revised Armijo condition also reduces to the conventional Armijo condition up to a constant.
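This reduction can be checked numerically; the quadratic form, dimensions, and number of integration steps below are arbitrary illustrations:

```python
import numpy as np

# Numerical check: for f(x) = 0.5 x^T A x + b^T x, with the baseline x0 at
# the optimum (A x0 + b = 0), the path-integrated gradient equals 0.5 * grad f(x).
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)           # symmetric positive definite
x0 = rng.standard_normal(4)           # choose the baseline to be the optimum
b = -A @ x0                           # ensures grad f(x0) = 0

def grad_f(x):
    return A @ x + b

x = rng.standard_normal(4)            # current point

# Riemann-sum (midpoint) approximation of  g(x) = ∫_0^1 grad f(x0 + a(x - x0)) da
alphas = (np.arange(1000) + 0.5) / 1000
g = np.mean([grad_f(x0 + a * (x - x0)) for a in alphas], axis=0)

assert np.allclose(g, 0.5 * grad_f(x), atol=1e-6)
```

Since the integrand is linear in $\alpha$, the midpoint sum recovers the factor $\frac{1}{2}$ essentially exactly.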
II. Adversarial Examples
Figs. 8-9 show some examples of using I-GOS to visualize adversarial examples. Here we utilize the MI-FGSM method on VGG19 to generate adversarial examples. From Figs. 8-9 we observe that the I-GOS heatmaps for the original images and for the adversarial examples differ substantially. For the original image, I-GOS can often lead to high classification confidence on the original class by inserting only a small portion of the pixels. For the adversarial image, however, almost the entire image needs to be inserted before the CNN predicts the adversarial category. We note that we are not presenting I-GOS as a defense against adversarial attacks, and that specific attacks may be designed to target the salient regions of the image. However, these figures show that the I-GOS heatmap and the insertion metric are robust against such full-image attacks, and that I-GOS is not merely performing image reconstruction.
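For reference, the momentum-iterative update of MI-FGSM can be sketched as follows. A toy logistic model stands in for VGG19, and the names and hyperparameters (`eps`, `mu`, `steps`) are illustrative assumptions, not the settings used in our experiments:

```python
import numpy as np

# Sketch of MI-FGSM: accumulate an L1-normalized gradient with momentum,
# then take signed steps within an eps-ball around the input.
rng = np.random.default_rng(1)
w = rng.standard_normal(16)           # toy linear model: score = w . x

def loss(x, y):                       # logistic loss, label y in {-1, +1}
    return np.log1p(np.exp(-y * (w @ x)))

def grad_loss(x, y):                  # gradient of the loss w.r.t. the input
    return (-y / (1.0 + np.exp(y * (w @ x)))) * w

def mi_fgsm(x, y, eps=0.3, steps=10, mu=1.0):
    alpha = eps / steps               # per-step budget
    g = np.zeros_like(x)              # momentum-accumulated gradient
    x_adv = x.copy()
    for _ in range(steps):
        grad = grad_loss(x_adv, y)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)
        x_adv = x_adv + alpha * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)   # stay in the eps-ball
    return x_adv

x = rng.standard_normal(16)
y = 1 if w @ x > 0 else -1            # the model's current decision
x_adv = mi_fgsm(x, y)
assert loss(x_adv, y) > loss(x, y)    # the attack increases the loss on y
```

On a real network the same loop is run with backpropagated gradients and an additional clip to the valid pixel range.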
III. Deletion and Insertion Visualizations
Fig. 10 shows more comparison examples of heatmaps from different approaches. Fig. 11 shows more visual comparisons between our approach, GradCAM, and RISE. From Fig. 10 we can see that Mask focuses on the person instead of the Yawl in the left image, and on the grass instead of the Impala in the right image, indicating that its optimization can sometimes be stuck in a bad local optimum. From Fig. 11 we observe that GradCAM sometimes fires on the image border, corners, or irrelevant parts of the image (Grey whale in Fig. 11), which results in bad deletion and insertion scores, and that the randomness of the sampled masks indeed limits the performance of RISE (West Highland white terrier in Fig. 11).
Figs. 12-13 show some examples generated by our approach I-GOS in the deletion and insertion tasks using VGG19 as the baseline model, and Figs. 14-15 show examples using ResNet50 as the baseline model. The deletion or insertion image is generated by masking the image elementwise with the upsampled mask, i.e. $I \odot \mathrm{up}(M)$, where $\mathrm{up}(\cdot)$ upsamples the mask $M$ to the resolution of the image $I$. For the deletion image, we initialize the mask as a matrix of ones, then set the top pixels in the mask to $0$ based on the values of the heatmap, where the deletion ratio represents the proportion of pixels that are set to $0$. For the insertion image, we initialize the mask as a matrix of zeros, then set the top pixels in the mask to $1$ based on the values of the heatmap, where the insertion ratio represents the proportion of pixels that are set to $1$. In Figs. 12-15, the masked/revealed regions of the images may seem slightly larger than the stated deletion/insertion ratios; the reason is that after upsampling the mask $M$, some pixels on the border may have values between $0$ and $1$, resulting in larger regions being masked or revealed. The predicted class probability is the softmax output for the same category with the original image, the deletion image, and the insertion image as input, respectively.

From Figs. 12-15 we observe that I-GOS can achieve a low predicted class probability with a low deletion ratio in the deletion task, and at the same time a high predicted class probability with a low insertion ratio in the insertion task, indicating that I-GOS truly discovers the key features of the images that the CNN is using. In particular, we find that in many cases the CNN fixates on very small regions of the image and very local features to make a prediction; e.g. for Pomeranian, the face of the dog is of utmost importance. Without the face, the prediction drops to almost zero, and with only the face and a rough outline of the dog, the prediction is almost perfect.
The same can be said for Eft, Black grouse, Lighthouse, and Boxer. Interestingly, for Container ship and Trailer truck, their functional parts are extremely important to the classification: the trailer truck almost cannot be classified without the wheels (and can be classified with only the wheels), and the container ship cannot be classified without the containers (and can be classified with almost only the containers and a rough outline of the ship).
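The mask construction described above can be sketched as follows; the mask resolution, upsampling factor, and nearest-neighbor upsampling are illustrative assumptions (a smooth upsampling would instead produce the fractional border values noted above):

```python
import numpy as np

# Build deletion/insertion masks from a heatmap: keep (or remove) the
# top-scoring fraction of mask entries, upsample to image resolution,
# and mask the image elementwise.
def make_masks(heatmap, ratio):
    """Return (deletion_mask, insertion_mask) for a given pixel ratio."""
    k = int(round(ratio * heatmap.size))
    top = np.argsort(heatmap.ravel())[::-1][:k]   # highest heatmap values first
    del_mask = np.ones(heatmap.size)
    del_mask[top] = 0.0                           # deletion: remove top pixels
    ins_mask = np.zeros(heatmap.size)
    ins_mask[top] = 1.0                           # insertion: keep only top pixels
    return del_mask.reshape(heatmap.shape), ins_mask.reshape(heatmap.shape)

def upsample(mask, factor):
    # nearest-neighbor upsampling keeps the mask binary; bilinear upsampling
    # yields border values in (0, 1), slightly enlarging the masked regions
    return np.kron(mask, np.ones((factor, factor)))

heatmap = np.random.default_rng(2).random((7, 7))
del_mask, ins_mask = make_masks(heatmap, ratio=0.25)
image = np.ones((28, 28))
deletion_image = image * upsample(del_mask, 4)
insertion_image = image * upsample(ins_mask, 4)

# deletion and insertion masks partition the image pixels
assert deletion_image.sum() + insertion_image.sum() == image.sum()
```

The deletion and insertion images are then fed to the CNN, and the softmax probability of the target category is read off as described above.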