Convolutional Neural Networks (CNNs) have been shown to produce state-of-the-art results on a multitude of vision benchmarks, such as ImageNet, Caltech or Cityscapes, which has led to CNNs being used in numerous real-world systems (e.g. autonomous vehicles) and services (e.g. translation services). However, the use of CNNs in safety-critical domains presents engineers with challenges resulting from their black-box character. A better understanding of the inner workings of a model provides hints for improving it, helps to understand failure cases, and may reveal shortcomings of the training data. Additionally, users generally trust a model more when they understand its decision process and are able to anticipate or verify outputs.
To overcome the interpretation and transparency disadvantage of black-box models, post-hoc explanation methods have been introduced [53, 35, 42, 49, 32, 17, 11]. These methods provide explanations for individual predictions and thus help to understand on which evidence a model bases its decisions. The most common form of explanation is a visual, image-like representation, which depicts the important pixels or image regions in a human interpretable manner.
In general, an explanation should be easily interpretable (Sec. 4.1). Additionally, a visual explanation should be class discriminative and fine-grained (Sec. 4.2). The latter property is particularly important for classification tasks in the medical domain [20, 18], where fine structures (e.g. capillary hemorrhages) have a major influence on the classification result (Sec. 5.2). In addition, the importance of different color channels should be captured, e.g. to uncover a color bias in the training data (Sec. 4.3).
Moreover, explanations should be faithful, meaning they accurately explain the function of the black-box model. To evaluate the faithfulness (Sec. 5.1), recent work [35, 32, 7] introduces metrics which are based on model predictions of explanations. To be able to compute such metrics without having to rely on proxy measures, it is beneficial to employ explanation methods which directly generate valid model inputs (e.g. a perturbed version of the image).
A major concern of optimization based visual explanation methods is adversarial evidence, i.e. faulty evidence generated by artefacts introduced in the computation of the explanation. Therefore, additional constraints or regularizations are used to prevent such faulty evidence [17, 11, 14]. Drawbacks of these defenses are the added hyperparameters and the necessity of either a reduced resolution of the explanation or a smoothed explanation (Sec. 3.2). They are thus not well suited for displaying fine-grained evidence.
Our main contribution is a new adversarial defense technique which selectively filters those gradients in the optimization which would otherwise lead to adversarial evidence (Sec. 3.2). Using this defense, we extend the work of  and propose a new fine-grained visual explanation method (FGVis). The proposed defense is not dependent on hyperparameters and is the key to producing fine-grained explanations (Fig. 1), as no smoothing or regularizations are necessary. Like other optimization-based approaches, FGVis computes a perturbed version of the original image, in which either all irrelevant or the most relevant pixels are removed. The resulting explanations (Fig. 1 b) are valid model inputs and their faithfulness can thus be directly verified (as in methods from [17, 14, 6, 11]). Moreover, they are fine-grained (as in methods from [35, 38, 48, 42]). To the best of our knowledge, this is the first method able to produce fine-grained explanations directly in the image space. We evaluate our defense (Sec. 3.2) and FGVis (Sec. 4 and 5) qualitatively and quantitatively.
2 Related Work
Various methods to create explanations have been introduced. Thang et al. and Du et al. provide a survey of these. In this section, we give an overview of explanation methods which generate visual, image-like explanations.
Backpropagation Based Methods (BBM). These methods generate an importance measure for each pixel by backpropagating an error signal to the image. Simonyan et al., who build on the work of Baehrens et al., use the derivative of a class score with respect to the image as an importance measure. Similar methods have been introduced in Zeiler et al. and Springenberg et al., which additionally manipulate the gradient when backpropagating through ReLU nonlinearities. Integrated Gradients additionally accumulates gradients along a path from a base image to the input image. SmoothGrad and VarGrad visually sharpen explanations by combining multiple explanations of noisy copies of the image. Other BBMs such as Layer-wise Relevance Propagation, DeepLift or Excitation Backprop utilize top-down relevancy propagation rules. BBMs are usually fast to compute and produce fine-grained importance/relevancy maps. However, these maps are generally of low quality [11, 14] and are less interpretable. To verify their faithfulness, it is necessary to apply proxy measures or use pre-processing steps, which may falsify the result.
Activation Based Methods (ABM). These approaches use a linear combination of activations from convolutional layers to form an explanation. Prominent methods of this category are CAM (Class Activation Mapping)  and its generalizations Grad-CAM  and Grad-CAM++ . These methods mainly differ in how they calculate the weights of the linear combination and what restrictions they impose on the CNN. Extensions of such approaches have been proposed in Selvaraju et al.  and Du et al. , which combine ABMs with backpropagation or perturbation based approaches. ABMs generate easy to interpret heat-maps which can be overlaid on the image. However, they are generally not well suited to visualize fine-grained evidence or color dependencies. Additionally, it is not guaranteed that the resulting explanations are faithful and reflect the decision making process of the model [14, 35].
Perturbation Based Methods (PBM). Such approaches perturb the input and monitor the prediction of the model. Zeiler et al. slide a grey square over the image and use the change in class probability as a measure of importance. Several approaches are based on this idea, but use other importance measures or occlusion strategies. Petsiuk et al. use randomly sampled occlusion masks and define importance based on the expected model score over masks. LIME uses a super-pixel based occlusion strategy and a surrogate model to compute importance scores. Further super-pixel or segment based methods are introduced in Seo et al. and Zhou et al. The approaches mentioned so far do not need access to the internal state or structure of the model. However, they are often quite time consuming and only generate coarse explanations.
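The sliding-occlusion idea can be illustrated with a minimal sketch. It is a one-dimensional simplification, not the cited implementation: the image is a flat list of values, and `predict`, `window` and `fill` are hypothetical names introduced here for illustration.

```python
def occlusion_importance(image, predict, window=2, fill=0.0):
    """Average drop in the class score caused by occluding each position.

    `image` is a flat list of pixel values; `predict` is a hypothetical
    callable that returns the class probability for such a list.
    """
    if not 0 < window <= len(image):
        raise ValueError("window must be between 1 and the image length")
    baseline = predict(image)
    drops = [0.0] * len(image)
    counts = [0] * len(image)
    # Slide the occluding window over every valid start position.
    for start in range(len(image) - window + 1):
        occluded = list(image)
        occluded[start:start + window] = [fill] * window
        drop = baseline - predict(occluded)
        # Credit the score drop to every position covered by the window.
        for i in range(start, start + window):
            drops[i] += drop
            counts[i] += 1
    return [d / c for d, c in zip(drops, counts)]
```

Each position is covered by at least one window, so the averaged drop is defined everywhere. The cost grows with the number of window positions, which reflects why such methods tend to be slow and coarse.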
Other PBMs generate an explanation by optimizing for a perturbed version of the image [11, 17, 14, 6]. The perturbed image is computed from the input image, a mask, and a reference image containing little information (Sec. 3.1). To avoid adversarial evidence, these approaches need additional regularizations, constrain the explanation (e.g. optimize for a coarse mask [6, 17, 14]), introduce stochasticity, or utilize regularizing surrogate models. These approaches generate easy to interpret explanations in the image space, which are valid model inputs and faithful (i.e. a faithfulness measure is incorporated in the optimization).
Our method also optimizes for a perturbed version of the input. Compared to existing approaches we propose a new adversarial defense technique which filters gradients during optimization. This defense does not need hyperparameters which have to be fine-tuned. Besides, we optimize each pixel individually, thus, the resulting explanations have no limitations on the resolution and are fine-grained.
3 Explaining Model Predictions
Explanations provide insights into the decision-making process of a model. The most universal form of explanations are global ones which characterize the overall model behavior. Global explanations specify for all possible model inputs the corresponding output in an intuitive manner. A decision boundary plot of a classifier in a low-dimensional vector space, for example, represents a global explanation. For high-dimensional data and complex models, it is practically impossible to generate such explanations. Current approaches therefore utilize local explanations, which focus on individual inputs (for the sake of brevity, we will use the term explanations as a synonym for local explanations throughout this work). Given one data point, these methods highlight the evidence on which a model bases its decisions. As outlined in Sec. 2, the definition of highlighting depends on the used explanation method. In this work, we follow the paradigm introduced in  and directly optimize for a perturbed version of the input image. Such an approach has several advantages: 1) The resulting explanations are interpretable due to their image-like nature; 2) Explanations represent valid model inputs and are thus testable; 3) Explanations are optimized to be faithful. In Sec. 3.1 we briefly review the general paradigm of optimization based explanation methods before we introduce our novel adversarial defense technique in Sec. 3.2.
3.1 Perturbation based Visual Explanations
Following the paradigm of optimization based explanation methods, which compute a perturbed version of the image [17, 14, 6, 11], an explanation can be defined as:
Explanation by Preservation: The smallest region of the image which must be retained to preserve the original model output (i.e. minimal sufficient evidence).
Explanation by Deletion: The smallest region of the image which must be deleted to change the model output.
To formally derive an explanation method based on this paradigm, we assume that a CNN is given which maps an input image to an output. The output is a vector representing the softmax scores of the different classes. Given an input image, an explanation of a target class (e.g. the most-likely class) is computed by removing either relevant information (deletion) or irrelevant information that does not support the target class (preservation) from the image. Since it is not possible to remove information without replacing it, and we do not have access to the image generating process, we have to use an approximate removal operator. A common approach is to use a mask based operator, which computes a weighted average between the image and a reference image, using a mask (Eq. 1).
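With assumed symbol names (an input image $x$, a mask $m$ with entries in $[0, 1]$, a reference image $r$, an operator $\Phi$ and the resulting explanation $e$, none of which is fixed by the text above), the weighted average can be sketched as:

```latex
\[
e = \Phi(x, m) = m \odot x + (1 - m) \odot r
\]
```

Here $\odot$ denotes element-wise multiplication, so a mask value of one keeps a pixel and a mask value of zero replaces it by the reference.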
Common choices for the reference image are constant values (e.g. zero), a blurred version of the original image, Gaussian noise, or sampled references of a generative model [17, 14, 6, 11]. In this work, we take a zero image as reference. In our opinion, this reference produces the most pleasing visual explanations, since irrelevant image areas are set to zero (Fig. 1) and not replaced by other structures. Tensors are assumed to be normalized according to the training of the CNN, so a value of zero corresponds to a grey color (i.e. the color of the data mean). In addition, the zero image (and random image) carry comparatively little information and lead to a model prediction with a high entropy. Other references, such as a blurred version of the image, usually result in lower prediction entropies, as shown in Sec. A3.1. Due to the additional computational effort, we have not considered model-based references as proposed in Chang et al.
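The mask based removal operator with a zero reference can be sketched in a few lines. This is an illustration under simplifying assumptions, not the implementation used in this work: images, masks and references are flat lists of floats, and the function name `apply_mask` is hypothetical.

```python
def apply_mask(image, mask, reference=None):
    """Weighted average of image and reference, controlled by the mask.

    A mask value of 1 keeps the pixel, a value of 0 replaces it by the
    reference. The reference defaults to a zero image (assumed to be grey
    after normalization).
    """
    if reference is None:
        reference = [0.0] * len(image)
    if not (len(image) == len(mask) == len(reference)):
        raise ValueError("image, mask and reference must have equal length")
    return [m * x + (1.0 - m) * r for x, m, r in zip(image, mask, reference)]
```

With the zero reference, the operator reduces to an element-wise product of mask and image, which is why the resulting explanation is itself a valid model input.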
In addition, a similarity metric is needed, which measures the consistency of the model output generated by the explanation and the output of the image with respect to a target class. This similarity metric should be small if the explanation preserves the output of the target class and large if the explanation manages to significantly drop the probability of the target class. Typical choices for the metric are the cross-entropy with the target class as a hard target or the negative softmax score of the target class. The similarity metric ensures that the explanation remains faithful to the model and thus accurately explains the function of the model. This property is a major advantage of PBMs.
Using the mask based definition of an explanation with a zero image as reference as well as the similarity metric, a preserving explanation can be computed by minimizing the similarity metric together with a sparsity term on the mask (Eq. 2).
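Under assumed notation ($\varphi$ for the similarity metric, $y_x^{c_T}$ and $y_e^{c_T}$ for the model outputs of the target class $c_T$ given the image $x$ and the explanation $e = m \odot x$, and a weight $\lambda$ on the mask term, with the $\ell_1$ norm as an assumed choice), the objective that the text calls Eq. 2 could take the form:

```latex
\[
m^{*} = \arg\min_{m} \; \varphi\bigl(y_x^{c_T},\, y_e^{c_T}\bigr) + \lambda \,\lVert m \rVert_1
\]
```

The first term keeps the output for the target class, while the second term drives as many mask entries as possible to zero.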
We will refer to the optimization in Eq. 2 as the preservation game. Masks (Fig. 2 / b2, i.e. Figure 2, column b, 2nd row) generated by this game are sparse (i.e. many pixels are zero and appear black; enforced by minimizing the mask term in Eq. 2) and only contain large values at the most important pixels. The corresponding explanation is computed by multiplying the mask with the image (Fig. 2 / c2).
Alternatively, we can compute a deleting explanation by maximizing the similarity metric together with the mask term (Eq. 3).
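Under assumed notation ($\varphi$ for the similarity metric, $y_x^{c_T}$ and $y_e^{c_T}$ for the model outputs of the target class $c_T$ given the image $x$ and the explanation $e = m \odot x$, and a weight $\lambda$ on the mask term, with the $\ell_1$ norm as an assumed choice), the objective that the text calls Eq. 3 could take the form:

```latex
\[
m^{*} = \arg\max_{m} \; \varphi\bigl(y_x^{c_T},\, y_e^{c_T}\bigr) + \lambda \,\lVert m \rVert_1
\]
```

Maximizing the first term lowers the probability of the target class, while maximizing the second term keeps as many mask entries as possible at one.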
This optimization will be called the deletion game henceforth. Masks (Fig. 2 / b1) generated by this game contain mainly ones (i.e. appear white; enforced by maximizing the mask term in Eq. 3) and only small entries at pixels which provide the most prominent evidence for the target class. The colors in a mask of the deletion game are complementary to the image colors. To obtain a true-color representation analogous to the preservation game, one can alternatively visualize the complementary mask (Fig. 2 / d1). A resulting explanation of the deletion game, as defined in Eq. 3, is visualized in Fig. 2 / c1. This explanation is visually very similar to the original image, as only a few pixels need to be deleted to change the model output. In the remainder of the paper, for better visualization, we depict a modified version of the explanation for the deletion game. This explanation has the same properties as the one of the preservation game, i.e. it only highlights the important evidence. We observe that the deletion game generally produces sparser explanations compared to the preservation game, as fewer pixels have to be removed to delete evidence for a class than to maintain evidence by preserving pixels.
To solve the optimizations in Eq. 2 and Eq. 3, we utilize Stochastic Gradient Descent and start with an explanation identical to the original image (i.e. a mask initialized with ones). As an alternative initialization of the masks, we additionally explore a zero initialization. In this setting, the initial explanation contains no evidence towards any class and the optimization iteratively has to add relevant information (generation game) or irrelevant information that does not support the target class (repression game). The visualizations of the generation game are equivalent to those of the preservation game; the same holds for the deletion and repression game. In our experiments, the deletion game produces the most fine-grained and visually pleasing explanations. Compared to the other games, it usually needs the fewest optimization iterations, since we start with the original image and comparatively few mask values have to be changed to delete the evidence for the target class. A comparison and additional characteristics of the four optimization settings (i.e. games) are included in Sec. A3.5.
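To make the deletion game concrete, the following sketch runs it on a toy problem. Everything here is an assumption made for illustration: the CNN is replaced by a logistic model with hypothetical `weights` and `bias`, the similarity metric is the negative class score, plain gradient ascent stands in for Stochastic Gradient Descent, the values of `sparsity_weight` and `step` are arbitrary, and no adversarial defense is applied.

```python
import math


def class_score(weights, bias, pixels):
    """Toy stand-in for the CNN: probability of the target class."""
    z = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def deletion_game(image, weights, bias, sparsity_weight=0.05,
                  step=0.5, iterations=200):
    """Gradient ascent on (-score + sparsity_weight * sum(mask)).

    The mask starts at ones (explanation identical to the image) and is
    clipped to [0, 1] after every update. The reference image is zero.
    """
    mask = [1.0] * len(image)
    for _ in range(iterations):
        explanation = [x * m for x, m in zip(image, mask)]
        p = class_score(weights, bias, explanation)
        for i, (x, w) in enumerate(zip(image, weights)):
            # Derivative of the objective with respect to mask entry i.
            grad = -p * (1.0 - p) * w * x + sparsity_weight
            mask[i] = min(1.0, max(0.0, mask[i] + step * grad))
    return mask
```

In this toy setting, only pixels whose removal lowers the score by more than the mask term gains are deleted, so the mask stays at one almost everywhere, which mirrors the sparsity of deletion explanations described above.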
3.2 Defending against Adversarial Evidence
CNNs have been proven susceptible to adversarial images [45, 19, 27], i.e. perturbed versions of correctly classified images crafted to fool a CNN. Due to the computational similarity of adversarial methods and optimization based visual explanation approaches, adversarial noise is also a concern for the latter methods, and one has to ensure that an explanation is based on true evidence present in the image and not on false adversarial evidence introduced during optimization. This is particularly true for the generation/repression game, as their optimization starts with an empty explanation (i.e. a zero mask) and iteratively adds information.
Previous work showed the vulnerability of optimization based explanation methods to adversarial noise. To avoid adversarial evidence, explanation methods use stochastic operations, additional regularizations [17, 11], optimize on a low-resolution mask with upsampling of the computed mask [17, 14, 6], or utilize a regularizing surrogate model. In general, these operations impede the generation of adversarial noise by obscuring the gradient direction in which the model is susceptible to false evidence, or by constraining the search space for potential adversarials. These techniques help to reduce adversarial evidence, but also introduce new drawbacks: 1) Defense capabilities usually depend on human-tuned parameters; 2) Explanations are limited to being low resolution and/or smooth, which prevents fine-grained evidence from being visualized.
A novel Adversarial Defense. To overcome these drawbacks, we propose a novel adversarial defense which filters gradients during backpropagation in a targeted way. The basic idea of our approach is: A neuron within a CNN is only allowed to be activated by the explanation if the same neuron was also activated by the original image. If we regard neurons as indicators for the existence of features (e.g. edges, object parts, …), the proposed constraint enforces that the explanation can only contain features which exist at the same location in the original image. By ensuring that the allowed features in the explanation are a subset of the features in the original image, it prevents the generation of new evidence.
This defense technique can be integrated in the introduced explanation methods via an optimization constraint:
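One way to write this constraint, with all symbol names assumed for illustration ($h_i^l(x)$ and $h_i^l(e)$ for the activation of neuron $i$ in layer $l$ given the original image $x$ and the explanation $e$), is:

```latex
\[
\begin{cases}
0 \le h_i^l(e) \le h_i^l(x), & \text{if } h_i^l(x) \ge 0,\\[2pt]
h_i^l(x) \le h_i^l(e) \le 0, & \text{otherwise.}
\end{cases}
\]
```

In both cases the activation caused by the explanation keeps the sign of the original activation and never exceeds it in magnitude.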
Here, the constrained quantity is the activation of a single neuron in a given layer of the network after the nonlinearity. For brevity, a single index references one specific feature at one spatial position in the activation map. This constraint is applied after all nonlinearity-layers (e.g. ReLU-Layers) of the network, except for the final classification layer. It ensures that the absolute value of activations can only be reduced towards values representing lower information content (we assume that zero activations have the lowest information, as commonly applied in network pruning). To solve the optimization subject to Eq. 4, one could incorporate the constraints via a penalty function in the optimization loss. The drawback is one additional hyperparameter. Alternatively, one could add an additional layer after each nonlinearity which ensures the validity of Eq. 4:
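Such a layer amounts to clamping each activation between two bounds derived from the original image. With assumed symbol names ($h_i^l$ for the activation of neuron $i$ in layer $l$, $\bar{h}_i^l$ for the adjusted activation, $b_u$ and $b_l$ for the upper and lower bound, $x$ for the original image and $e$ for the explanation), one possible form is:

```latex
\[
\bar{h}_i^l(e) = \min\bigl(b_u,\, \max\bigl(b_l,\, h_i^l(e)\bigr)\bigr),
\qquad
b_u = \max\bigl(0,\, h_i^l(x)\bigr),
\quad
b_l = \min\bigl(0,\, h_i^l(x)\bigr)
\]
```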
Here, the layer maps the actual activation of the original nonlinearity-layer to an adjusted activation that respects an upper and a lower bound derived from the original input. For instance, for a ReLU nonlinearity, the upper bound is equal to the activation caused by the original image and the lower bound is zero. We do not apply this method, as it changes the architecture of the model which we try to explain. Instead, we clip those gradients in the backward pass of the optimization which would lead to a violation of Eq. 4. This is equivalent to adding an additional clipping-layer after each nonlinearity which acts as the identity in the forward pass and uses the gradient update of Eq. 6 in the backward pass. When backpropagating an error-signal through the clipping-layer, the gradient update rule for the resulting error is defined by:
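With assumed symbol names ($\gamma_i^l$ for the incoming error signal at neuron $i$ in layer $l$, $\bar{\gamma}_i^l$ for the filtered signal, $h_i^l(e)$ for the activation caused by the explanation $e$, and $b_l$, $b_u$ for the lower and upper bound obtained from the original image), such a rule can be sketched as:

```latex
\[
\bar{\gamma}_i^l = \gamma_i^l \cdot \mathbb{1}\bigl[h_i^l(e) \le b_u\bigr] \cdot \mathbb{1}\bigl[h_i^l(e) \ge b_l\bigr]
\]
```

The error signal passes unchanged while the activation lies within the bounds and is set to zero otherwise.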
Here, the indicator function checks whether the activation lies within the bounds computed in Eq. 5. This clipping only affects the gradients of the similarity metric which are propagated through the network. The proposed gradient clipping does not add hyperparameters and keeps the original structure of the model during the forward pass. Compared to other adversarial defense techniques, it imposes no constraint on the explanation (e.g. resolution/smoothness constraints), enabling fine-grained explanations.
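The bounds and the gradient filter can be sketched for a list of activations as follows. This is a framework-free illustration rather than the implementation used in this work: the function names are hypothetical, and in practice the filter would be attached to the backward pass of each nonlinearity.

```python
def activation_bounds(original_activation):
    """Lower and upper bound derived from the original image's activation."""
    lower = min(0.0, original_activation)
    upper = max(0.0, original_activation)
    return lower, upper


def clamp_activation(activation, original_activation):
    """Forward-pass variant: force the activation into its bounds."""
    lower, upper = activation_bounds(original_activation)
    return min(upper, max(lower, activation))


def filter_gradients(gradients, activations, original_activations):
    """Backward-pass variant: zero gradients where the bounds are violated.

    `activations` are caused by the explanation, `original_activations`
    by the original image; the forward pass itself is left untouched.
    """
    filtered = []
    for g, h, h_orig in zip(gradients, activations, original_activations):
        lower, upper = activation_bounds(h_orig)
        filtered.append(g if lower <= h <= upper else 0.0)
    return filtered
```

For a ReLU, the original activation is non-negative, so the lower bound is zero and the upper bound is the original activation, matching the example given in the text.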
Validating the Adversarial Defense. To evaluate the performance of our defense, we compute an explanation for a class for which there is no evidence in the image (i.e. it is visually not present). We approximate such a class with the least-likely class, considering only images which yield very high predictive confidence for the true class. Using this class as the target class, the resulting explanation method without defense is similar to an adversarial attack (the Iterative Least-Likely Class Method).
A correct explanation for the adversarial class should be “empty” (i.e. grey), as seen in Fig. 3 b, top row, when using our adversarial defense. If, on the other hand, the explanation method is susceptible to adversarial noise, the optimization procedure should be able to perfectly generate an explanation for any class. This behavior can be seen in Fig. 3 c, top row. The shown explanation for the adversarial class (limousine) contains primarily artificial structures and is nevertheless classified by the model as limousine.
We also depict the explanation of the predicted class (agama). The explanation with our defense results in a meaningful representation of the agama (Fig. 3 b, bottom row); without defense (Fig. 3 c / d, bottom row) it is much sparser. As nothing prevents pixel values from being changed arbitrarily, we assume the algorithm introduces additional structures to produce a sparse explanation.
A quantitative evaluation of the proposed defense is reported in Tab. 1. We generate explanations for 1000 random ImageNet validation images and use an adversarial class as the explanation target (we use the least-likely class, as described before, and the second least-likely class if the least-likely class coincidentally matches the predicted class for the zero image). To ease the generation of adversarial examples, we set the sparsity loss to zero and only use the similarity metric, which tries to maximize the probability of the target class. Without an employed defense technique, the optimization is able to generate an adversarial explanation for 100% of the images. Applying our defense (Eq. 6), the optimization was almost never able to do so. The two adversarial examples generated in VGG16 have a low confidence, so we assume that there has been some evidence for the chosen class in the image. Our proposed technique is thus well suited to defend against adversarial evidence.
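The selection of the adversarial target class can be sketched as follows. The function names and the confidence threshold `min_confidence` are assumptions made for this illustration; the text only states that images with very high predictive confidence are used.

```python
def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=lambda c: scores[c])


def least_likely_class(scores, excluded=()):
    """Index of the smallest score among the non-excluded classes."""
    candidates = [c for c in range(len(scores)) if c not in excluded]
    return min(candidates, key=lambda c: scores[c])


def adversarial_target(image_scores, zero_image_scores, min_confidence=0.9):
    """Pick the class used as explanation target, or None to skip the image.

    Falls back to the second least-likely class when the least-likely
    class equals the prediction for the zero image.
    """
    predicted = argmax(image_scores)
    if image_scores[predicted] < min_confidence:
        return None  # image is not confidently classified
    target = least_likely_class(image_scores)
    if target == argmax(zero_image_scores):
        target = least_likely_class(image_scores, excluded={target})
    return target
```

The fallback avoids a trivial case: an empty (grey) explanation would already be classified as the target class if that class were the prediction for the zero image.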
4 Qualitative Results
Implementation details are stated in Sec. A2.
Comparison of methods. Using the deletion game, we compute mean explanation masks for GoogleNet and compare these in Fig. 5 with state-of-the-art methods. Our method delivers the most fine-grained explanation by deleting important pixels of the target object. In particular, explanations b), f), and g) are coarser and therefore tend to include background information that need not be deleted to change the original prediction. The majority of pixels highlighted by FGVis form edges of the object, which cannot be seen in other methods. The explanations from c) and d) are most similar to ours. However, our masks are computed to directly produce explanations which are viable network inputs and are therefore verifiable: the deletion of the highlighted pixels prevents the model from correctly predicting the object. This statement does not necessarily hold for explanations calculated with methods c) and d).
Architectural insights. As previously noted, explanations using backpropagation based approaches show a grid-like pattern for ResNet. In general, it has been demonstrated that the network structure influences the visualization, and it is assumed that for ResNet the skip connections play an important role in their explanation behavior. As shown in Fig. 6, this pattern is also visible in our explanations to an even finer degree. Interestingly, the grid pattern is also visible to a lesser extent outside the object. A detailed investigation of this phenomenon is left for future research. See Sec. A3.4 for a comparison of explanations between models.
4.2 Class Discriminative / Fine-Grained
Visual explanation methods should be able to produce class discriminative (i.e. focus on one object) and fine-grained explanations. To test FGVis with respect to these properties, we generate explanations for images containing two objects. The objects are chosen from highly different categories to ensure little overlapping evidence. In Fig. 4, we visualize explanations of three such images, computed using the deletion game and GoogleNet. Additional results can be found in Sec. A3.2.
FGVis is able to generate class discriminative explanations and only highlights pixels of the chosen target class. Even partially overlapping objects, such as the elkhound and ball in Fig. 4, first row, or the bridge and schooner in Fig. 4, third row, are correctly discriminated. One major advantage of FGVis is its ability to visualize fine-grained details. This property is especially visible in Fig. 4, second row, which shows an explanation for the target class fence. Despite the fine structure of the fence, FGVis is able to compute a precise explanation which mainly contains fence pixels.
4.3 Investigating Biases of Training Data
An application of explanation methods is to identify a bias in the training data. Especially for safety-critical, high-risk domains (e.g. autonomous driving), such a bias can lead to failures if the model does not generalize to the real world.
Learned objects. One common bias is the coexistence of objects in images which can be depicted using FGVis. In Sec. A3.3, we describe such a bias in ImageNet for sports equipment appearing in combination with players.
Learned color. Objects are often biased towards specific colors. FGVis can give a first visual indication of the importance of different color channels. Using the preservation game, we investigate whether a VGG16 model trained on ImageNet shows such a bias. We focus on images of school buses and minivans and compare explanations (Fig. 7; all correctly predicted images in Fig. A6 and A8). Explanations of minivans focus on edges and do not consistently preserve the color, whereas explanations of school buses are dominated by yellow. This is a first indication of the importance of color for the prediction of school buses.
To verify the qualitative finding, we give a quantitative estimate of the color bias. As an evaluation, we swap the three color channels from BGR to either RBG or GRB and calculate the ratio of maintained true classifications on the validation data after the swap. For minivans (averaged over RBG and GRB), a larger share of the correctly classified images keep their class label than for school buses. For some ImageNet classes, a considerable share of images is no longer correctly classified after the color swap. We show the results for the most and least affected classes and for minivan / school bus in Tab. A3.
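The channel-swap evaluation can be sketched as follows. The sketch makes several assumptions: images are lists of three-channel pixel tuples, `classify` is a hypothetical callable returning a class label, and `order` is a generic index permutation rather than a specific mapping between BGR, RBG and GRB.

```python
def permute_channels(image, order):
    """Reorder the color channels of every pixel according to `order`."""
    return [tuple(pixel[i] for i in order) for pixel in image]


def retained_ratio(images, labels, classify, order):
    """Share of correctly classified images that keep their label.

    Only images classified correctly before the swap are considered,
    since the ratio refers to maintained true classifications.
    """
    correct = [(img, y) for img, y in zip(images, labels)
               if classify(img) == y]
    if not correct:
        return 0.0
    kept = sum(1 for img, y in correct
               if classify(permute_channels(img, order)) == y)
    return kept / len(correct)
```

Computing this ratio per class and averaging over the two permutations would give one number per class, which allows classes to be ranked by how strongly they depend on color.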
To the best of our knowledge, FGVis is the first method used to highlight color channel importance.
5 Quantitative Results
5.1 Faithfulness of Explanations
The faithfulness of generated visual explanations to the underlying neural network is an important property of explanation methods. To quantitatively compare the faithfulness of methods, Petsiuk et al. proposed causal metrics which do not depend on human labels. These metrics are not biased towards human perception and are thus well suited to verify if an explanation correctly represents the evidence on which a model bases its prediction.
We use the deletion metric to evaluate the faithfulness of explanations generated by our method. This metric measures how the removal of evidence affects the prediction of the used model. The metric assumes that an importance map is given, which ranks all image pixels with respect to their evidence for the predicted class. By iteratively removing important pixels from the input image and measuring the resulting probability of the class, a deletion curve can be generated, whose area under the curve (AUC) is used as a measure of faithfulness (Sec. A4.1).
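A minimal sketch of the deletion metric is given below. It is a simplified illustration, not the evaluation code used in this work: images are flat lists, `predict` is a hypothetical callable returning the class probability, removed pixels are set to an assumed value of zero, and the curve is integrated with the trapezoidal rule on a normalized axis.

```python
def deletion_curve(image, importance, predict, step=1, removed_value=0.0):
    """Class probabilities while removing pixels in order of importance."""
    order = sorted(range(len(image)), key=lambda i: importance[i],
                   reverse=True)
    current = list(image)
    scores = [predict(current)]
    for start in range(0, len(order), step):
        # Remove the next `step` most important pixels.
        for i in order[start:start + step]:
            current[i] = removed_value
        scores.append(predict(current))
    return scores


def area_under_curve(scores):
    """Trapezoidal area with the x-axis normalized to [0, 1]."""
    if len(scores) < 2:
        return 0.0
    width = 1.0 / (len(scores) - 1)
    return sum((a + b) / 2.0 * width for a, b in zip(scores, scores[1:]))
```

An importance map that ranks the truly decisive pixels first makes the probability drop early, which yields a smaller area than a map that removes them last.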
In Tab. 2, we report the deletion metric of FGVis, computed on the validation split of ImageNet using different models. We use the deletion game to generate masks, which determine the importance of each pixel. A detailed description of the experiment settings, as well as additional figures, can be found in Sec. A4.1. FGVis outperforms the other explanation methods on both models by a large margin. This performance increase can be attributed to the ability of FGVis to visualize fine-grained evidence. All other approaches are limited to coarse explanations, either due to computational constraints or due to the measures used to avoid adversarial evidence. The difference between the two model architectures can most likely be attributed to the superior performance of ResNet50, resulting in higher softmax scores on average over all validation images.
5.2 Visual Explanation for Medical Images
We evaluate FGVis on a real-world use case to identify regions in eye fundus images which lead a CNN to classify the image as being affected with referable diabetic retinopathy (RDR). Using the deletion game, we derive a weakly-supervised approach to detect RDR lesions. The setup, the network used, as well as details on the disease and training data are described in Sec. A4.2. To evaluate FGVis, the DiaretDB1 dataset is used, which contains 89 fundus images with different lesion types and ground truth marked by four experts. To quantitatively judge the performance, we compare in Tab. 3 the image level sensitivity of detecting if a certain lesion type is present in an image. The methods [54, 28, 21, 29] use supervised approaches on image level without reporting a localization. An unsupervised approach to extract salient regions has also been proposed. A setting comparable to ours applies CAM in a weakly-supervised way to highlight important regions. To decide if a lesion is detected, an overlap of 50% between proposed regions and ground truth has been suggested. As our explanation masks are fine-grained and the ground truth is coarse, we compare using a 25% overlap and, for completeness, report a 50% overlap.
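The overlap criterion and the image level sensitivity can be sketched as follows. The exact overlap definition is not given in the text, so this sketch assumes it is the fraction of ground-truth pixels covered by the proposed region; masks are flat lists of 0/1 values and all function names are hypothetical.

```python
def overlap_ratio(predicted, ground_truth):
    """Fraction of ground-truth pixels covered by the prediction.

    This definition is an assumption; other choices (e.g. relative to
    the predicted region) are equally possible.
    """
    gt_pixels = sum(1 for g in ground_truth if g)
    if gt_pixels == 0:
        return 0.0
    hits = sum(1 for p, g in zip(predicted, ground_truth) if p and g)
    return hits / gt_pixels


def image_level_sensitivity(cases, threshold):
    """Share of lesion-containing images in which the lesion is detected.

    `cases` is a list of (predicted_mask, ground_truth_mask) pairs for
    one lesion type; images without ground truth are ignored.
    """
    positives = [(p, g) for p, g in cases if any(g)]
    if not positives:
        return 0.0
    detected = sum(1 for p, g in positives
                   if overlap_ratio(p, g) >= threshold)
    return detected / len(positives)
```

Lowering the threshold from 50% to 25% can only increase the number of detections, which is consistent with the direction of the difference between the two reported settings.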
It is remarkable that FGVis performs comparably to or outperforms fully supervised approaches which are designed to detect the presence of one lesion type. The strength of FGVis is especially visible in detecting RSD, as these small lesions only cover some pixels in the image. In Fig. A21 we show fundus images, ground truth and our predictions.
|Zhou et al.||94.4||-||-|
|Liu et al.||-||83.0||83.0||-|
|Haloi et al.||96.5||-||-|
|Mane et al.||-||-||-||96.4|
|Zhao et al. ||98.1||-||-|
|Gondal et al.||97.2||93.3||81.8||50|
|Ours (25% Overlap)||100||94.7||90.0||88.4|
|Ours (50% Overlap)||90.5||81.6||80.0||86.0|
We propose a method which generates fine-grained visual explanations in the image space using a novel technique to defend against adversarial evidence. Our defense does not introduce hyperparameters. We show the effectiveness of the defense on different models, compare our explanations to other methods, and quantitatively evaluate the faithfulness. Moreover, we underline the strength in producing class discriminative visualizations and point to characteristics in explanations of a ResNet50. Due to the fine-grained nature of our explanations, we achieve remarkable results on a medical dataset. In addition, we show the usability of our approach to visually indicate a color bias in training data.
-  Julius Adebayo, Justin Gilmer, Ian Goodfellow, and Been Kim. Local explanation methods for deep neural networks lack sensitivity to parameter values. In Workshop at the International Conference on Learning Representations (ICLR), 2018.
-  Ankita Agrawal, Charul Bhatnagar, and Anand Singh Jalal. A survey on automated microaneurysm detection in diabetic retinopathy retinal images. In International Conference on Information Systems and Computer Networks (ISCON), pages 24–29. IEEE, 2013.
R. Arunkumar and P. Karthigaikumar.
Multi-retinal disease classification by reduced deep learning features.Neural Computing and Applications, 28(2):329–334, 2017.
-  Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS one, 10(7):e0130140, 2015.
David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja
Hansen, and Klaus-Robert Müller.
How to explain individual classification decisions.
Journal of Machine Learning Research, 11(Jun):1803–1831, 2010.
-  Chun-Hao Chang, Elliot Creager, Anna Goldenberg, and David Duvenaud. Explaining image classifiers by counterfactual generation. arXiv e-prints, page arXiv:1807.08024, Jul 2018.
-  Aditya Chattopadhay, Anirban Sarkar, Prantik Howlader, and Vineeth N. Balasubramanian. Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks. In Winter Conference on Applications of Computer Vision (WACV), pages 839–847, 2018.
-  E. Colas, A. Besse, A. Orgogozo, B. Schmauch, N. Meric, and E. Besse. Deep learning approach for diabetic retinopathy screening. Acta Ophthalmologica, 94(S256), 2016.
-  Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3213–3223, 2016.
-  Jorge Cuadros and George Bresnick. EyePACS: an adaptable telemedicine system for diabetic retinopathy screening. Journal of Diabetes Science and Technology, 3(3):509–516, 2009.
-  Piotr Dabkowski and Yarin Gal. Real time image saliency for black box classifiers. In Advances in Neural Information Processing Systems (NIPS), pages 6967–6976, 2017.
-  P. Dollár, C. Wojek, B. Schiele, and P. Perona. Pedestrian detection: A benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
-  Mengnan Du, Ninghao Liu, and Xia Hu. Techniques for interpretable machine learning. arXiv e-prints, page arXiv:1808.00033, Jul 2018.
-  Mengnan Du, Ninghao Liu, Qingquan Song, and Xia Hu. Towards explanation of dnn-based prediction with guided feature inversion. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1358–1367, 2018.
-  EyePACS. https://www.kaggle.com/c/diabetic-retinopathy-detection. accessed on 2018-09-23, 2015.
-  EyePACS. https://www.kaggle.com/c/diabetic-retinopathy-detection/discussion/15617. accessed on 2018-09-23.
-  Ruth C. Fong and Andrea Vedaldi. Interpretable explanations of black boxes by meaningful perturbation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 3429–3437, 2017.
-  Waleed M. Gondal, Jan M. Köhler, René Grzeszick, Gernot A. Fink, and Michael Hirsch. Weakly-supervised localization of diabetic retinopathy lesions in retinal fundus images. In IEEE International Conference on Image Processing (ICIP), pages 2069–2073, 2017.
-  Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.
-  Varun Gulshan, Lily Peng, Marc Coram, Martin C Stumpe, Derek Wu, Arunachalam Narayanaswamy, Subhashini Venugopalan, Kasumi Widner, Tom Madams, Jorge Cuadros, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. Journal of the American Medical Association (JAMA), 316(22):2402–2410, 2016.
-  Mrinal Haloi, Samarendra Dandapat, and Rohit Sinha. A gaussian scale space approach for exudates detection, classification and severity prediction. arXiv e-prints, page arXiv:1505.00737, May 2015.
-  Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems (NIPS), pages 1135–1143, 2015.
-  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
-  Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv e-prints, page arXiv:1503.02531, Mar 2015.
-  Tomi Kauppi, Valentina Kalesnykiene, Joni-Kristian Kamarainen, Lasse Lensu, Iiris Sorri, Asta Raninen, Raija Voutilainen, Hannu Uusitalo, Heikki Kälviäinen, and Juhani Pietilä. The DIARETDB1 diabetic retinopathy database and evaluation protocol. In British Machine Vision Conference (BMVC), pages 1–10, 2007.
-  Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 1097–1105, 2012.
-  Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv e-prints, page arXiv:1607.02533, Jul 2016.
-  Qing Liu, Beiji Zou, Jie Chen, Wei Ke, Kejuan Yue, Zailiang Chen, and Guoying Zhao. A location-to-segmentation strategy for automatic exudate segmentation in colour retinal fundus images. Computerized Medical Imaging and Graphics, 55:78–86, 2017.
-  Vijay M Mane, Ramish B Kawadiwale, and DV Jadhav. Detection of red lesions in diabetic retinopathy affected fundus images. In IEEE International Advance Computing Conference (IACC), pages 56–60, 2015.
-  Rowan McAllister, Yarin Gal, Alex Kendall, Mark Van Der Wilk, Amar Shah, Roberto Cipolla, and Adrian Vivian Weller. Concrete problems for autonomous vehicle safety: Advantages of Bayesian deep learning. In International Joint Conferences on Artificial Intelligence (IJCAI), 2017.
-  Weili Nie, Yang Zhang, and Ankit Patel. A theoretical explanation for perplexing behaviors of backpropagation-based visualizations. arXiv e-prints, page arXiv:1805.07039, May 2018.
-  Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. In British Machine Vision Conference (BMVC), 2018.
-  Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016.
-  Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.
-  Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 618–626, 2017.
-  Dasom Seo, Kanghan Oh, and Il-Seok Oh. Regional multi-scale approach for visually pleasing explanations of deep neural networks. arXiv e-prints, page arXiv:1807.11720, Jul 2018.
-  Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learning important features through propagating activation differences. In Proceedings of the 34th International Conference on Machine Learning (ICML), pages 3145–3153, 2017.
-  Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. International Conference on Learning Representations (ICLR), 2014.
-  Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv e-prints, page arXiv:1409.1556, Sep 2014.
-  Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv e-prints, page arXiv:1706.03825, Jun 2017.
-  Sharon D. Solomon, Emily Chew, Elia J. Duh, Lucia Sobrin, Jennifer K. Sun, Brian L. VanderBeek, Charles C. Wykoff, and Thomas W. Gardner. Diabetic retinopathy: a position statement by the American diabetes association. Diabetes care, 40(3):412–418, 2017.
-  Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. In International Conference on Learning Representations (ICLR), 2015.
-  Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning (ICML), pages 3319–3328, 2017.
-  Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
-  Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.
-  Daniel Shu Wei Ting, Carol Yim-Lui Cheung, Gilbert Lim, Gavin Siew Wei Tan, Nguyen D Quang, Alfred Gan, Haslina Hamzah, Renata Garcia-Franco, Ian Yew San Yeo, Shu Yen Lee, et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. Journal of the American Medical Association (JAMA), 318(22):2211–2223, 2017.
-  Joanne WY Yau, Sophie L. Rogers, Ryo Kawasaki, Ecosse L. Lamoureux, Jonathan W. Kowalski, Toke Bek, Shih-Jen Chen, Jacqueline M. Dekker, Astrid Fletcher, Jakob Grauslund, et al. Global prevalence and major risk factors of diabetic retinopathy. Diabetes care, 35(3):556–564, 2012.
-  Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In Proceedings of the European Conference on Computer Vision (ECCV), pages 818–833, 2014.
-  Jianming Zhang, Zhe Lin, Jonathan Brandt, Xiaohui Shen, and Stan Sclaroff. Top-down neural attention by excitation backprop. In Proceedings of the European Conference on Computer Vision (ECCV), pages 543–559, 2016.
-  Quan-shi Zhang and Song-Chun Zhu. Visual interpretability for deep learning: A survey. Frontiers of Information Technology & Electronic Engineering, 19(1):27–39, 2018.
-  Yitian Zhao, Yalin Zheng, Yifan Zhao, Yonghuai Liu, Zhili Chen, Peng Liu, and Jiang Liu. Uniqueness-driven saliency analysis for automated lesion detection with applications to retinal diseases. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 109–118. Springer, 2018.
-  Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Object Detectors Emerge in Deep Scene CNNs. arXiv e-prints, page arXiv:1412.6856, Dec 2014.
-  Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2921–2929, 2016.
-  Lei Zhou, Penglin Li, Qi Yu, Yu Qiao, and Jie Yang. Automatic hemorrhage detection in color fundus images based on gradual removal of vascular branches. In IEEE International Conference on Image Processing (ICIP), pages 399–403, 2016.
A1 Defending against Adversarial Evidence
Our method produces explanations based on evidence in the image and suppresses hallucination of adversarial evidence. Without our adversarial defense the optimization can produce an explanation for any class (i.e. even for a class visually not present in the image).
As an alternative to the experiment reported in Sec. 3.1 (Tab. 1 and Fig. 3), we show a version of the evaluation that uses only a black image as input. Fig. A1 shows an explanation for the adversarial class iguana with and without defense. For Tab. A1 we create explanations for 998 ImageNet classes, always using the same black input image. We omit the predicted class of the black image and the class of the starting condition (image zero mask). Without defense an explanation can always be generated due to hallucination of adversarial evidence. The results are comparable to the evaluation in the main paper.
A2 Implementation Details
Unless otherwise specified, the explanations are computed for the most-likely class using SGD with a learning rate of , running for iterations. To improve optimization and avoid instabilities, we initialize the masks
with noise sampled for each pixel from a uniform distribution. with for the generation and repression game and for the preservation and deletion game. We normalize the gradient using its maximum value to avoid large changes of individual mask pixels.
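The gradient normalization described above can be sketched as follows. This is a minimal illustration, not the original implementation: the function name `normalized_sgd_step`, the flattened mask layout, and the clipping of mask values to [0, 1] are assumptions made for the example.

```python
def normalized_sgd_step(mask, grad, lr):
    """One SGD step on a flattened mask with a max-normalized gradient.

    Assumptions (not taken from the text): the gradient is divided by its
    largest absolute value, and mask values are clipped to [0, 1] afterwards.
    """
    scale = max(abs(g) for g in grad)
    if scale == 0.0:
        return list(mask)  # zero gradient: nothing to update
    updated = []
    for m, g in zip(mask, grad):
        value = m - lr * g / scale  # each pixel moves by at most lr
        updated.append(min(1.0, max(0.0, value)))
    return updated


mask = [1.0, 1.0, 0.5]
grad = [4.0, -2.0, 0.0]
print(normalized_sgd_step(mask, grad, lr=0.1))
```

Because the largest gradient entry is scaled to magnitude one, no single mask pixel changes by more than the learning rate in one step.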
For the similarity metric we use the cross-entropy for the generation and preservation game and the negative probability for the deletion and repression game.
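As a hedged sketch, the two similarity metrics can be written for a single softmax output as below. The helper names and the use of a one-hot target class are assumptions for illustration; how the metric enters the full objective (Eq. 2 and Eq. 3) is not reproduced here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Cross-entropy against a one-hot target (assumed form)."""
    return -math.log(probs[target])


def negative_probability(probs, target):
    """Negative softmax score of the target class."""
    return -probs[target]


probs = softmax([2.0, 0.5, -1.0])
print(cross_entropy(probs, 0), negative_probability(probs, 0))
```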
When computing an explanation for the most-likely class, we use a line-search for the parameter to determine its optimal value. Unless otherwise noted, we iteratively use equally spaced values between and and stop when the resulting most-likely class of shifts (deletion and repression game) or achieves the highest probability among all classes (preservation and generation game). We use images of the ImageNet  validation set and pre-trained model weights.
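The line-search can be sketched generically as trying equally spaced candidate values in order and stopping at the first one whose explanation meets the stopping criterion. The callables `explain` and `accept` are hypothetical stand-ins for the mask optimization and for the class-shift (or highest-probability) check; the search order is also an assumption.

```python
def equally_spaced(start, stop, count):
    """Return `count` equally spaced values from start to stop (count >= 2)."""
    step = (stop - start) / (count - 1)
    return [start + i * step for i in range(count)]


def line_search(candidates, explain, accept):
    """Return the first candidate whose explanation satisfies `accept`.

    explain: hypothetical callable running the optimization for one value.
    accept:  hypothetical callable implementing the stopping criterion.
    """
    for value in candidates:
        explanation = explain(value)
        if accept(explanation):
            return value, explanation
    return None, None  # no candidate met the criterion


# Toy usage: the "explanation" is just a number derived from the candidate.
values = equally_spaced(0.0, 1.0, 5)
print(line_search(values, explain=lambda v: 2.0 * v, accept=lambda e: e >= 1.0))
```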
A comparison of resulting masks for different learning rates and values for GoogleNet computed with the deletion game is shown in Fig. A2. A higher value causes sparser masks due to a higher weighting of the sparsity-inducing part within the loss function (Eq. 2 and Eq. 3). Especially for higher values, the resulting masks are rather independent of the chosen learning rate of the SGD optimization.
A3 Qualitative Results
A3.1 Entropy of Reference Images
FGVis computes explanations by optimizing for a perturbed version of the input image . The perturbation is modelled via a removal operator [17, 14, 6, 11], which computes a weighted average between the image and a reference image , using a mask :
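As a minimal sketch of such a weighted average, assuming that a mask value of one keeps the image pixel and a value of zero replaces it with the reference pixel (the names `image`, `reference`, and `mask` and the flattened layout are illustrative):

```python
def remove(image, reference, mask):
    """Pixel-wise weighted average between image and reference.

    Assumed convention: mask value 1 keeps the image pixel,
    mask value 0 replaces it with the reference pixel.
    """
    return [m * x + (1.0 - m) * r for x, r, m in zip(image, reference, mask)]


image = [0.8, 0.2, 0.6]
reference = [0.0, 0.0, 0.0]  # zero image reference
mask = [1.0, 0.0, 0.5]
print(remove(image, reference, mask))
```

With a zero reference image, the operator reduces to an element-wise product of mask and image.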
A good reference image should carry little information and lead to a model prediction with a high entropy, meaning, ideally all classes are assigned the same softmax score (see ’Maximum (1000 classes)’ in Tab. A2 for the resulting maximum entropy). To compare references, we report their entropy for different models in Tab. A2.
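The entropy of a softmax prediction can be computed as in the following sketch. The natural logarithm is an assumption; the logarithm base used for Tab. A2 is not stated in this section. Under that assumption, a uniform prediction over 1000 classes gives ln(1000), roughly 6.91 nats.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability vector, in nats (assumed base)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


uniform = [1.0 / 1000] * 1000
print(entropy(uniform))      # maximum entropy for 1000 classes
print(entropy([1.0, 0.0]))   # a fully confident prediction has zero entropy
```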
For all models except GoogleNet the zero image reference has the highest entropy. Interestingly, for the zero image reference, the more recent architectures (GoogleNet, ResNet50) have a lower entropy. This indicates that these architectures do not assign a roughly equally distributed softmax score to all classes (as AlexNet or VGG16 do).
As expected, an increasing noise level for a Gaussian noise image as well as a decreasing standard deviation of the Gaussian blur filter reduces the entropy. Only GoogleNet does not fully follow this characteristic.
For comparison, we report the entropy for 1000 random ImageNet validation images for the different models.
Due to the high entropy as well as the low computational effort of a zero reference image, we choose this reference for FGVis.
|Gaussian noise image ()|
|Gaussian noise image ()|
|Blurred ImageNet image ()|
|Blurred ImageNet image ()|
|Maximum (1000 classes)|
random instances of each reference image. Gaussian noise images are generated by independently sampling for each pixel from a Gaussian distribution with zero mean and a standard deviation of . The blurred ImageNet images are computed using a Gaussian blur filter with a standard deviation of . For all random references we report the mean and standard deviation of the entropy.
A3.2 Class Discriminative / Fine-Grained
In Fig. A3 and Fig. A4 we show additional explanation masks for images containing two distinct objects. The objects are chosen from highly different categories to ensure little overlapping evidence. The explanations are computed for GoogleNet using the deletion game, which generates the most pleasing class-discriminative explanations.
Note that FGVis discriminates well even if the two objects partially overlap. The figures additionally highlight the ability of FGVis to generate fine-grained explanations.
To determine we use for the most-likely class the strategy as described in Sec. A2. For the second class is optimized to significantly drop the softmax score of this class.
A3.3 Investigating Biases of Training Data
Learned objects. The coexistence of objects in images often results in a learned bias. In Fig. A5, we visualize such a bias for GoogleNet trained on ImageNet.
Sports equipment like hockey pucks or ping-pong balls frequently appear in combination with players. This bias is learned by the neural network and results in explanations that also contain pixels belonging to the players. Without deleting these pixels, the deletion game is not able to shift the class of the images.
Learned color. We quantitatively verify the color bias reported in Sec. 4.3 and show the classes of ImageNet which are most and least affected by swapping the color in Tab. A3. We swap each of the three color channels BGR to either RBG or GRB and calculate the ratio of maintained true classifications on the validation data after the swap.
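The channel swap and the resulting ratio can be sketched as below. This is an illustration under stated assumptions: pixels are stored as (B, G, R) tuples, `predict` is a hypothetical classifier callable, and the ratio is taken over images that were classified correctly before the swap.

```python
# Source indices for reordering pixels stored as (B, G, R).
BGR_TO_RBG = (2, 0, 1)
BGR_TO_GRB = (1, 2, 0)


def permute_channels(pixels, order):
    """Reorder the channels of every pixel; `order` lists source indices."""
    return [tuple(px[i] for i in order) for px in pixels]


def maintained_ratio(images, labels, predict, order):
    """Fraction of correct predictions that stay correct after the swap."""
    correct = [(img, y) for img, y in zip(images, labels) if predict(img) == y]
    if not correct:
        return 0.0
    kept = sum(predict(permute_channels(img, order)) == y for img, y in correct)
    return kept / len(correct)


# Toy classifier (hypothetical): class 1 if the third channel exceeds the first.
def toy_predict(img):
    return int(img[0][2] > img[0][0])


images = [[(0, 0, 9)], [(9, 0, 0)]]
labels = [1, 0]
print(maintained_ratio(images, labels, toy_predict, BGR_TO_RBG))
```

A class whose ratio stays close to one is largely insensitive to color, while a low ratio points to a learned color dependence.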
Fig. A6 shows explanations for the class school bus computed using the preservation game for VGG. The yellow color, also visible in the original images (Fig. A7), is dominant in most of the explanations.
Fig. A8 shows explanations for the class minivan computed using the preservation game for VGG. The original color of the car is not consistently preserved. Especially for white or grey cars (original images in Fig. A9) the visible color in the explanation is reduced to a greenish-blue color.
|ID||Class name||#Images||Avg. RBG, GRB||RBG||GRB|
|963||pizza, pizza pie||35|
|962||meat loaf, meatloaf||29|
|211||vizsla, Hungarian pointer||35|
|11||goldfinch, Carduelis carduelis||48|
|934||hotdog, hot dog, red hot||40|
|218||Welsh springer spaniel||39|
|191||Airedale, Airedale terrier||37|
|263||Pembroke, Pembroke Welsh corgi||41|
|528||dial telephone, dial phone||36|
|47||African chameleon, Chamaeleo chamaeleon||40|
|302||ground beetle, carabid beetle||27|
|717||pickup, pickup truck||28|
|829||streetcar, tram, tramcar, trolley, …||41|
|916||web site, website, internet site, site||47|
|190||Sealyham terrier, Sealyham||39|
|545||electric fan, blower||37|
A3.4 Comparison of Networks
In Fig. A10 and Fig. A11 we compare the mask and explanation for four network architectures (GoogleNet, VGG16, AlexNet, ResNet50) using the deletion game. Fig. A12 and Fig. A13 show the same comparison for the preservation game.
For all settings the explanations of ResNet50 and VGG16 are denser, meaning more pixels have to be deleted/preserved to change/preserve the class prediction. This could indicate that these models are more robust, although confirming this would require further research. In addition, the grid-like pattern in the explanations of ResNet50, described in Sec. 4.1, is visible.
For VGG16 we have observed that the pixels at the image edge are in many cases highlighted in the explanations. Furthermore, VGG16 shows pronounced edges in the explanation compared to the other networks.
A3.5 Comparison of Games
The resulting explanations for the repression and deletion game are qualitatively similar. The similarity among the two games is due to both using the same optimization with only a different starting condition for the repression vs. for the deletion game. The same observation holds for the generation / preservation game.
The explanations of the repression and deletion game are sparser than those of the generation / preservation game. This is most likely because only small parts of the image need to be suppressed to change the model output (e.g. shifting one breed of dog to another), whereas evoking a certain model output requires a sufficient amount of evidence for this class.
During the optimization, only pixels containing evidence towards the target class need to be changed for the generation and deletion game. After optimization most of the mask values stay zero for the generation game and one for the deletion game. The optimized masks are thus similar to their starting conditions.
The opposite holds for the preservation and repression game.
A3.6 Further Examples
A4 Quantitative Results
A4.1 Faithfulness of Explanations
To evaluate the faithfulness of our approach, we use the deletion metric of Petsiuk et al. . This metric measures how the removal of evidence affects the prediction of the used model. The metric assumes that an importance map is given, which ranks all image pixels with respect to their evidence for the predicted class (i.e. the most-likely class). We use the mean mask (see Sec. A3.5) as the pixel-wise importance map. The mean mask is computed for all images in the ImageNet validation dataset using the deletion game with a learning rate of and a line-search to determine the value. We iteratively use 4 equally spaced values between and and stop when , where is the softmax score of class given the explanation and the corresponding score given the image.
Using the importance map, the deletion curve is generated by successively removing pixels from the input image according to their importance and measuring the resulting probability of the class (see Fig. 19(c)). The removed pixels are set to zero, as proposed in Petsiuk et al. . The fraction of removed pixels is increased in increments of for the first steps and in increments of for the remaining steps. In Fig. 19(b), we visualize the binary masks used to successively set pixels to zero for an example image. For a clearer illustration, we reduced the number of deletion steps in this figure. The deletion metric is computed as the area under the curve (AUC) of the deletion curve (see Fig. 19(c)) using the trapezoidal rule.
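The computation of the deletion metric can be sketched as follows. The flattened image layout, the hypothetical `predict_prob` callable, and the example fractions are assumptions for illustration; the actual step sizes are those stated above.

```python
def deletion_auc(image, importance, predict_prob, fractions):
    """Area under the deletion curve, computed with the trapezoidal rule.

    image, importance: flat lists of equal length.
    predict_prob: hypothetical callable returning the class probability.
    fractions: increasing fractions of removed pixels in [0, 1].
    """
    # Rank pixel indices from most to least important.
    ranked = sorted(range(len(image)), key=lambda i: importance[i], reverse=True)
    probs = []
    for frac in fractions:
        perturbed = list(image)
        for i in ranked[:int(round(frac * len(image)))]:
            perturbed[i] = 0.0  # removed pixels are set to zero
        probs.append(predict_prob(perturbed))
    area = 0.0
    for k in range(1, len(fractions)):
        width = fractions[k] - fractions[k - 1]
        area += 0.5 * width * (probs[k] + probs[k - 1])
    return area


# Toy model (hypothetical): the class probability is the mean pixel value.
def toy_prob(img):
    return sum(img) / len(img)


print(deletion_auc([1.0, 1.0, 1.0, 1.0], [4, 3, 2, 1], toy_prob, [0.0, 0.5, 1.0]))
```

A lower area indicates that the probability drops quickly once the highest-ranked pixels are removed, which is the behavior expected of a faithful importance map.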
A4.2 Visual Explanation for Medical Images
Background of the disease: As people with diabetes have a high prevalence of RDR , frequent retinal screening is recommended, and deep learning algorithms have been successfully developed to classify fundus images. The black-box character of these algorithms can be reduced by visual explanation techniques as shown in .
Of the publicly available 88,702 images  from EyePACS , we use 80% for training and 20% for validation of a classifier with binary outcome (referable diabetic retinopathy (RDR) vs. non-RDR), which is later used for the weakly-supervised localization. We use a similar setup as in  to train the binary classifier (RDR vs. non-RDR).
for classifying retinal images. We use leaky ReLUs as non-linearities and include batch normalization.
The DiaretDB1 dataset  used to evaluate the weakly-supervised localization is a dataset of 89 color fundus images collected at the Kuopio University Hospital, Finland. All images have a resolution of 1500x1152 pixels and are scaled to the input dimension of the model.
The ground truth of the dataset was marked by four experts. As proposed in  we consider pixels as lesions if at least three experts have agreed.
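The agreement rule can be illustrated with a short sketch, assuming each expert annotation is available as a flat binary mask (the function name and data layout are hypothetical):

```python
def consensus_mask(expert_masks, min_votes=3):
    """Mark a pixel as lesion if at least `min_votes` experts marked it."""
    n_pixels = len(expert_masks[0])
    return [
        int(sum(mask[i] for mask in expert_masks) >= min_votes)
        for i in range(n_pixels)
    ]


experts = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
]
print(consensus_mask(experts))
```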
We use FGVis with a fixed and a learning rate of stopping if the softmax score for RDR falls below 10% with a maximum of 500 iterations.
are binarized for better visualization and to be able to quantitatively report the sensitivity (see Tab. 3). Values greater than or equal to 4% of the maximum are set to one, the remaining pixels to zero. The predicted pixels in the fine-grained masks map to the ground truth. Note that FGVis detects these pixels as they are the important ones to be deleted to reduce the softmax score for RDR.
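The thresholding step can be sketched as below, assuming a flat importance mask in which larger values mean higher importance and the maximum is taken over the mask itself (both assumptions for this illustration):

```python
def binarize(mask, rel_threshold=0.04):
    """Set values >= rel_threshold * max(mask) to one, all others to zero."""
    peak = max(mask)
    if peak <= 0.0:
        return [0] * len(mask)  # empty mask: nothing is highlighted
    return [1 if v >= rel_threshold * peak else 0 for v in mask]


print(binarize([1.0, 0.04, 0.03, 0.0]))
```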
A medical expert would also look at mutations in the optic disk or blood vessels which additionally are an indicator for the disease . These mutations are also highlighted by our method. They are not labelled in the ground truth markings leading to visual false positives (FPs).
The strength of FGVis in visualizing fine-grained structures can be seen in the detection of small red dots (microaneurysms), which are the earliest sign of diabetic retinopathy . As these often cover only a few pixels in the image, they are hard to detect (zooming in on Fig. A21 is necessary to spot them).