Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization

12/03/2020 ∙ by Shir Gur, et al.

Neural network visualization techniques mark image locations by their relevancy to the network's classification. Existing methods are effective in highlighting the regions that affect the resulting classification the most. However, as we show, these methods are limited in their ability to identify the support for alternative classifications, an effect we name the saliency bias. In this work, we integrate two lines of research, gradient-based methods and attribution-based methods, and develop an algorithm that provides per-class explainability. The algorithm back-projects the per-pixel local influence, in a manner that is guided by the local attributions, while correcting for salient features that would otherwise bias the explanation. In an extensive battery of experiments, we demonstrate the ability of our method to provide class-specific visualizations, and not just for the predicted label. Remarkably, the method obtains state-of-the-art results in benchmarks that are commonly applied to gradient-based methods, as well as in those that are employed mostly for evaluating attribution methods. Using a new unsupervised procedure, our method is also successful in demonstrating that self-supervised methods learn semantic information.







1 Introduction

Figure 1: Visualizations by our method for a pre-trained VGG-19. (a,d) input images. (b,e) the heatmap generated for the top label. (c,f) same for the 2nd highest prediction.

The most common class of explainability methods for image classifiers visualizes the reason behind a network's classification as a heatmap. These methods can make the rationale of the decision accessible to humans, increasing confidence that the classifier focuses on the relevant parts of the image rather than on spurious associations, and they help debug the model. In addition to the human user, the "computer user" can also benefit from such methods, which can seed image segmentation techniques (Ahn et al., 2019; Huang et al., 2018; Wang et al., 2019; Hoyer et al., 2019) or help focus generative image models, among other tasks.

The prominent methods in the field can be divided into two families: (i) gradient-based maps, which consider the gradient signal as it is computed by the conventional backpropagation approach (Sundararajan et al., 2017; Smilkov et al., 2017; Srinivas and Fleuret, 2019; Selvaraju et al., 2017), and (ii) relevance propagation methods (Bach et al., 2015; Nam et al., 2019; Gu et al., 2018; Iwana et al., 2019), which project high-level activations back to the input domain, mostly based on the deep Taylor decomposition of Montavon et al. (2017). The two families are used for different purposes and are evaluated by different sets of experiments and performance metrics. As we show in Sec. 5, the two types of methods have complementary sets of advantages and disadvantages: gradient-based methods, such as Grad-CAM, are able to provide a class-specific visualization for deep layers, but fail to do so for the input image, and also provide a unilateral (positive-only) result. In contrast, attribution-based methods excel in visualizing at the level of the input image and produce bipartite (positive and negative) results, but lack the ability to visualize class-specific explanations.

We present a novel method for class-specific visualization of deep image recognition models. The method overcomes the limitations of previous work by combining ideas from both families of methods, accumulating across the layers both gradient information and relevance attribution. The method corrects for what we term the saliency bias. This bias draws the attention of the network toward the salient activations, and can prevent the visualization of other image objects. It has led to claims that visualization methods mimic the behavior of edge detectors, and that the generated heatmaps are largely independent of the network weights (Adebayo et al., 2018).

There are two different questions that explainability methods often tackle: (i) which pixels affect classification the most, and (ii) which pixels are identified as belonging to the predicted class. Our method answers the second one and outperforms the relevant literature methods in multiple ways. First, the locations we identify are much more important to the classification outcome than those of the baselines, when looking inside the region of a target class. Second, we are able to identify regions of multiple classes in each image, and not just the prominent class, see Fig. 1. Our method greatly outperforms recent multi-class work (Gu et al., 2018; Iwana et al., 2019).

The main contributions of our work are: (1) a novel explainability method that combines gradient and attribution techniques, together with a new attribution guided factorization technique for extracting informative class-specific attributions from the input feature map and its gradients; (2) identifying and correcting, for the first time, the saliency bias, which hindered all previous attempts to explain more than the top decision and is the underlying reason for the failures of explainability methods demonstrated in the literature; (3) state-of-the-art performance both in negative perturbation and in segmentation-based evaluation, where the former is often used to evaluate attribution methods and the latter gradient-based methods; and (4) a novel procedure through which we show, for the first time as far as we can ascertain, that self-supervised networks implicitly learn semantic segmentation information.

2 Related work

Many explainability methods belong to one of two classes: attribution and gradient methods. Both aim to explain and visualize neural networks, but differ in their underlying concepts. A summary of these methods with their individual properties is presented in Tab. 1.

Methods outside these two classes include those that generate salient feature maps Dabkowski and Gal (2017); Simonyan et al. (2013); Mahendran and Vedaldi (2016); Zhou et al. (2016); Zeiler and Fergus (2014); Zhou et al. (2018), Activation Maximization Erhan et al. (2009) and Excitation Backprop Zhang et al. (2018). Extremal Perturbation methods Fong et al. (2019); Fong and Vedaldi (2017) are applicable to black-box models, but suffer from high computational complexity. Shapley-value based methods Lundberg and Lee (2017), despite their theoretical appeal, are known to perform poorly in practice. Therefore, while we compare empirically with several Shapley and perturbation methods, we focus on the newer gradient and attribution methods.

Attribution propagation methods follow the Deep Taylor Decomposition (DTD) of Montavon et al. (2017), which decomposes the network's classification decision into the contributions of its input elements. Following this line of work, methods such as Layer-wise Relevance Propagation (LRP) by Bach et al. (2015) use DTD to propagate relevance from the predicted class, backward, to the input image, in neural networks with a rectified linear unit (ReLU) non-linearity. The PatternNet and PatternAttribution methods of Kindermans et al. (2017) yield similar results to LRP. A disadvantage of LRP is that it is class agnostic: propagating from different classes yields the same visualization. Contrastive-LRP (CLRP) by Gu et al. (2018) and Softmax-Gradient-LRP (SGLRP) by Iwana et al. (2019) use LRP to propagate the target class in contrast to all other classes, in order to produce a class-specific visualization. Nam et al. (2019) presented RAP, a DTD approach that partitions attributions into positive and negative influences, following a mean-shift approach. This approach is class agnostic, as we demonstrate in Sec. 5. Deep Learning Important FeaTures (DeepLIFT) by Shrikumar et al. (2017) decomposes the output prediction by assigning the differences of contribution scores between the activation of each neuron and its reference activation.

Gradient-based methods use backpropagation to compute the gradients with respect to a layer's input feature map, using the chain rule. The Gradient*Input method by Shrikumar et al. (2016) computes the (signed) partial derivatives of the output with respect to the input, multiplied by the input itself. Integrated Gradients Sundararajan et al. (2017), similar to Shrikumar et al. (2016), computes the multiplication of the inputs with their derivatives, except that it averages the gradients along a linear interpolation of the input, according to a baseline defined by the user. SmoothGrad by Smilkov et al. (2017) visualizes the mean gradients of the input, adding random Gaussian noise to the input image at each iteration. The FullGrad method by Srinivas and Fleuret (2019) computes the gradients of each layer's input, or bias, followed by a post-processing operator, usually consisting of the absolute value, reshaping to the size of the input, and summing over the entire network; for the input layer, it also multiplies by the input image before post-processing. As we show in Sec. 5, FullGrad produces class-agnostic visualizations. Grad-CAM by Selvaraju et al. (2017), on the other hand, is a class-specific approach, combining the input features with the gradients of a network's layer. This approach is commonly used in many applications due to this property, but its disadvantage is that it can only produce results for very deep layers, resulting in coarse visualizations due to the low spatial dimension of such layers.

| Property       | Int. Grad | Smooth Grad | LRP | αβ-LRP | FullGrad | Grad-CAM | RAP | CLRP | SGLRP | Ours |
| Class Specific |     ✗     |      ✗      |  ✗  |   ✗    |    ✗     |    ✓     |  ✗  |  ✓   |   ✓   |  ✓   |
| Input Domain   |     ✓     |      ✓      |  ✓  |   ✓    |    ✓     |    ✗     |  ✓  |  ✓   |   ✓   |  ✓   |

Table 1: Properties of various visualization methods, which can be largely divided into gradient based and attribution based. Most methods are not class specific, and except for Grad-CAM, all methods project all the way back to the input image.

3 Propagation methods

We define the building blocks of attribution propagation and gradient propagation that are used in our method.

Attribution Propagation: Let $x^{(n)}$ and $w^{(n)}$ be the input feature map and weights of layer $n$, respectively, where $n \in [1 \dots L]$ is the layer index in a network consisting of $L$ layers. In this field's terminology, layer $n-1$ is downstream of layer $n$; layer $L$ processes the input, while layer $1$ produces the final output.

Let $x^{(n-1)}$ be the result of applying layer $n$ on $x^{(n)}$. The relevancy of layer $n$ is given by $R^{(n)}$ and is also known as the attribution.

Definition 1

Montavon et al. (2017) The generic attribution propagation rule $\mathcal{G}$ is defined, for two tensors $\hat{x}$ and $\hat{w}$, as:

$$R_j^{(n)} = \mathcal{G}\left(\hat{x}, \hat{w}, R^{(n-1)}\right)_j = \hat{x}_j \sum_k \frac{\hat{w}_{jk}}{\sum_{j'} \hat{x}_{j'} \hat{w}_{j'k}}\, R_k^{(n-1)} \tag{1}$$

Typically, $\hat{x}$ is related to the layer's input $x^{(n)}$, and $\hat{w}$ to its weights $w^{(n)}$. LRP Binder et al. (2016) can be written in this notation by setting $\hat{x} = x^{(n)}$ and $\hat{w} = w^{(n)+}$, where $t^{+} = \max(0, t)$ for a tensor $t$. Note that Def. 1 satisfies the conservation rule:

$$\sum_j R_j^{(n)} = \sum_k R_k^{(n-1)} \tag{2}$$

Let $y \in \mathbb{R}^{C}$ be the output vector of a classification network with $C$ classes, and let $y_t$ represent the specific value of class $t$. LRP defines $R^{(0)}$ to be a zeros vector, except for index $t$, where $R^{(0)}_t = y_t$. Similarly, the CLRP Gu et al. (2018) and SGLRP Iwana et al. (2019) methods calculate the difference between two LRP results, initialized with two opposing $R^{(0)}$ for "target" and "rest", propagating relevance from the "target" class and, in contrast, from the "rest" of the classes. For example, CLRP is defined as:

$$R_{CLRP} = \max\left(0,\; \frac{R_{target} - R_{rest}}{\mathcal{N}}\right) \tag{3}$$

where $R_{target}$ is the LRP result propagated from the target class, $R_{rest}$ is the LRP result propagated from the rest of the classes, and $\mathcal{N}$ is a normalization term.
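As a concrete illustration, the propagation rule of Def. 1 can be sketched for a single dense layer; the dense-layer specialization and the numerical stabilizer are our assumptions, not part of the paper:

```python
import numpy as np

def generic_rule(x_hat, w_hat, R_next, eps=1e-9):
    """Sketch of the generic attribution propagation rule (Def. 1) for a
    dense layer: R_j = x_hat_j * sum_k w_hat_jk * R_k / sum_j' x_hat_j' w_hat_j'k.
    The paper states the rule for arbitrary tensors; the dense case and the
    eps stabilizer are our assumptions."""
    z = x_hat @ w_hat                       # per-output denominators, shape (K,)
    s = R_next / (z + eps * np.sign(z))     # stabilized division
    return x_hat * (w_hat @ s)

# LRP in this notation: x_hat = x (post-ReLU input), w_hat = max(w, 0)
x = np.array([1.0, 2.0, 0.5, 1.5])
w = np.array([[ 0.5, -0.2,  0.1],
              [-0.3,  0.8,  0.4],
              [ 0.7,  0.1, -0.6],
              [ 0.2, -0.5,  0.3]])
R_next = np.array([1.0, 0.0, 0.0])          # one-hot relevance at the target class
R = generic_rule(x, np.maximum(w, 0.0), R_next)
```

Here the total relevance of `R` equals that of `R_next` up to the stabilizer, illustrating the conservation rule.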

$\Delta$-Shift: Def. 1 presented a generic propagation rule that satisfies the conservation rule in Eq. 2. However, in many cases, we would like to add a residual signal denoting another type of attribution. The $\Delta$-shift corrects for the resulting deviation from the conservation rule.

Definition 2

Given a generic propagation result $\tilde{R}^{(n)}$, satisfying Eq. 2, and a residual tensor $r$, the $\Delta$-shift is defined as follows:

$$R_j^{(n)} = \tilde{R}_j^{(n)} + r_j - \mathbb{1}\!\left[\tilde{R}_j^{(n)} + r_j \neq 0\right] \frac{\sum_k r_k}{\left|\{k : \tilde{R}_k^{(n)} + r_k \neq 0\}\right|} \tag{4}$$

Note that we divide the sum of the residual signal by the number of non-zero neurons. While not formulated this way, the RAP method Nam et al. (2019) employs the type of correction defined in Def. 2.
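The shift of Def. 2 can be sketched as follows (a minimal sketch; the exact placement of the non-zero indicator is our reading of the text):

```python
import numpy as np

def delta_shift(R_tilde, r):
    """Sketch of the shift of Def. 2: add a residual attribution r to the
    propagated result, then subtract the residual's total mass, spread
    uniformly over the non-zero neurons, so that the conservation rule
    still holds."""
    out = R_tilde + r
    nz = out != 0                         # only non-zero neurons are corrected
    out[nz] -= r.sum() / nz.sum()
    return out

R_tilde = np.array([0.6, 0.4, 0.0, 0.0])  # conserves a total relevance of 1
r = np.array([0.2, -0.1, 0.3, 0.0])       # residual that breaks conservation
R = delta_shift(R_tilde, r)
```

After the shift, the total relevance again equals that of `R_tilde`, despite the added residual.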

Gradient Propagation: The propagation of gradients in a neural network is defined by the chain rule.

Definition 3

Let $\mathcal{L}$ be the loss of a neural network. The input feature gradients $\nabla x^{(n)}$ of layer $n$, with respect to $\mathcal{L}$, are defined by the chain rule as follows:

$$\nabla x^{(n)} := \frac{\partial \mathcal{L}}{\partial x^{(n)}} = \frac{\partial \mathcal{L}}{\partial x^{(n-1)}} \cdot \frac{\partial x^{(n-1)}}{\partial x^{(n)}} \tag{5}$$

Methods such as FullGrad Srinivas and Fleuret (2019) and SmoothGrad Smilkov et al. (2017) use the raw gradients, as defined in Eq. 5, for visualization. Grad-CAM Selvaraju et al. (2017), on the other hand, performs a weighted combination of the input feature gradients, in order to obtain a class-specific visualization, defined as follows:

$$\mathrm{GradCAM}(x, \nabla x) = \mathrm{ReLU}\left(\sum_c \alpha_c\, x_c\right), \qquad \alpha_c = \frac{1}{hw} \sum_p \nabla x_{c,p} \tag{6}$$

where $\nabla x_{c,p}$ is the specific value of the gradient $C$-channel tensor at channel $c$ and pixel $p$, and $x_c$ is the entire channel, which is a matrix of size $h \times w$.
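This weighted combination can be sketched as follows, assuming channel-first tensors (variable names are ours):

```python
import numpy as np

def grad_cam(x, grad_x):
    """Sketch of the Grad-CAM combination: one weight per channel, given by
    the spatial mean of that channel's gradients, followed by a ReLU over
    the weighted sum of the input feature channels. x, grad_x: (C, H, W)."""
    alpha = grad_x.mean(axis=(1, 2))           # channel weights
    cam = np.einsum('c,chw->hw', alpha, x)     # weighted channel combination
    return np.maximum(cam, 0.0)                # keep positive evidence only

x = np.ones((2, 3, 3))                          # two constant feature channels
g = np.stack([np.full((3, 3), 0.5),
              np.full((3, 3), -0.25)])          # opposing channel gradients
cam = grad_cam(x, g)                            # 0.5 - 0.25 = 0.25 everywhere
```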

Guided Factorization: The explanation should create a clear separation between the positively contributing regions, or the foreground, and the negatively contributing ones, referred to as the background. This is true for both the activations and the gradients during propagation. Ideally, the relevant data would be partitioned into two clusters: one for the positive contributions and one for the negative contributions. We follow the partition problem Yuan et al. (2015); Gao et al. (2016), in which the data is divided spatially between positive and negative locations, in accordance with the sign of a partition map $\Phi$.

Given a tensor $z \in \mathbb{R}^{c \times h \times w}$, we re-write it as a matrix in the form of $Z \in \mathbb{R}^{hw \times c}$. We compute the Heaviside function of $\Phi$ using a step function: $A = H(\Phi)$, where $H(t) = 1$ for $t > 0$ and $0$ otherwise. The matrix $A$ is a positive matrix, and we consider the following two-class non-negative matrix factorization $Z \approx W H$, where $W \in \mathbb{R}^{hw \times 2}$ contains the spatial mixing weights, and the representative matrix $H \in \mathbb{R}^{2 \times c}$ is defined by the mean of each class in the data tensor, based on the assignment of $A$:

$$H_1 = \frac{A^\top Z}{\sum_p A_p}, \qquad H_2 = \frac{(1 - A)^\top Z}{\sum_p (1 - A_p)}$$

where $c$ is the channel dimension, $p$ indexes the $hw$ spatial locations, and $\odot$ denotes the Hadamard product.

We estimate the matrix $W$ of positive weights by least squares, $W = \max\left(0, Z H^\top (H H^\top)^{-1}\right)$. Combining the foreground weights with the background weights into the same axis is done by using both negative and positive values, leading to the following operator: $F(z, \Phi) = W_{:,1} - W_{:,2}$, where $F$ is a function that takes $z$ and $\Phi$ as inputs. We further normalize $F$ by its maximal absolute value, to allow multiple streams to be integrated together:

$$\hat{F}(z, \Phi) = \frac{F(z, \Phi)}{\max_p \left|F(z, \Phi)_p\right|} \tag{7}$$
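The guided factorization can be sketched as follows, under our reading of the text; the matrix shapes, the clipping to positive weights, and the normalization constant are assumptions:

```python
import numpy as np

def guided_factorization(z, phi):
    """Two-class factorization sketch guided by a partition map phi.
    z: (C, H, W) feature tensor; phi: (H, W) map whose sign splits pixels
    into foreground / background. Class representatives are the per-class
    channel means; spatial weights come from least squares."""
    C, H, W = z.shape
    Z = z.reshape(C, H * W).T                 # (HW, C), one row per pixel
    a = (phi.reshape(-1) > 0).astype(float)   # Heaviside of the partition map
    if a.sum() == 0 or a.sum() == a.size:     # degenerate partition: no split
        return np.zeros((H, W))
    Hmat = np.stack([Z[a == 1].mean(0),       # foreground representative
                     Z[a == 0].mean(0)])      # background representative
    Wmat, *_ = np.linalg.lstsq(Hmat.T, Z.T, rcond=None)  # (2, HW) weights
    Wmat = np.maximum(Wmat, 0.0)              # keep non-negative weights
    f = (Wmat[0] - Wmat[1]).reshape(H, W)     # foreground minus background
    return f / (np.abs(f).max() + 1e-9)       # normalize for stream fusion

z = np.zeros((3, 4, 4)); z[:, :, :2] = 1.0   # left half: 'foreground' features
phi = np.zeros((4, 4)); phi[:, :2] = 1.0     # guidance agrees with that split
f = guided_factorization(z, phi)
```

In this toy case, the left half of `f` is close to 1 (foreground) and the right half close to 0, as the guidance and the features agree.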


4 The Integrated Method

Let $f$ be a multiclass CNN classifier ($C$ labels), and let $I$ be the input image. The network outputs a score vector $y \in \mathbb{R}^{C}$, obtained before applying the softmax operator. Given any target class $t$, our goal is to explain where (spatially) in $I$ lies the support for class $t$. The method is composed of two streams, gradient propagation and attribution propagation. At each step, we use the previous values of the two, and compute the current layer's input gradient and attribution.

There are three major components to our method: (i) propagating attribution results using Def. 1; (ii) factorizing the activations and the gradients in a manner that is guided by the attribution; and (iii) performing attribution aggregation and shifting the values, such that the conservation rule is preserved. The shift splits the neurons into those with positive and those with negative attribution. The complete algorithm is listed as part of the supplementary material.

4.1 Initial Attribution Propagation

As shown in Gu et al. (2018); Iwana et al. (2019), the use of a different initial relevance can improve the results. Let $R^{(n-1)}$ and $R^{(n)}$ be the output and input class attribution maps of layer $n$, respectively. We employ the following initial attribution for explaining decision $t$: let $y$ be the output vector of the classification network (the logits), from which we compute the initial attribution $R^{(0)}$.

In this formulation, we replace the pseudo-probabilities of the vector $\mathrm{softmax}(y)$ with another vector, in which the class $t$ that we wish to provide an explanation for is highlighted, and the rest of the classes are scored by the closeness of their assigned probability to that of $t$. This way, the explanation is no longer dominated by the predicted class.
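A hedged sketch of this initialization, implementing only what the prose states; the exact functional form used in the paper may differ:

```python
import numpy as np

def initial_attribution(logits, t):
    """Illustrative sketch of the initial relevance of Sec. 4.1: the target
    class t is fully highlighted, and every other class is scored by how
    close its pseudo-probability is to that of t. The exponential closeness
    score is our assumption."""
    p = np.exp(logits - logits.max())
    p = p / p.sum()                          # pseudo-probabilities (softmax)
    r0 = np.exp(-np.abs(p - p[t]))           # closeness to the target's probability
    r0[t] = 1.0                              # the target class itself is highlighted
    return r0

logits = np.array([2.0, 1.0, -1.0])
r0 = initial_attribution(logits, t=1)        # class 1 dominates the initialization
```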

Figure 2: (a) A comparison between methods (input image, LRP, FullGrad, Grad-CAM, RAP, CLRP, SGLRP, and ours) for VGG-19. (b) Same for ResNet-50. (c) Visualization of two different classes (Dog, Cat) for VGG-19. Many more results can be found in the supplementary material.
Figure 3: Negative perturbation results on the ImageNet validation set for (a) the predicted and (b) the target class. Shown is the change in accuracy when removing a fraction of the image according to the attribution value, starting from lowest to highest.
| Class     | Int. Grad | Smooth Grad | LRP  | αβ-LRP | FullGrad | Grad-CAM | RAP  | CLRP | SGLRP | Ours |
| Predicted |   10.4    |    12.0     | 26.8 |  18.8  |   32.1   |   37.8   | 38.7 | 29.8 | 29.9  | 38.9 |
| Target    |   10.5    |    12.1     | 26.8 |  18.8  |   32.4   |   39.4   | 38.7 | 31.6 | 32.4  | 40.0 |

Table 2: Area Under the Curve (AUC) for the two negative perturbation tests, showing results for the predicted and the target class. The class-agnostic methods either perform worse or show an insignificant change on the target-class test.

| Dataset  | Metric     | Gradient SHAP | Meaningful Perturbation | Int. Grad | Smooth Grad | LRP  | αβ-LRP | FullGrad | Grad-CAM | RAP  | CLRP | SGLRP | Ours |
| ImageNet | Pixel Acc. |     49.9      |          29.6           |   72.7    |    70.1     | 74.9 |  29.6  |   75.8   |   66.5   | 78.8 | 53.3 | 54.5  | 79.9 |
| ImageNet | mAP        |     50.0      |          45.4           |   67.4    |    65.1     | 69.9 |  47.5  |   70.6   |   62.4   | 73.7 | 54.6 | 55.1  | 76.7 |
| VOC'12   | Pixel Acc. |      9.3      |          10.2           |   70.1    |    69.9     | 66.6 |  70.8  |   26.9   |   52.6   | 72.8 | 72.8 | 73.1  | 77.5 |
| VOC'12   | mAP        |     35.2      |          22.7           |   34.9    |    34.3     | 40.8 |  37.0  |   19.7   |   41.7   | 39.6 | 37.9 | 32.6  | 45.9 |

Table 3: Quantitative segmentation results on (a) ImageNet and (b) PASCAL-VOC 2012.

4.2 Class Attribution Propagation

As mentioned above, there are two streams in our method. The first propagates the gradients, which are used for the factorization process discussed in the next section; the other is responsible for propagating the resulting class attribution maps $R^{(n)}$, using DTD-type propagations. This second stream is more involved, and the computation of $R^{(n)}$ has multiple steps.

The first step is to propagate $R^{(n-1)}$ through layer $n$ following Eq. 1, using two variants. Both variants employ $R^{(n-1)}$ as the tensor to be propagated but, depending on the setting of $\hat{x}$ and $\hat{w}$, result in different outcomes. The first variant considers the absolute influence $R^{abs(n)}$, defined by:

$$R^{abs(n)} = \mathcal{G}\left(\left|x^{(n)}\right|, \left|w^{(n)}\right|, R^{(n-1)}\right) \tag{9}$$

The second variant computes the input-agnostic influence $R^{1(n)}$, following Eq. 1:

$$R^{1(n)} = \mathcal{G}\left(\mathbb{1}, w^{(n)}, R^{(n-1)}\right) \tag{10}$$

where $\mathbb{1}$ is an all-ones tensor of the shape of $x^{(n)}$. We choose the input-agnostic propagation because features in shallow layers, such as edges, are more local and less semantic; it therefore reduces the sensitivity to texture.
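The two variants can be sketched by specializing the generic rule of Def. 1 to a dense layer; the specialization and the stabilizer are our assumptions:

```python
import numpy as np

def generic_rule(x_hat, w_hat, R_next, eps=1e-9):
    # Def. 1 for a dense layer: R_j = x_j * sum_k w_jk R_k / sum_j' x_j' w_j'k
    z = x_hat @ w_hat
    return x_hat * (w_hat @ (R_next / (z + eps * np.sign(z))))

x = np.array([1.0, -2.0, 0.5])
w = np.array([[0.4, -0.1], [0.3, 0.6], [-0.2, 0.2]])
R_prev = np.array([0.7, 0.3])

# Absolute influence: propagate through |x| and |w|
R_abs = generic_rule(np.abs(x), np.abs(w), R_prev)
# Input-agnostic influence: replace the input by an all-ones tensor
R_ia = generic_rule(np.ones_like(x), w, R_prev)
```

Both variants preserve the total relevance of `R_prev`, as required by the conservation rule.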

4.3 Residual Update

As part of the method, we compute, in addition to $R^{1(n)}$, the factorization of both the input feature map $x^{(n)}$ of layer $n$ and its gradients $\nabla x^{(n)}$. This branch is defined by the chain rule in Eq. 5, where we now consider the target score $y_t$ in place of the loss $\mathcal{L}$. The factorization results in foreground and background partitions, using guidance from $R^{abs(n)}$. This partition follows the idea of our attribution properties, where positive values are part of class $t$, and negative values otherwise. We therefore employ the following attribution guided factorization (Eq. 7 with respect to $x^{(n)}$ and $\nabla x^{(n)}$):

$$\Phi_x = \hat{F}\left(x^{(n)}, R^{abs(n)}\right)^{+}, \qquad \Phi_{\nabla} = \hat{F}\left(\nabla x^{(n)}, R^{abs(n)}\right)^{+} \tag{11}$$

note that we only consider the positive values of the factorization update, and that the two results are normalized by their maximal value. Similarly to Sundararajan et al. (2017); Selvaraju et al. (2017), we define the input-gradient interaction:

$$r^{\nabla} = x^{(n)} \odot \nabla x^{(n)} \tag{12}$$

The residual attribution $r$ is then defined by all attributions other than $R^{1(n)}$:

$$r = \Phi_x + \Phi_{\nabla} + r^{\nabla} \tag{13}$$

We observe that both $r^{\nabla}$ and $\Phi_x$ are affected by the input feature map, resulting in the saliency bias effect (see Sec. 1). As a result, we penalize their sum according to $\Phi_{\nabla}$, in a manner that emphasises positive attribution regions.

We note that $\sum_j r_j \neq 0$ in general, and the residual needs to be compensated for, in order to preserve the conservation rule. Therefore, we perform a $\Delta$-shift as defined in Def. 2, resulting in the final attribution:

$$R^{(n)} = \mathcal{S}\left(R^{1(n)}, r\right) \tag{14}$$
4.4 Explaining Self-Supervised Learning (SSL)

SSL is proving to be increasingly powerful and greatly reduces the need for labeled samples. However, no explainability method has been applied to verify that these models, which are often based on image augmentations, do not ignore localized image features.

Since no label information is used, we rely on the classifier of the self-supervised task itself, which has nothing to do with the classes of the datasets. For each image, we consider the image that is closest to it in the penultimate layer's activations. We then subtract the self-supervised-task logits of the nearest neighbor from those of the image to be visualized, to emphasize what is unique to the current image, and then apply explainability methods to the predicted class of the self-supervised task. See the supplementary for more details of this novel procedure.
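This nearest-neighbor procedure can be sketched as follows; the function and variable names, and the Euclidean metric, are our assumptions:

```python
import numpy as np

def ssl_explanation_target(feats, logits, query_idx):
    """Sketch of the SSL explanation procedure of Sec. 4.4: find the image
    whose penultimate features are nearest to the query's, subtract its
    self-supervised logits from the query's to keep what is unique to the
    query, and explain the arg-max class of the difference."""
    q = feats[query_idx]
    dists = np.linalg.norm(feats - q, axis=1)
    dists[query_idx] = np.inf               # exclude the query itself
    nn = int(np.argmin(dists))              # nearest neighbor in feature space
    diff = logits[query_idx] - logits[nn]   # emphasize what is unique to the query
    return int(np.argmax(diff)), nn

feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
logits = np.array([[2.0, 1.0], [1.5, 1.8], [0.0, 0.0]])
cls, nn = ssl_explanation_target(feats, logits, query_idx=0)
```

The returned class index would then be used as the target of any of the explainability methods discussed above.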

5 Experiments

Qualitative Evaluation: Fig. 2(a,b) presents sample visualizations on a representative set of images for networks trained on ImageNet, using VGG-19 and ResNet-50, respectively. In these figures, we visualize the top-predicted class. More results can be found in the supplementary, including the outputs of the remaining methods, which are similar to those shown and are omitted here for brevity.

The superior visualization quality provided by our method is strikingly evident. One can observe that (i) LRP, FullGrad and Grad-CAM output only positive results, where for LRP edges are most significant, and in all three, the threshold between the object and the background is ambiguous; (ii) CLRP and SGLRP, which apply LRP twice, have volatile outputs; (iii) RAP is the most consistent, other than ours, but falls behind in object coverage and boundaries; and (iv) our method produces relatively complete regions with clear boundaries between positive and negative regions.

In order to test whether each method is class-agnostic or not, we feed the classifier images containing two clearly seen objects, and propagate each object class separately. In Fig. 2(c) we present results for a sample image. As can be seen, LRP, FullGrad and RAP output similar visualizations for both classes. Grad-CAM, on the other hand, clearly shows a coarse region of the target class, but lacks the spatial resolution. CLRP and SGLRP both achieve class separation, and yet, they are highly biased toward image edges, and do not present a clear separation between the object and its background. Our method provides the clearest visualization, which is both highly correlated with the target class, and is less sensitive toward edges. More samples can be found in the supplementary.

Quantitative Experiments: We employ two experiment settings that are used in the literature: negative perturbation and segmentation tests. We evaluate our method using three common datasets: (i) the validation set of ImageNet Russakovsky et al. (2015) (ILSVRC) 2012, consisting of 50K images from 1000 classes, (ii) an annotated subset of ImageNet called ImageNet-Segmentation Guillaumin et al. (2014), containing 4,276 images from 445 categories, and (iii) the PASCAL-VOC 2012 dataset, depicting 20 foreground object classes and one background class, and containing 10,582 images for training, 1,449 images for validation and 1,456 images for testing.

Negative Perturbation Experiments: The negative perturbation test is composed of two stages. First, a pre-trained network is used to generate the visualizations of the ImageNet validation set; in our experiments, we use the VGG-19 architecture, trained on the full ImageNet training set. Second, we mask out an increasing portion of the image, starting from the lowest attribution values and moving to the highest, as determined by the explainability method. At each step, we compute the mean accuracy of the pre-trained network. We repeat this test twice: once for the explanation of the top-1 predicted class, and once for the ground truth class. The results are presented in Fig. 3 and Tab. 2. As can be seen, our method achieves the best performance across both tests, with the largest margin occurring when a substantial fraction of the pixels is removed.
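The test can be sketched as follows, with a toy stand-in for the pre-trained classifier (the helper names are ours):

```python
import numpy as np

def negative_perturbation_curve(image, attribution, fractions, accuracy_fn):
    """Sketch of the negative perturbation test: mask out an increasing
    fraction of pixels, least relevant first, and record the classifier's
    accuracy at each step. accuracy_fn stands in for running the
    pre-trained network on the masked input."""
    order = np.argsort(attribution.reshape(-1))   # lowest attribution first
    accs = []
    for frac in fractions:
        masked = image.copy().reshape(-1)
        k = int(frac * order.size)
        masked[order[:k]] = 0.0                   # remove least relevant pixels
        accs.append(accuracy_fn(masked.reshape(image.shape)))
    return accs

# toy classifier: 'accurate' while the bottom-right pixel survives
img = np.arange(16, dtype=float).reshape(4, 4)
attr = img.copy()                                 # pixel (3, 3) is most relevant
acc = lambda x: float(x[3, 3] != 0)
curve = negative_perturbation_curve(img, attr, [0.0, 0.5, 1.0], acc)
```

A good explanation keeps accuracy high for as long as possible, so a larger area under this curve is better.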

Figure 4: Quantitative results for self-supervised methods in the segmentation task. (a) Comparison of different explainability methods for SCAN, SeLa and RotNet. (b) Per-layer performance of RotNet using linear probes.

| Method     | Ours (−) | Ours (+) | Ours w/o | RAP (−) | RAP (+) | RAP w/o |
| Supervised |   42.4   |   42.1   |   41.2   |  41.6   |  41.2   |  40.0   |
| SeLa       |   37.8   |   37.4   |   36.9   |  37.2   |  36.3   |  34.5   |
| SCAN       |   38.1   |   37.9   |   37.0   |  37.4   |  36.8   |  34.8   |

Table 4: AUC for negative perturbation tests for the self-supervised methods SeLa and SCAN. (−) is our method; (+) is an alternative that adds instead of subtracts; w/o does not consider the neighbor at all. RAP, which is the best baseline in Tab. 3, is used as a baseline.
Ours Only without Grad-CAM
AUC 38.9 38.2 34.6 37.5 37.1 37.2 37.5 37.1
Table 5: AUC results in the negative perturbation test for variations of our method.

Semantic Segmentation Metrics: To evaluate the segmentation quality obtained by each explainability method, we compare each to the ground truth segmentation maps of the ImageNet-Segmentation dataset and of PASCAL-VOC 2012, evaluating by pixel accuracy and mean average precision. We follow the literature benchmarks for explainability that are based on labeled segments Nam et al. (2019). Note that these are not meant to provide a full weakly-supervised segmentation solution, which is often obtained in an iterative manner. Rather, the goal is to demonstrate the ability of each method without follow-up training. For the first dataset, we employed the VGG-19 classifier pre-trained on the ImageNet training set, computed the explanation for the top-predicted class (we have no access to the ground truth class), and compared it to the ground truth mask provided in the dataset. For the second, we trained a multi-label classifier on the PASCAL-VOC 2012 training set, and consider labels with a probability larger than 0.5 when extracting the explainability maps. For methods that provide both positive and negative values (Gradient SHAP, LRP, RAP, CLRP, SGLRP, and ours), we consider the positive part as the segmentation map of that object. For methods that provide only positive values (Integrated Grad, Smooth Grad, Full Grad, GradCAM, LRP, Meaningful Perturbation), we threshold the obtained maps at the mean value to obtain the segmentation map. Results are reported in Tab. 3, demonstrating a clear advantage of our method over all baseline methods, for all datasets and metrics. Other methods seem to work well only in one of the datasets or present a trade-off between the two metrics.
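The map-to-mask rule described above can be sketched as:

```python
import numpy as np

def explanation_to_mask(heatmap, signed):
    """Sketch of the evaluation rule: signed methods keep their positive
    part as the object mask, while positive-only methods are thresholded
    at the map's mean value."""
    if signed:
        return heatmap > 0
    return heatmap > heatmap.mean()

def pixel_accuracy(pred, gt):
    return float((pred == gt).mean())

# toy ground truth: the right half of a 4x4 image is the object
gt = np.zeros((4, 4), dtype=bool); gt[:, 2:] = True
signed_map = np.where(gt, 1.0, -1.0)            # ideal signed explanation
pos_map = np.where(gt, 0.9, 0.1)                # ideal positive-only explanation
acc_signed = pixel_accuracy(explanation_to_mask(signed_map, True), gt)
acc_pos = pixel_accuracy(explanation_to_mask(pos_map, False), gt)
```

Both ideal maps recover the ground truth mask exactly, so both accuracies are 1.0 in this toy case.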

Explainability for Self-Supervised Models: We use three state-of-the-art SSL models: a ResNet-50 trained with either SCAN Van Gansbeke et al. (2020) or SeLa Asano et al. (2019b), and an AlexNet trained by Asano et al. (2019a), which we denote as RotNet.

Fig. 4(a) shows the segmentation performance for the ImageNet ground truth class of the completely unsupervised SSL methods (color-coded) using different explainability methods (shape-coded). For all models, our explainability method outperforms the baselines in both mAP and pixel accuracy, except for RotNet, where the mAP is considerably better and the pixel accuracy is slightly lower. Fig. 4(b) shows the increase in segmentation performance for RotNet with our method, as we visualize deeper layers of the SSL AlexNet, using supervised linear post-training as proposed by Asano et al. (2019a). This finding is aligned with the classification results of a single linear layer by Asano et al. (2019a), which improve with the layer index.

Tab. 4 compares RAP (which seems to be the strongest baseline) and our method in the negative perturbation setting on the predicted SSL class (not the ImageNet one). Evidently, our method has superior performance. The SSL results also seem correlated with the fully supervised ones (note that the architecture and the processing of the fully connected layer differ from those used in Tab. 2). Two alternatives to our novel SSL procedure are also presented: in one, the difference from the neighbor is replaced with a sum; in the other, no comparison to the neighbor takes place. Our procedure for SSL explainability is superior for both our method and RAP.

Ablation Study: By repeatedly employing normalization, our method is kept parameter-free. In Tab. 5, we present negative perturbation results for variants that are obtained by removing one component from our complete method. We also present the results of a similar variant in which the attribution-guided residual term is replaced by a Grad-CAM term. As can be seen, each of these modifications damages performance to some degree. Without any residual term, the method is slightly worse than RAP, while a partial residual term hurts performance further. In the supplementary material, we show visual results of the different components and variations of our method and present observations on the contribution of each part to the final outcome.

6 Conclusions

Explainability plays a major role in debugging neural networks, in creating trust in the predictive capabilities of networks beyond the specific test dataset, and in seeding downstream methods that analyze images spatially. Previous visualization methods are either class-agnostic, low-resolution, or neglect much of the object's region, focusing on edges. The separation between relevant and irrelevant image parts provided by previous methods is also often blurry. In this work, we present a novel explainability method that outputs class-dependent explanations that are clearer and more exact than those of the many existing methods tested. The new method is based on combining concepts from the two major branches of the current literature: attribution methods and gradient methods. This combination is done, on equal grounds, through the use of a non-negative matrix factorization technique that partitions the image into foreground and background regions. Finally, we propose a novel procedure for evaluating the explainability of SSL methods.


This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant ERC CoG 725974). The contribution of the first author is part of a Ph.D. thesis research conducted at Tel Aviv University.


  • J. Adebayo, J. Gilmer, M. Muelly, I. Goodfellow, M. Hardt, and B. Kim (2018) Sanity checks for saliency maps. In Advances in Neural Information Processing Systems, pp. 9505–9515. Cited by: §1.
  • J. Ahn, S. Cho, and S. Kwak (2019) Weakly supervised learning of instance segmentation with inter-pixel relations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2209–2218. Cited by: §1.
  • Y. M. Asano, C. Rupprecht, and A. Vedaldi (2019a) A critical analysis of self-supervision, or what we can learn from a single image. arXiv preprint arXiv:1904.13132. Cited by: §5, §5.
  • Y. M. Asano, C. Rupprecht, and A. Vedaldi (2019b) Self-labelling via simultaneous clustering and representation learning. arXiv preprint arXiv:1911.05371. Cited by: §5.
  • S. Bach, A. Binder, G. Montavon, F. Klauschen, K. Müller, and W. Samek (2015) On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS one 10 (7), pp. e0130140. Cited by: §1, §2.
  • A. Binder, G. Montavon, S. Lapuschkin, K. Müller, and W. Samek (2016) Layer-wise relevance propagation for neural networks with local renormalization layers. In International Conference on Artificial Neural Networks, pp. 63–71. Cited by: §3.
  • P. Dabkowski and Y. Gal (2017) Real time image saliency for black box classifiers. In Advances in Neural Information Processing Systems, pp. 6970–6979. Cited by: §2.
  • D. Erhan, Y. Bengio, A. Courville, and P. Vincent (2009) Visualizing higher-layer features of a deep network. University of Montreal 1341 (3), pp. 1. Cited by: §2.
  • R. C. Fong and A. Vedaldi (2017) Interpretable explanations of black boxes by meaningful perturbation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3429–3437. Cited by: §2.
  • R. Fong, M. Patrick, and A. Vedaldi (2019) Understanding deep networks via extremal perturbations and smooth masks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2950–2958. Cited by: §2.
  • M. Gao, H. Chen, S. Zheng, and B. Fang (2016) A factorization based active contour model for texture segmentation. In 2016 IEEE International Conference on Image Processing (ICIP), pp. 4309–4313. Cited by: §3.
  • J. Gu, Y. Yang, and V. Tresp (2018) Understanding individual decisions of cnns via contrastive backpropagation. In Asian Conference on Computer Vision, pp. 119–134. Cited by: §1, §1, §2, §3, §4.1.
  • M. Guillaumin, D. Küttel, and V. Ferrari (2014) Imagenet auto-annotation with segmentation propagation. International Journal of Computer Vision 110 (3), pp. 328–348. Cited by: §5.
  • L. Hoyer, M. Munoz, P. Katiyar, A. Khoreva, and V. Fischer (2019) Grid saliency for context explanations of semantic segmentation. In Advances in Neural Information Processing Systems, pp. 6462–6473. Cited by: §1.
  • Z. Huang, X. Wang, J. Wang, W. Liu, and J. Wang (2018) Weakly-supervised semantic segmentation network with deep seeded region growing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7014–7023. Cited by: §1.
  • B. K. Iwana, R. Kuroki, and S. Uchida (2019) Explaining convolutional neural networks using softmax gradient layer-wise relevance propagation. arXiv preprint arXiv:1908.04351. Cited by: §1, §1, §2, §3, §4.1.
  • P. Kindermans, K. T. Schütt, M. Alber, K. Müller, D. Erhan, B. Kim, and S. Dähne (2017) Learning how to explain neural networks: patternnet and patternattribution. arXiv preprint arXiv:1705.05598. Cited by: §2.
  • S. M. Lundberg and S. Lee (2017) A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pp. 4765–4774. Cited by: §2.
  • A. Mahendran and A. Vedaldi (2016) Visualizing deep convolutional neural networks using natural pre-images. International Journal of Computer Vision 120 (3), pp. 233–255. Cited by: §2.
  • G. Montavon, S. Lapuschkin, A. Binder, W. Samek, and K. Müller (2017) Explaining nonlinear classification decisions with deep taylor decomposition. Pattern Recognition 65, pp. 211–222. Cited by: §1, §2, Definition 1.
  • W. Nam, S. Gur, J. Choi, L. Wolf, and S. Lee (2019) Relative attributing propagation: interpreting the comparative contributions of individual units in deep neural networks. arXiv preprint arXiv:1904.00605. Cited by: §1, §2, §3, §5.
  • O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. (2015) Imagenet large scale visual recognition challenge. International journal of computer vision 115 (3), pp. 211–252. Cited by: §5.
  • R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra (2017) Grad-cam: visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pp. 618–626. Cited by: §1, §2, §3, §4.3.
  • A. Shrikumar, P. Greenside, and A. Kundaje (2017) Learning important features through propagating activation differences. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 3145–3153. Cited by: §2.
  • A. Shrikumar, P. Greenside, A. Shcherbina, and A. Kundaje (2016) Not just a black box: learning important features through propagating activation differences. arXiv preprint arXiv:1605.01713. Cited by: §2.
  • K. Simonyan, A. Vedaldi, and A. Zisserman (2013) Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034. Cited by: §2.
  • D. Smilkov, N. Thorat, B. Kim, F. Viégas, and M. Wattenberg (2017) Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825. Cited by: §1, §2, §3.
  • S. Srinivas and F. Fleuret (2019) Full-gradient representation for neural network visualization. In Advances in Neural Information Processing Systems, pp. 4126–4135. Cited by: §1, §2, §3.
  • M. Sundararajan, A. Taly, and Q. Yan (2017) Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 3319–3328. Cited by: §1, §2, §4.3.
  • W. Van Gansbeke, S. Vandenhende, S. Georgoulis, M. Proesmans, and L. Van Gool (2020) SCAN: learning to classify images without labels. In European Conference on Computer Vision (ECCV), Cited by: §5.
  • Y. Wang, J. Zhang, M. Kan, S. Shan, and X. Chen (2019) Self-supervised scale equivariant network for weakly supervised semantic segmentation. arXiv preprint arXiv:1909.03714. Cited by: §1.
  • J. Yuan, D. Wang, and A. M. Cheriyadat (2015) Factorization-based texture segmentation. IEEE Transactions on Image Processing 24 (11), pp. 3488–3497. Cited by: §3.
  • M. D. Zeiler and R. Fergus (2014) Visualizing and understanding convolutional networks. In European conference on computer vision, pp. 818–833. Cited by: §2.
  • J. Zhang, S. A. Bargal, Z. Lin, J. Brandt, X. Shen, and S. Sclaroff (2018) Top-down neural attention by excitation backprop. International Journal of Computer Vision 126 (10), pp. 1084–1102. Cited by: §2.
  • B. Zhou, D. Bau, A. Oliva, and A. Torralba (2018) Interpreting deep visual representations via network dissection. IEEE transactions on pattern analysis and machine intelligence. Cited by: §2.
  • B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba (2016) Learning deep features for discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2921–2929. Cited by: §2.

Appendix A Code

The code is written in PyTorch and is provided as part of the supplementary material, including a Jupyter notebook with examples. Fig. 1 and 2 show screenshots from the notebook.

Figure 1: Visualization of two different classes for VGG-19.
Figure 2: Visualization of the top predicted class of a VGG-19 ImageNet trained network.

Appendix B Algorithm

Our method is listed in Alg. 1.
Note that for the linear layers of the network, we compute the residual in the following way:


We note that the corresponding tensors are reshaped into their previous (2D) form.
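To illustrate the reshaping step, the following minimal sketch shows relevance that has reached the first linear layer of a CNN, as a flat vector, being restored to the spatial (2D) form the features had before flattening. The shapes here are hypothetical and not the exact VGG-19 dimensions.

```python
import torch

# Relevance at the flattened features of the first linear layer (shapes are
# illustrative): a (batch, C*H*W) vector that must be reshaped back to
# (batch, C, H, W) to continue propagation through the convolutional stack.
relevance = torch.randn(1, 4 * 8 * 8)
relevance_2d = relevance.view(1, 4, 8, 8)  # previous (2D) form
print(relevance_2d.shape)  # torch.Size([1, 4, 8, 8])
```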

Input: neural network model, input images, target class.
Forward pass; save the intermediate feature maps.
Initial attribution: first linear layer.
for each linear layer do
    Compute the absolute influence.
    if the next layer is 2D then
        Reshape the attribution maps to the previous 2D form.
        Compute the residual.
    end if
    Shift by the residual.
end for
for each convolution layer do
    Compute the input-agnostic attribution.
    Compute the factorization of the input feature map.
    Compute the factorization of the input feature map gradients.
    Compute the residual and update the attribution.
end for
Algorithm 1 Class Attribution Propagation with Guided Factorization.
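The first step of Alg. 1, a forward pass that records the intermediate feature maps, can be sketched with PyTorch forward hooks. This is a generic scaffold only: the model here is a toy stand-in, and the attribution rules themselves are paper-specific and not reproduced.

```python
import torch
import torch.nn as nn

def forward_with_saved_features(model, x):
    """Forward pass that saves the output feature map of every
    convolution and linear layer via forward hooks (Alg. 1, step 1)."""
    features = {}
    handles = []
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            # bind `name` as a default argument so each hook keeps its own key
            def hook(mod, inp, out, name=name):
                features[name] = out.detach()
            handles.append(module.register_forward_hook(hook))
    out = model(x)
    for h in handles:
        h.remove()
    return out, features

# Toy model standing in for VGG-19 (illustrative only).
model = nn.Sequential(
    nn.Conv2d(3, 4, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(4 * 8 * 8, 10),
)
out, feats = forward_with_saved_features(model, torch.randn(1, 3, 8, 8))
print(sorted(feats.keys()))  # one entry per Conv2d/Linear layer
```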

Appendix C Ablation Study

By repeatedly employing normalization, our method is kept parameter-free. Its different components, presented in Sec. 4 of the paper, are visualized in Fig. 3. One can observe several behaviors of the different components: (i) across all components, the semantic information is best visible in the deeper layers of the network, while the residual information becomes more texture-oriented at shallower layers; (ii) the difference between the two attribution terms is mostly visible in the out-of-class regions: the term derived directly from the data is biased toward input-specific activations, resulting in highlights from the class that is not being visualized from layer 5 onward; (iii) the input-agnostic data term is blurrier, as a result of using the attribution tensor as input. It can, therefore, serve as a regularization term that is less susceptible to image edges.

Figure 3: Method decomposition on the image of Fig.1(a) of the paper. Visualization of the different elements of the method for a pre-trained VGG-19 network. Predicted top-class is “dog”, propagated class is “cat”. (1, 5, 11) - Convolution layer number.

Appendix D Additional Results

In Tab. 6 and 7 we show additional results for two Grad-CAM variants that visualize from shallower layers, denoted Grad-CAM* (4 layers shallower) and Grad-CAM** (8 layers shallower). An example of the different outputs can be seen in Fig. 3 of the previous section: the semantic information of Grad-CAM becomes less visible at shallower layers, whereas in our method the semantic information becomes more fine-grained. Additionally, we show the inferior results of Gradient SHAP and DeepLIFT SHAP for negative perturbation.

Gradient DeepLIFT Grad-CAM Grad-CAM* Grad-CAM** Ours
Predicted 7.5 7.8 37.8 29.9 16.0 38.9
Target 7.7 8.1 39.4 30.9 16.5 40.0
Table 6: Area Under the Curve (AUC) values for the two negative perturbation tests. Grad-CAM variants: * = 4 layers shallower, ** = 8 layers shallower.
Grad-CAM Grad-CAM* Grad-CAM** Ours
ImageNet Pixel Acc. 66.5 65.5 50.5 79.9
mAP 62.4 61.6 51.1 76.7
VOC’12 Pixel Acc. 52.6 18.4 17.2 77.5
mAP 41.7 44.5 43.4 45.9
Table 7: Quantitative segmentation results on (a) ImageNet and (b) PASCAL-VOC 2012.
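The shallower Grad-CAM variants amount to computing the standard Grad-CAM map at an intermediate convolution layer instead of the last one. A minimal sketch with hooks follows; the toy model and the layer choice are illustrative, not the exact VGG-19 configuration used in the tables.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def grad_cam(model, target_layer, x, class_idx):
    """Standard Grad-CAM computed at an arbitrary (possibly shallow) layer."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(
        lambda m, i, o: acts.update(a=o))
    # the full backward hook receives the gradient w.r.t. the layer's output
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))
    logits = model(x)
    model.zero_grad()
    logits[0, class_idx].backward()
    h1.remove(); h2.remove()
    weights = grads['g'].mean(dim=(2, 3), keepdim=True)  # GAP of gradients
    cam = F.relu((weights * acts['a']).sum(dim=1))       # weighted sum + ReLU
    return cam

# Toy network; choosing model[0] mimics a "shallower" Grad-CAM variant.
model = nn.Sequential(
    nn.Conv2d(3, 4, 3, padding=1), nn.ReLU(),
    nn.Conv2d(4, 8, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(8 * 8 * 8, 10),
)
cam = grad_cam(model, model[0], torch.randn(1, 3, 8, 8), class_idx=3)
print(cam.shape)  # torch.Size([1, 8, 8])
```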

Appendix E Self-Supervised Learning

As far as we can ascertain, our submission is the first to apply explainability methods to self-supervised learning (SSL) methods. We evaluate our method on three recent state-of-the-art models. First, we show the performance of our method on an AlexNet model trained by RotNet in a completely self-supervised fashion, which relies only on data augmentation, specifically predicting the 2D rotation applied to the input image. To evaluate the quality of the learned features, we follow (only for the RotNet experiments) an approach similar to the one used in the literature, employing supervised linear post-training in order to visualize deeper layers.

Fig. 4 shows the visualization obtained for the image on the left when training a supervised linear classifier after each layer and applying our method. As can be seen, the early layers learn edge-like patterns, while the deeper layers learn more complicated patterns. More visualizations are shown at the end of the supplementary.
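The linear post-training described above is the standard linear-probe protocol: the features up to a given layer are frozen, and only a linear classifier on top is trained. A minimal sketch with a toy frozen encoder (the model and shapes are illustrative, not the RotNet AlexNet):

```python
import torch
import torch.nn as nn

# Frozen feature extractor standing in for the SSL backbone up to some layer.
encoder = nn.Sequential(nn.Conv2d(3, 4, 3, padding=1), nn.ReLU(), nn.Flatten())
for p in encoder.parameters():
    p.requires_grad = False  # backbone is never updated

probe = nn.Linear(4 * 8 * 8, 10)  # the only trainable part
opt = torch.optim.SGD(probe.parameters(), lr=0.1)

x = torch.randn(16, 3, 8, 8)
y = torch.randint(0, 10, (16,))
for _ in range(5):
    with torch.no_grad():
        feats = encoder(x)  # frozen features
    loss = nn.functional.cross_entropy(probe(feats), y)
    opt.zero_grad(); loss.backward(); opt.step()
print(round(loss.item(), 4))
```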

Figure 4: The visualization after each layer. (a) original image. (b) visualization after the first layer. (c) visualization after the second layer. (d) visualization after the third layer. (e) visualization after the fourth layer. (f) visualization after the last layer.

In order to evaluate SSL methods without any additional supervision, we employ the classifier with which the SSL method is trained. This procedure is presented in Sec. 4.4 of the paper and is given below as pseudocode in Alg. 2.

Input: an input image, a set of images, and the SSL network, composed of a feature extractor and a linear SSL classifier.
Compute the latent vector of the input image.
Compute the set of latent vectors of all images in the set.
Find the nearest neighbor of the input's latent vector.
Subtract the nearest neighbor from the latent vector to emphasize its unique elements.
Perform a forward pass with the new latent vector.
Choose the class with the highest probability.
Apply Alg. 1 with the input tuple.
Algorithm 2 Self-supervised explainability by adopting nearest neighbors.
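The steps of Alg. 2 can be sketched as follows. The split into a feature extractor `phi` and a linear classifier `head` is our naming, and the toy shapes are illustrative; only the target-class selection is shown, after which the tuple (model, image, class) would be passed to Alg. 1.

```python
import torch
import torch.nn as nn

def nn_target_class(phi, head, x, reference_set):
    """Pick the class to explain, following Alg. 2 (sketch)."""
    z = phi(x)                                      # latent vector of the input
    Z = torch.cat([phi(r) for r in reference_set])  # latents of the image set
    nearest = Z[torch.cdist(z, Z).argmin()]         # nearest neighbor of z
    z_unique = z - nearest              # emphasize the unique elements of z
    logits = head(z_unique)             # forward pass with the new latent
    return logits.argmax(dim=1)         # class with the highest probability

# Toy SSL network and reference images (hypothetical shapes).
phi = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 16))
head = nn.Linear(16, 5)
x = torch.randn(1, 3, 8, 8)
refs = [torch.randn(1, 3, 8, 8) for _ in range(10)]
target = nn_target_class(phi, head, x, refs)
print(int(target))  # index of the class to pass on to Alg. 1
```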

Appendix F Additional Results - Multi Label

Figure 5: Visualization of two different classes for VGG-19.
Figure 6: Visualization of two different classes for VGG-19.

Appendix G Additional Results - Top Class

Figure 7: Visualization of the top predicted class of a VGG-19 ImageNet trained network.
Figure 8: Visualization of the top predicted class of a VGG-19 ImageNet trained network.
Figure 9: Visualization of the top predicted class of a VGG-19 ImageNet trained network.
Figure 10: Visualization of the top predicted class of a VGG-19 ImageNet trained network.
Figure 11: Visualization of the top predicted class of a VGG-19 ImageNet trained network.
Figure 12: Visualization of the top predicted class of a VGG-19 ImageNet trained network.
Figure 13: Visualization of the top predicted class of a VGG-19 ImageNet trained network.
Figure 14: Visualization of the top predicted class of a VGG-19 ImageNet trained network.
Figure 15: Visualization of the top predicted class of a ResNet-50 ImageNet trained network.
Figure 16: Visualization of the top predicted class of a ResNet-50 ImageNet trained network.
Figure 17: Visualization of the top predicted class of a ResNet-50 ImageNet trained network.
Figure 18: Visualization of the top predicted class of a ResNet-50 ImageNet trained network.
Figure 19: Visualization of the top predicted class of a ResNet-50 ImageNet trained network.
Figure 20: Visualization of the top predicted class of a ResNet-50 ImageNet trained network.
Figure 21: Visualization of the top predicted class of a ResNet-50 ImageNet trained network.

Appendix H Additional Results - AlexNet Probes

Figure 22: The visualization after each layer (columns: input, layers 1–5) using linear probes on the RotNet AlexNet model.

Appendix I Additional Results - Self-labeling (SeLa) Method

For each explainability method, we present results obtained by simply projecting the self-labeling label with the highest probability (a very simplified version of our procedure that does not involve the nearest-neighbor computation), as well as the results obtained with Alg. 2.

Figure 23: The visualization of different explainability methods on a ResNet-50 trained in a self-supervised regime using the self-labeling method.

Appendix J Additional Results - SCAN Method

For each explainability method, we present results obtained by simply projecting the label with the highest probability (a very simplified version of our procedure that does not involve the nearest-neighbor computation), as well as the results obtained with Alg. 2.

Figure 24: The visualization of different explainability methods on a ResNet-50 trained in a self-supervised regime using the SCAN method.