A Simple Saliency Method That Passes the Sanity Checks

05/27/2019 ∙ by Arushi Gupta, et al. ∙ Princeton University

There is great interest in *saliency methods* (also called *attribution methods*), which give "explanations" for a deep net's decision, by assigning a *score* to each feature/pixel in the input. Their design usually involves credit-assignment via the gradient of the output with respect to input. Recently Adebayo et al. [arXiv:1810.03292] questioned the validity of many of these methods since they do not pass simple *sanity checks* which test whether the scores shift/vanish when layers of the trained net are randomized, or when the net is retrained using random labels for inputs. We propose a simple fix to existing saliency methods that helps them pass sanity checks, which we call *competition for pixels*. This involves computing saliency maps for all possible labels in the classification task, and using a simple competition among them to identify and remove less relevant pixels from the map. The simplest variant of this is *Competitive Gradient Input (CGI)*: it is efficient, requires no additional training, and uses only the input and gradient. Some theoretical justification is provided for it (especially for ReLU networks) and its performance is empirically demonstrated.




1 Introduction

Methods that allow a human to “understand” or “interpret” the decisions of deep nets have become increasingly important as deep learning moves into applications ranging from self-driving cars to analysis of scientific data. For simplicity our exposition will assume the deep net is solving an image classification task, though the discussion extends to other data types. In such a case the explanation consists of assigning saliency scores (also called attribution scores) to the pixels in the input, and presenting them as a heat map to the human.

Of course, the idea of “credit assignment” is already embedded in gradient-based learning, so a natural place to look for saliency scores is the gradient of the output with respect to the input pixels. Looking for high coordinates in the gradient is akin to classical sensitivity analysis but in practice does not yield high quality explanations. However, gradient-like notions are the basis of other more successful methods. Layer-wise Relevance Propagation (LRP) Bach et al. (2015) uses a back-propagation technique where every node in the deep net receives a share of the output which it distributes to nodes below it. This happens all the way to the input layer, whereby every pixel gets assigned a share of the output, which is its score. Another rule, DeepLIFT Shrikumar et al. (2017), does this in a different way and is related to Shapley values of cooperative game theory.

The core of many such ideas is a simple map called Gradient Input Shrikumar et al. (2017): the score of a pixel in this rule is the product of its value and the partial derivative of the output with respect to that pixel. Complicated methods often reduce to Gradient Input for simple ReLU nets with zero bias. See Montavon et al. (2018) for a survey.

Recently Adebayo et al. (2018) questioned the validity of many of these techniques by suggesting that they don’t pass simple “sanity checks.”  Their checks involve randomizing the model parameters or the data labels (see Section 2 for details). They find that maps produced using corrupted parameters and data are often difficult to visually distinguish from those produced using the original parameters and data. This ought to make the maps less useful to a human checker. The authors concluded that “…widely deployed saliency methods are independent of both the data the model was trained on, and the model parameters.”

The current paper focuses on multiclass classification and introduces a simple modification to existing methods: Competition for pixels. Section 3 motivates this by pointing out a significant issue with previous methods: they produce saliency maps for a chosen output (label) node using gradient information only for that node while ignoring the gradient information from the other (non-chosen) outputs. To incorporate information from non-chosen labels/nodes in the multiclass setting we rely on a property called completeness used in earlier methods, according to which the sum of pixel scores in a map is equal to the value of the chosen node (see Section 3). One can design saliency maps for all outputs and use completeness to assign a pixel score in each map. One can view the various scores assigned to a single pixel as its “votes” for different labels. The competition idea is roughly to zero out any pixel whose vote for the chosen label was lower than for another (non-chosen) label. Section 3.1 develops theory to explain why this modification helps pass sanity checks in the multi-class setting, and yet produces maps not too different from existing saliency maps. Section  3.2 gives the formal definition of the algorithm.

Section 4 describes implementation of this idea for two well-regarded methods, Gradient Input and LRP, and shows that they produce sensible saliency maps while also passing the sanity checks. We suspect our modification can make many other methods pass the sanity checks.

2 Past related work

We first recall the sanity checks proposed in Adebayo et al. (2018).

The model parameter randomization test. According to the authors, this "compares the output of a saliency method on a trained model with the output of the saliency method on a randomly initialized untrained network of the same architecture." The saliency method fails the test if the maps are similar for trained models and randomized models. The randomization can be done in stages, or layer by layer.

The data randomization test "compares a given saliency method applied to a model trained on a labeled data set with the method applied to the same model architecture but trained on a copy of the data set in which we randomly permuted all labels." Clearly the model in the second case has learnt no useful relationship between the data and the labels and does not generalize. The saliency method fails if the maps are similar in the two cases on test data.

2.1 Some saliency methods


Let $\ell_y$ denote the logit computed for the chosen output node of interest, $y$.


  1. The Gradient Input explanation: The Gradient Input method Shrikumar et al. (2017) computes $\frac{\partial \ell_y}{\partial x} \odot x$, where $\odot$ is the elementwise product.

  2. Integrated Gradients: Integrated gradients Sundararajan et al. (2017) also computes the gradient of the chosen class’s logit. However, instead of evaluating this gradient at one fixed data point, integrated gradients considers the path integral of this value as the input varies from a baseline, $\bar{x}$, to the actual input, $x$, along a straight line.

  3. Layerwise Relevance Propagation: Bach et al. (2015) proposed an approach for propagating importance scores called Layerwise Relevance Propagation (LRP). LRP decomposes the output of the neural network into a sum of the relevances of coordinates of the input. Specifically, if a neural network computes a function $f(x)$, they attempt to find relevance scores $R_p$ such that $f(x) \approx \sum_p R_p$.

  4. Taylor decomposition: As stated in Montavon et al. (2018), for special classes of piecewise linear functions that satisfy $f(tx) = t f(x)$, including ReLU networks with no biases, one can always find a root point near the origin such that $f(x) = \sum_i R_i(x)$, where the relevance scores simplify to $R_i(x) = \frac{\partial f}{\partial x_i} \cdot x_i$.

  5. DeepLIFT explanation: The DeepLIFT explanation Shrikumar et al. (2017) calculates the importance of the input by comparing each neuron’s activation to some ’reference’ activation. Each neuron is assigned an attribution that represents the amount of difference from the baseline that that neuron is responsible for. Reference activations are determined by propagating some reference input through the neural network.

Relationships between different methods. Kindermans et al. (2016) and Shrikumar et al. (2017) showed that if modifications for numerical stability are not taken into account, the LRP rules are equivalent within a scaling factor to Gradient Input. Ancona et al. (2017) showed that for ReLU networks (with zero baseline and no biases) the $\epsilon$-LRP and DeepLIFT (Rescale) explanation methods are equivalent to Gradient Input.
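The way such methods collapse onto Gradient Input on zero-bias ReLU nets can be checked numerically. The sketch below is our own illustration, not an experiment from the paper: it uses Integrated Gradients with an all-zeros baseline on a tiny two-layer net whose sizes, weights and helper name `logit_and_grad` are hypothetical. Because the gradient of such a net is constant along the straight line from the zero baseline to the input, the path integral should reduce to gradient times input.

```python
import numpy as np

def logit_and_grad(W1, w2, x):
    """Single-logit zero-bias ReLU net: logit = w2 . relu(W1 x). Returns (logit, gradient wrt x)."""
    pre = W1 @ x
    mask = (pre > 0).astype(float)        # derivative of ReLU at each hidden unit
    return w2 @ (pre * mask), (w2 * mask) @ W1

rng = np.random.default_rng(1)
d, h = 12, 16                             # illustrative input and hidden sizes
W1 = rng.standard_normal((h, d))
w2 = rng.standard_normal(h)
x = rng.standard_normal(d)

# Gradient Input: elementwise product of the gradient at x with x.
_, grad_at_x = logit_and_grad(W1, w2, x)
gradient_input = grad_at_x * x

# Integrated Gradients with baseline 0: average the gradient along t * x
# (midpoint rule on 0 < t < 1), then multiply elementwise by (x - baseline).
steps = 50
ts = (np.arange(steps) + 0.5) / steps
avg_grad = np.mean([logit_and_grad(W1, w2, t * x)[1] for t in ts], axis=0)
integrated_gradients = (x - 0.0) * avg_grad

max_diff = np.abs(integrated_gradients - gradient_input).max()
print(max_diff)
```

With nonzero biases or a nonzero baseline the active ReLU pattern can change along the path, so the two maps would generally differ.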

3 Adding competition

Figure 1: Heatmap of Gradient Input saliency maps produced by various logits of a deep net trained on MNIST. Red denotes pixels with positive values and Blue denotes negative values. The input image is of the number , which is clearly visible in all maps. Note how maps computed using logits/labels " and " " assign red color (resp., blue color) to pixels that would have been expected to be present (resp., absent) in those digits. The last figure shows the map produced using our CGI method.

The idea of competition suggests itself naturally when one examines saliency maps produced using all possible labels/logits in a multiclass problem, rather than just the chosen label. Figure 1 shows some Gradient Input maps produced using AlexNet trained on MNIST LeCun (1998), where the first layer was modified to accept one color channel instead of 3. Notice that many pixels found irrelevant by humans receive heat (i.e. positive value) in all the maps, and many relevant pixels receive heat in more than one map.

Our experiments showed similar phenomena on more complicated datasets such as ImageNet. This figure highlights an important point of Adebayo et al. (2018), which is that many saliency maps pick up a lot of information about the input itself (e.g., presence of sharp edges) that is at best incidental to the final classification. Furthermore, these incidental features can survive during the various randomization checks, leading to failure in the sanity check. Thus it is a natural idea to create a saliency map by combining information from all labels, in the process filtering out or downgrading the importance of incidental features.

Suppose the input is $x \in \mathbb{R}^d$ and the net is solving a $C$-way classification. We assume a standard softmax output layer whose inputs are $C$ logits, one per label. Let $x$ be an input, $y$ be its label and $\ell_y$ denote the corresponding logit. To explain the output of the net many methods assign a score to each pixel by using the gradient of $\ell_y$ with respect to $x$. For concreteness, we use the Gradient Input method, which assigns score $x_i \cdot (\nabla_x \ell_y)_i$ to pixel $i$, where $(\nabla_x \ell_y)_i$ is the coordinate in the gradient corresponding to the $i$th pixel $x_i$.

Usually prior methods do not examine the logits corresponding to non-chosen labels as well, but as mentioned, we wish to ultimately design a simple competition among labels for pixels. A priori it can be unclear how to compare scores across labels, since this could end up being an “apples vs oranges” comparison due to potentially different scaling. However, prior work Sundararajan et al. (2017) has identified a property called completeness: this requires that the sum of the pixel scores is exactly the logit value. Gradient Input is an attractive method because it satisfies completeness exactly for ReLU nets with zero bias. Recall that the ReLU function with bias is

Lemma 1

On ReLU nets with zero bias Gradient Input satisfies completeness.


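Proof. If function $f$ is computed by a ReLU net with zero bias at each node, then it satisfies $f(\lambda x) = \lambda f(x)$ for every $\lambda > 0$. Now partial differentiation with respect to $\lambda$ at $\lambda = 1$ gives $f(x) = \nabla_x f(x) \cdot x$, which is exactly the sum of the Gradient Input scores. ∎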
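Lemma 1 is easy to check numerically. The following is a minimal NumPy sketch under our own assumptions: a two-layer zero-bias ReLU net with random weights, arbitrary layer sizes, and a hand-written Jacobian in the hypothetical helper `logits_and_grads`. It verifies that, for every logit, the Gradient Input scores sum to the logit value.

```python
import numpy as np

def logits_and_grads(W1, W2, x):
    """Zero-bias ReLU net: logits = W2 @ relu(W1 @ x).

    Returns the logits (shape C) and the Jacobian d logits / d x (shape C x d).
    """
    pre = W1 @ x                          # hidden pre-activations
    mask = (pre > 0).astype(x.dtype)      # ReLU derivative, 0 or 1 per hidden unit
    logits = W2 @ (pre * mask)
    jac = (W2 * mask) @ W1                # chain rule: W2 diag(mask) W1
    return logits, jac

rng = np.random.default_rng(0)
d, h, C = 20, 32, 5                       # illustrative sizes: pixels, hidden units, labels
W1 = rng.standard_normal((h, d))
W2 = rng.standard_normal((C, h))
x = rng.standard_normal(d)

logits, jac = logits_and_grads(W1, W2, x)
scores = jac * x                          # Gradient Input map for every logit, shape C x d
completeness_gap = np.abs(scores.sum(axis=1) - logits).max()
print(completeness_gap)                   # zero up to floating-point rounding
```

Adding a nonzero bias to the hidden layer breaks the identity, which is the case the approximate-completeness discussion addresses.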

Past work shows how to design methods that satisfy completeness for ReLU with nonzero bias by computing integrals, which is more expensive (see Ancona et al. (2017), which also explores interrelationships among methods). However, we find empirically this is not necessary because of the following phenomenon.

Approximate completeness. For ReLU nets with nonzero bias, Gradient Input in practice has the property that the sum of pixel scores varies fairly linearly with the logit value (though theory for this is lacking). See Figure 2, which plots this for VGG-19 trained on ImageNet. Thus up to a scaling factor, we can assume Gradient Input approximately satisfies completeness.

Enter competition.

Completeness (whether exact or approximate) allows us to consider the score of a pixel in Gradient Input as a “vote” for a label. Now consider the case where $y$ is the label predicted by the net for input $x$. Suppose pixel $i$ has a positive score for label $y$ and an even more positive score for label $y'$. This pixel contributes positively to both logit values. But remember that since label $y'$ was not predicted by the net as the label, the logit $\ell_{y'}$ is less than the logit $\ell_y$, so the contribution of pixel $i$’s “vote” to $\ell_{y'}$ is proportionately even higher than its contribution to $\ell_y$. This perhaps should make us realize that this pixel may be less relevant or even irrelevant to label $y$ since it is effectively siding with label $y'$ (recall Figure 1). We conclude that looking at Gradient Input maps for non-chosen labels should allow us to fine-tune our estimate of the relevance of a pixel to the chosen label.

Figure 2: Approximate completeness property of Gradient Input on ReLU nets with nonzero bias (VGG-19). An approximately linear relationship holds between logit values and the sum of the pixel scores for Gradient Input for a randomly selected image.

Now we formalize the competition idea. Note that positive and negative pixel scores should be interpreted differently; the former should be viewed as supporting the chosen label, and the latter as opposing that label.

Competitive Gradient Input (CGI): Label $y$ “wins” a pixel if either (a) its map assigns that pixel a positive score higher than the scores assigned by every other label, or (b) its map assigns the pixel a negative score lower than the scores assigned by every other label. The final saliency map consists of scores assigned by the chosen label $y$ to each pixel it won, with the map containing a score of $0$ for any pixel it did not win.

Using the same reasoning as above, one can add competition to any other saliency map that satisfies completeness. Below we also present experiments on adding competition to LRP. In Sections 4 and 4.2 we present experiments showing that adding competition makes these saliency methods pass sanity checks.

3.1 Why competition works: some theory

Figure 1 suggests that it is a good idea to zero out some pixels in existing saliency maps. Here we develop a more principled understanding of why adding competition (a) is aggressive enough to zero out enough pixels to help pass sanity checks on randomized nets and (b) not too aggressive so as to retain a reasonable saliency map for properly trained nets.

Adebayo et al. (2018) used linear models to explain why methods like Gradient Input fail their randomization tests. These tests turn the gradient into a random vector, and if $g_1, g_2$ are random vectors, then $x \odot g_1$ and $x \odot g_2$ are visually quite similar when $x$ is an image. (See Figure 10 in their appendix.) Thus the saliency map retains a strong sense of $x$ after the randomization test, even though the gradient is essentially random. Now it is immediately clear that with $C$-way competition among the labels, the saliency map would be expected to become almost blank in the randomization tests since each label is equally likely to give the highest score to a pixel, so it becomes zero with probability $1 - 1/C$. Thus we would expect that adding competition enables the map to pass the sanity checks. In our experiments later we see that the final map is indeed very sparse.
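This argument can be illustrated with a toy simulation. The sketch below is our own and rests on strong simplifying assumptions: the gradients of a randomized net are modeled as independent Gaussian noise for each of `C` labels, the "image" is a vector of positive pixel values, and all sizes are arbitrary. It applies the two-sided competition rule from the CGI definition and measures what fraction of pixels survives.

```python
import numpy as np

rng = np.random.default_rng(0)
C, d = 10, 200_000                       # number of labels and pixels (illustrative)
x = rng.uniform(0.1, 1.0, size=d)        # stand-in image with positive pixel values
grads = rng.standard_normal((C, d))      # gradients of a randomized net, modeled as noise
scores = grads * x                       # Gradient Input map for every label

y = 0                                    # index of the chosen label
s_y = scores[y]
# A pixel is won if the chosen label has the largest positive or the smallest negative score.
won = ((s_y > 0) & (s_y == scores.max(axis=0))) | \
      ((s_y < 0) & (s_y == scores.min(axis=0)))
kept_fraction = won.mean()
print(f"surviving pixels: {kept_fraction:.3f}, zeroed: {1 - kept_fraction:.3f}")
```

Because the rule is two-sided (a pixel can also be won with the most negative score), the surviving fraction in this toy model comes out near $2/C$ rather than $1/C$. Either way the map is mostly blank once the number of labels is large.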

But one cannot use this naive model to understand why CGI does not also destroy the saliency map for properly trained nets. The reason is that the gradient is not random and depends on the input. In particular, if $g_1$ is the gradient of a logit with respect to input $x$, then $g_1 \cdot x$ is simply the sum of the coordinates of $g_1 \odot x$, which due to the completeness property has to track the logit value. In other words, gradient and input are correlated, at least when the logit is sufficiently nonzero. Furthermore, the amount of this correlation is given by the logit value. In practice we find that if the deep net is trained to high accuracy on a dataset, the logit corresponding to the chosen label is significantly higher than the other logits, say 2X or 4X. This higher correlation plays a role in why competition ends up preserving much of the information.

We find the following model of the situation simplistic but illustrative: Assume gradient and input are random vectors drawn from conditional on (i.e., correlated random vectors), where corresponds to the logit value. On real data we find that is, to for the chosen label, which is a fairly significant since the inner product of two independent draws from would be only in magnitude, say when .

Let $g_2$ be the gradient of a second (non-chosen) logit with respect to $x$. Figure 1 suggests that the chosen-logit gradient $g_1$ and $g_2$ can have significant overlap in terms of their high coordinates, which we referred to earlier as shared features or incidental features. We want competition to give us a final saliency map that downplays pixels in this overlap without completely eliminating them.

Without loss of generality let the first coordinates correspond to the shared features. So we can think of and where respectively are the sub-vectors of respectively in the shared features and are random -dimensional vectors in the second halves. All these vectors are assumed to be unit vectors. It is unreasonable to expect the coordinates of and to be completely identical, but we assume there is significant correlation, so assume .

Now imagine picking the input as mentioned above: Given it is a random vector conditional on . Then a simple calculation via measure concentration shows that half of the this inner product of must come from the first coordinates, meaning . Another application of measure concentration shows that , reflecting the fact that .

What happens after we apply competition (i.e., CGI)?

An exact calculation requires a multidimensional integral using the Gaussian distribution. But simulations (see Figures 11, 12 in appendix) show that after zeroing out coordinates in due to competition from , we have a contribution of at least left from the first coordinates and a contribution of at least from the last coordinates, where are some constants. In other words, there remains a significant contribution from both the shared features, and the non-shared features. Thus the competition still allows the saliency map to retain some kind of approximate completeness.

Remark 1: There is something else missing in the above account which in practice ensures that competition is not too aggressive in zeroing out pixels in normal use: entries in the gradients are non-uniform, so the subset of coordinates with high values is somewhat sparse. Thus for each label/logit, the bulk of its score is carried by a subset of pixels. If each label concentrates its scores on fraction of pixels then heuristically one would expect two labels to compete only on fraction of pixels. For example if then they would compete only on or % of the pixels. This effect can also be easily incorporated in the above explanation.

Remark 2: The above analysis suggests that the saliency map can make sense for any label with a sufficiently large logit (e.g., the logit for label "7" in Figure 1).
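The heuristic in Remark 1 can be illustrated with a toy computation. Assume (this is our assumption, not a claim from the text) that each label's high-score pixels form an independent random subset covering a fraction `rho` of the image. Two labels then share roughly `rho ** 2` of the pixels. The symbol `rho` and the value 0.2 are hypothetical choices made only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100_000          # number of pixels (illustrative)
rho = 0.2            # hypothetical fraction of pixels carrying most of a label's score

# Independent random "support" sets for two labels.
support_a = rng.random(d) < rho
support_b = rng.random(d) < rho

# Fraction of pixels on which both labels place high scores, i.e. where they compete.
overlap_fraction = np.mean(support_a & support_b)
print(overlap_fraction, rho ** 2)
```

If real labels share incidental features, as Figure 1 suggests they do, the supports are positively correlated and the overlap would be larger than this independent baseline.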

3.2 Formal Description of CGI

Here we provide a formal definition of our algorithm, CGI, which can be found in Algorithm 1.

Let $\ell_i$ denote the logit computed by the $i$th output node of our neural network. For each output node $i$, and for each scalar coordinate $x_j$ of the input, we compute $s_{i,j} = \frac{\partial \ell_i}{\partial x_j} x_j$, i.e. we compute Gradient Input for each scalar element of $x$ for each of the $C$ output nodes. Letting $y$ denote the index of the chosen label, if $s_{y,j} > 0$, $x_j$ will be included in the heat map if $s_{y,j}$ is equal to the maximum of $\{s_{i,j}\}_{i=1}^{C}$, and its value in the heat map will be $s_{y,j}$. If $s_{y,j} < 0$, it will be included in the heat map if $s_{y,j}$ is equal to the minimum of $\{s_{i,j}\}_{i=1}^{C}$, and its value in the heat map will be $s_{y,j}$. For all other inputs, the default value in the heat map is 0.

Input: An image $x$ and a neural network with logits $\ell_1, \dots, \ell_C$
initialization: set H = zero vector with one entry per element of the image. Let y be the index of the chosen output node;
for Element $x_j$ in Image do
      Calculate $s_{i,j} = \frac{\partial \ell_i}{\partial x_j} x_j$ for all output nodes $i$
      if $s_{y,j} > 0$ then
            if $s_{y,j} = \max_i s_{i,j}$ then
                  Make the corresponding element of H equal to $s_{y,j}$
            end if
      else
            if $s_{y,j} = \min_i s_{i,j}$ then
                  Make the corresponding element of H equal to $s_{y,j}$
            end if
      end if
end for
Algorithm 1 Competitive Gradient Input
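The loop in Algorithm 1 can be vectorized. The following is a minimal NumPy sketch, not the authors' implementation: it assumes the per-label Gradient Input scores have already been computed (for example with an autodiff framework) and stacked into an array `scores` of shape (C, d). The function name `competitive_gradient_input` and the small hand-made score table are ours.

```python
import numpy as np

def competitive_gradient_input(scores, y):
    """Apply pixel competition to per-label scores.

    scores: array of shape (C, d), scores[i, j] = Gradient Input score of pixel j for label i.
    y:      index of the chosen label.
    Returns the CGI heat map H of shape (d,).
    """
    s_y = scores[y]
    wins_positive = (s_y > 0) & (s_y == scores.max(axis=0))   # largest positive vote
    wins_negative = (s_y < 0) & (s_y == scores.min(axis=0))   # most negative vote
    return np.where(wins_positive | wins_negative, s_y, 0.0)

# Toy example with 3 labels and 4 pixels; label 0 is the chosen label.
scores = np.array([
    [ 2.0, -1.0, 0.5, -3.0],   # chosen label
    [ 1.0, -2.0, 0.7, -1.0],
    [-1.0,  0.5, 0.1, -2.0],
])
heat_map = competitive_gradient_input(scores, y=0)
print(heat_map)   # pixels 0 and 3 are won by label 0; pixels 1 and 2 are zeroed
```

The same function applies unchanged to any other score table that satisfies completeness, such as LRP relevances, since the competition only looks at the per-label scores.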

4 Experiments

Figure 3: Comparison of CGI saliency maps with Gradient Input saliency maps. Original images are shown on the left.

Figure 3 presents an example of CGI maps on the VGG-19 architecture on Imagenet. We find that our maps are of comparable quality to Gradient Input.

4.1 Parameter Randomization test

The goal of these experiments is to determine whether CGI is sensitive to model parameters. We run the parameter randomization tests on the VGG-19 architecture Simonyan and Zisserman (2014) with pretrained weights on ImageNet Russakovsky et al. (2015) using layerwise and cascading randomization.

4.1.1 Layerwise Randomization

In these experiments, we consider what happens when certain layers of the model are randomized. This represents an intermediate point between the model having learned nothing, and the model being fully trained.

Figure 4 shows the results of randomizing individual layers of the VGG-19 architecture with pretrained weights. (Figure 8 in the Appendix shows the full figure.) The text underneath each image represents which layer of the model was randomized, with the leftmost label of ’original’ representing the original saliency map of the fully trained model. The top panel shows the saliency maps produced by CGI, and the bottom panel the maps produced by Gradient Input. We find that the Gradient Input method displays the bird no matter which layer is randomized, and that our method immediately stops revealing the structure of the bird in the saliency maps as soon as any layer is randomized. Figure 10 in the Appendix shows a similar result but utilizing absolute value visualization. Notice that CGI’s sensitivity to model parameters still holds.

Figure 4: Saliency map for layer-wise randomization on VGG-19 on ImageNet for Gradient Input versus CGI. We find that in CGI, the saliency map is almost blank when any layer is reinitialized. By contrast, we find that the original Gradient Input method displays the structure of the bird, no matter which layer is randomized.

Figure 5: Saliency map for cascading randomization on VGG-19 on ImageNet for Gradient Input versus CGI. We find that in CGI, the saliency map is almost blank even when only the softmax layer has been reinitialized. By contrast, we find that the original Gradient Input method displays the structure of the bird, even after multiple blocks of randomization.

Figure 6: Saliency map for cascading randomization on VGG-16 on ImageNet for LRP versus CLRP. We notice that LRP shows the structure of the bird even after multiple blocks of randomization. CLRP eliminates much of the structure of the bird.

4.1.2 Cascading Randomization

In these experiments we consider what happens to the saliency maps when we randomize the network weights in a cascading fashion. We randomize the weights of the VGG-19 model starting from the top layer, successively, all the way to the bottom layer.

Figure 5 shows our results. The rightmost figure represents the original saliency map when all layer weights and biases are set to their fully trained values. The leftmost saliency map represents the map produced when only the softmax layer has been randomized. The image to the right of that shows the map when everything up to and including conv5_4 has been randomized, and so on. Again we find that CGI is much more sensitive to parameter randomization than Gradient Input.

4.1.3 Comparison with LRP

We also apply our competitive selection of pixels to LRP scores, computed using the iNNvestigate library Alber et al. (2018) on the VGG-16 architecture with pretrained weights on ImageNet. The algorithm is the analogue of Algorithm 1, but we provide the full algorithm as Algorithm 2 in the Appendix for clarity. Figure 6 shows our results. We find that our competitive selection process (CLRP) benefits the LRP maps as well. The LRP maps show the structure of the bird even after multiple blocks of randomization, while our maps greatly reduce the prevalence of the bird structure in the images.

4.2 Data Randomization Test

We run experiments to determine whether our saliency method is sensitive to model training. We use a version of AlexNet Krizhevsky et al. (2012) adjusted to accept one color channel instead of three and train on MNIST. We randomly permute the labels in the training data set, train the model to greater than 98% accuracy, and examine the saliency maps. Figure 7 shows our results. On the left hand side is the original image. In the middle is the map produced by Gradient Input. We find that the input structure, the number 3, still shows through with the Gradient Input method. On the other hand, CGI removes the underlying structure of the number.

Figure 7: Second sanity check for AlexNet on MNIST. In the middle image we find that using the original gradient times input method results in an image where the original structure of the number 3 is still visible. In the right hand side image we find that our modification removes the structure of the original input image, as we would expect for a model that had been fitted on randomized data.

5 Conclusion

We have introduced the idea of competition among labels as a simple modification to existing saliency methods. Unlike most past methods, this produces saliency maps by looking at the gradient of all label logits, instead of just the chosen label. Our modification keeps existing methods relevant for human evaluation (as shown on two well-known methods Gradient Input and LRP) while allowing them to pass sanity checks of Adebayo et al. (2018), which had called into question the validity of saliency methods. Possibly our modification even improves the quality of the map, by zero-ing out irrelevant features. We gave some theory in Section 3.1 to justify the competition idea for Gradient Input maps for ReLU nets.

While competition seems a good way to combine information from all logits, we leave open the question of what is the optimum way to design saliency maps by combining information from all logits. (One idea that initially looked promising, looking at gradients of outputs of the softmax layer instead of the logits, did not yield good methods in our experiments.)

The recently-proposed sanity checks randomize the net in a significant way, either by randomizing a layer or training on corrupted data. We think it is an interesting research problem to devise less disruptive sanity checks which are more subtle.


6 Appendix

Let LRP[i, Element] be the LRP score of Element when decomposing output node i.

Let y be the index of the chosen output node.

Input: An image and a neural network
initialization: set H = zero vector with one entry per element of the image;
for Element in Image do
       Calculate LRP[i, Element] for all output nodes i
       if LRP[y, Element] > 0 then
              if LRP[y, Element] = max over i of LRP[i, Element] then
                     Make the corresponding element of H equal to LRP[y, Element]
              end if
       else
              if LRP[y, Element] = min over i of LRP[i, Element] then
                     Make the corresponding element of H equal to LRP[y, Element]
              end if
       end if
end for
Algorithm 2 Competitive Layerwise Relevance Propagation
Figure 8: Saliency map for layer-wise randomization of the learned weights. Diverging visualization where we plot the positive importances in red and the negative importances in blue. We find that with CGI, the saliency map is almost blank when any layer is reinitialized. By contrast, we find that Gradient Input displays the structure of the bird, no matter which layer is randomized.
Figure 9: Saliency map cascading randomization LRP versus CLRP.
Figure 10: Saliency map for layer-wise randomization of the learned weights. Absolute value visualization where we plot the absolute value of the saliency map. We find that using CGI, the saliency map is almost blank when any layer is reinitialized. By contrast, we find that Gradient Input displays the structure of the bird, no matter which layer is randomized.
Figure 11: versus for 100 averaged samples
Figure 12: versus for 100 averaged samples