Relative Attributing Propagation: Interpreting the Comparative Contributions of Individual Units in Deep Neural Networks

04/01/2019 ∙ by Woo-Jeoung Nam, et al. ∙ Korea University

As Deep Neural Networks (DNNs) have demonstrated superhuman performance in many computer vision tasks, there is an increasing interest in revealing their complex internal mechanisms. In this paper, we propose Relative Attributing Propagation (RAP), which decomposes the output predictions of DNNs from a new perspective that precisely separates the positive and negative attributions. By identifying the fundamental causes of activation and the proper inversion of relevance, RAP allows each neuron to be assigned its actual contribution to the output. Furthermore, we devise pragmatic methods to properly handle the effects of bias and batch normalization in the attributing procedure. Our method therefore makes it possible to interpret various kinds of very deep neural network models with clear and attentive visualizations of positive and negative attributions. For quantitative evaluation, we use the region perturbation method and compare the distributions of attributions to verify whether the positive and negative attributions of RAP carry their intended meanings. The positive and negative attributions propagated by RAP show vulnerability and robustness, respectively, to distortion of the corresponding pixels. We apply RAP to the DNN models VGG-16, ResNet-50 and Inception-V3 and demonstrate that it generates more intuitive and improved interpretations compared to the existing attribution methods.




I Introduction

Deep Neural Networks (DNNs) play an important role in improving the empirical performance of several computer vision applications, such as image classification [1, 2], object detection [3], human action recognition [4], and medical diagnosis [5, 6]. However, there is a transparency issue owing to the complex internal structure of DNNs, which are commonly referred to as black boxes. It is difficult to intuitively interpret the results of DNNs because their internal structures contain a myriad of linear and nonlinear operations. This lack of interpretability prevents many DNN models from being applied in mission-critical systems.

Fig. 1: Comparison of LRP and RAP applied to VGG-16. While the result of LRP-αβ shows the positive and negative attributions distributed in close proximity, our RAP clearly distinguishes positive (red pixels) and negative (blue pixels) relevance.

Recently, many studies have attempted to resolve the lack of transparency in DNNs. Attribution methods [7, 8, 9, 10, 11, 12] reveal the significant factors of the input in making decisions by assigning relevance to each input of a DNN. To consider the positive and negative contributions of each neuron to the output of a DNN, [7] introduced a relevance propagation rule that propagates the relevance by separating positive and negative attributions. However, the propagated positive and negative attributions in the visualized explanations are distributed side by side, as shown for LRP-αβ in Figure 1. Thus, it is difficult to intuitively understand the positive and negative attributions of LRP [7].

Propagating the positive and negative relevance simultaneously without considering the variation of contribution and inversion of relevance may lead to defective interpretation. It is necessary to clarify the actual contributions of individual units to the output, because the components of the complex inner structure, such as the activation function, weights, biases and batch normalization, terminate, shift and switch the conveyance of a value.

In this paper, we propose a new method to precisely separate the positive and negative relevances in analyzing the contributions of the input units to the output of a DNN. The main idea is to attribute the relevance in terms of the fundamental causes of activation and the relative contributions of individual units in each layer, so that the relevance assigned to each neuron is directionally in line with its actual contribution to the output. Figure 1 illustrates the outputs of the proposed method and those of LRP. The relevance of each neuron is propagated through the weights while taking into account the inversion of the contribution and the corresponding attribution, resulting in a pixel-by-pixel mapping of the positive and negative attributions. The main contributions of this work are as follows:

  • We propose relative attributing propagation (RAP), a method for attributing the positive and negative relevances to each neuron according to its actual contribution. We address the inversion of contribution and reduce the risk of assigning the positive and negative relevance according to the sign of weight.

  • We carefully analyze and resolve the effect of bias and batch normalization in the relevance propagation procedure. Our approach makes it possible to attribute this influence to the contribution of each neuron, so that it is applicable to contemporary very deep networks including VGG-16 [13], ResNet-50 [2] and Inception-V3 [14]. Our method successfully handles bias and batch normalization in complex DNN models with intuitive and attentive visualizations.

  • We apply region perturbation [15] and compare the distances between the attributions to assess whether the propagated attributions are well distributed and carry the correct meaning. The evaluation indicates that, compared to the existing attribution methods, the attributions from RAP are concentrated on the positive features of the object and clearly distinguished from the negative features.

II Related Work

There are several studies on understanding what a DNN model has learned. From the standpoint of interpreting a DNN model, the manner in which a DNN works can be visualized by maximizing the activation of hidden layers [16] or generating salient feature maps [17, 18, 19, 20, 21, 22]. [23] introduced the input switched affine network, which can decompose the contributions of previous characters to the current prediction, and [24] proposed the influence function to understand model behavior, debug models, detect dataset errors, and even create visually indistinguishable training-set attacks. [25] proposed LIME, an algorithm that explains the predictions of a classifier by learning an interpretable model locally around the prediction.

From the standpoint of explaining the decision of a DNN, the contributions of the input are propagated backward, resulting in a redistribution of relevance in the pixel space. Sensitivity analysis visualizes the sensitivities of input images classified by a DNN while explaining the factors that reduce/increase the evidence for the predicted results [26]. [21] proposed a deconvolution method to identify the patterns of a predicted input image from a DNN. Layerwise relevance propagation (LRP) [7] was introduced to backpropagate a relevance, which makes the network output become fully redistributed throughout the layers of a DNN. [15] showed that the LRP algorithm qualitatively and quantitatively provides a better explanation than do either the sensitivity-based approach or the deconvolution method.

Guided BackProp [27] and Integrated Gradients [28] compute the single and the average partial derivatives of the output, respectively, to attribute the prediction of a DNN. Deep Taylor Decomposition [8] is an extension of LRP for interpreting the decision of a DNN by decomposing the activation of a neuron in terms of the contributions from its inputs. DeepLIFT [10] decomposes the output prediction by assigning contribution scores based on the difference between the activation of each neuron and its reference activation. [29] approached the problem of the attribution value from a theoretical perspective and formally proved the conditions of equivalence and approximation between four attribution methods: Gradient * Input, Integrated Gradients, LRP and DeepLIFT.

However, no existing study analyzes the problem of ambiguous visualization in dealing with negative relevance. We identify the fundamental causes of this problem and present a solution that handles the negative relevance precisely.

III Background

In this section, we briefly introduce the notation and the attribution method LRP, which is closely related to our method in terms of propagating relevance.

III-A Notations

In this paper, we use $f(x)$ to denote the value of the network output before passing through the softmax layer for input $x$. The letter $R$ represents the value of $f(x)$ corresponding to the prediction node, which is the input relevance for the attributing procedure. A neuron $j$ in the layer $l+1$ receives the value $z_{ij}$ from a neuron $i$ in the layer $l$, which is obtained by multiplying the value $x_i$ and the weight $w_{ij}$. The value $z_j$ is changed into $x_j$ after passing the addition of bias $b_j$ and the activation function $g(\cdot)$.

$$z_{ij} = x_i w_{ij}, \qquad z_j = \sum_i z_{ij} + b_j, \qquad x_j = g(z_j) \tag{1}$$

The signs of positive and negative values are denoted by $+$ and $-$.

III-B Layerwise Relevance Propagation

The principle of LRP is to find the parts with high relevance in the input by propagating the result from back (output) to front (input). The algorithm is based on the conservation principle, which maintains the relevance in each layer. It is assumed that is the relevance of a neuron in a layer and that is associated with a neuron of the layer .


[7] introduced two relevance propagation rules that satisfy Equation (2). The first rule, called LRP-ε, is defined as


This rule lets a neuron in the layer $l$ receive relevance according to its contribution to the activation of the neurons in the layer $l+1$. The constant $\epsilon$ prevents numerical instability when the denominator becomes zero. The second rule, LRP-αβ, enforces the conservation principle by separating the positive and negative activations during the relevance propagation process.


In this rule, and . Here, we note that rule (4) separates the positive and negative relevance according to the sign of . After the relevance propagation is finished, the propagated attributions are mapped to the pixels of the input image and visualized as a heatmap.
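The two rules can be illustrated with a minimal sketch for a single fully connected layer. It follows the form in which LRP-ε and LRP-αβ are commonly stated in the LRP literature, not necessarily the exact form of Equations (3) and (4). The function names, the example weights, and the convention α − β = 1 are assumptions made for illustration.

```python
def lrp_epsilon(x, w, b, r_out, eps=1e-6):
    """Epsilon-stabilized rule: neuron i receives z_ij / (z_j + eps * sign(z_j)) of R_j."""
    r_in = [0.0] * len(x)
    for j in range(len(r_out)):
        z_ij = [x[i] * w[i][j] for i in range(len(x))]
        z_j = sum(z_ij) + b[j]
        denom = z_j + (eps if z_j >= 0 else -eps)  # keeps the denominator away from zero
        for i in range(len(x)):
            r_in[i] += z_ij[i] / denom * r_out[j]
    return r_in


def lrp_alpha_beta(x, w, r_out, alpha=2.0, beta=1.0):
    """Alpha-beta rule: positive and negative parts of z_ij are normalized separately.

    Assumes the convention alpha - beta = 1 (hypothetical default: alpha=2, beta=1).
    """
    r_in = [0.0] * len(x)
    for j in range(len(r_out)):
        z_ij = [x[i] * w[i][j] for i in range(len(x))]
        z_pos = sum(z for z in z_ij if z > 0)
        z_neg = sum(z for z in z_ij if z < 0)
        for i in range(len(x)):
            pos = z_ij[i] / z_pos if z_ij[i] > 0 else 0.0
            neg = z_ij[i] / z_neg if z_ij[i] < 0 else 0.0
            r_in[i] += (alpha * pos - beta * neg) * r_out[j]
    return r_in


# Hypothetical toy layer: two inputs, two outputs, mixed-sign weights, zero bias.
x = [1.0, 2.0]
w = [[1.0, -1.0], [-0.25, 1.0]]
r_out = [1.0, 2.0]
r_eps = lrp_epsilon(x, w, [0.0, 0.0], r_out)
r_ab = lrp_alpha_beta(x, w, r_out)
```

In this toy layer both rules redistribute the total relevance of 3.0 onto the two inputs. Note that the αβ sketch conserves relevance only when each output neuron receives both positive and negative inputs, and that the ε rule lets a non-zero bias absorb part of the relevance.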

IV Interpreting Contributions and Relevances

In this section, we introduce our inversion of relevance which precisely reflects the actual contribution of each neuron, and then explain our attributing procedure in detail.

Fig. 2: A simple example to illustrate the positive and negative contributions and their inversion. In the forward pass, we assume that all values of neurons are non-negative. In the backward pass, the actual relevance is decomposed according to the contribution of each neuron. We emphasize that the sign of weights and relevances are not same because of the inversion of contribution. While LRP regards and as the positive and negative relevances, RAP assigns and to the positive and negative ones, respectively.
Fig. 3: Overall structure of the RAP algorithm. After the first propagation of relevance, RAP is separated into two flows, the Normal and Inversed cases. Red and blue colors in weights and neurons denote positive and negative values, respectively. After both propagations are finished, the final result is produced by adding the results of the two cases.

IV-A A Motivational Example

Figure 2 presents an example of the last two layers before the output layer. Each layer includes two neurons and fully connected weights. Here, we assume that the forward process does not include bias and batch normalization to simplify our illustrations. These assumptions are lifted in Section 4.2.3. We assume that all neurons in layer and are non-negative. In the forward pass, propagates the positive value through the positive weight to the , which contributes positively to the output . Therefore, should receive a positive relevance from in the backward propagation.

However, it is interesting to investigate the contribution of which is propagated through . conveys a negative value to and contributes negatively to the . Therefore, should receive the positive relevance from through . Thus, the signs of and are not same. With a similar reasoning, the contribution of to is inversed in the final propagation to and the relevance should be negative.

Here, we can discover the problem of relevance propagated through rule (4). In the above example, the criterion for separating the positive and negative relevance of rule (4) is the sign of the connected weight. Therefore, and are regarded as the positive and negative relevances and multiplied by and , respectively. This mismatch between the contribution and the relevance causes the offset of positive and negative relevance. To be more specific, in case of LRP-, and receive and , respectively, resulting in the offset of each actual relevance. This is the reason why the positive and negative attributions propagated through rule (4) are visualized redundantly in close proximity as illustrated in Figure 2.

In this method, we separate the positive and negative relevance based on the actual contribution of each neuron. Through the backward pass example in Figure 2, it is possible to deduce the actual contribution from the relevance value in the layer and the connected weights. There is no inversion of contribution in the layer , because it is directly connected to the output . When the neuron in the layer receives the positive relevance from , the positive and negative relevances which are propagated to the layer from this neuron should be labeled according to the sign of connected weights. However, when considering the opposite case, such as , we have to separate the relevance conversely. Our result in Figure 2 shows an improved and intuitive separation of the positive and negative parts in the input image compared to the result of existing method, LRP-.
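The inversion of contribution can be made concrete with a small numeric sketch. The network below is a hypothetical two-input, two-hidden-unit, one-output linear toy whose hidden values stay non-negative. It is not the network of Figure 2, and the weights and the helper name `path_contributions` are assumptions chosen only to show that the sign of a weight and the sign of the actual contribution can differ.

```python
def path_contributions(x, w1, w2):
    """contrib[i][j] = x_i * w1[i][j] * w2[j]: effect of input i on the output via hidden unit j."""
    return [[x[i] * w1[i][j] * w2[j] for j in range(len(w2))] for i in range(len(x))]


# Hypothetical values; both hidden units stay non-negative (0.5 and 1.0).
x = [1.0, 1.0]
w1 = [[1.0, -1.0], [-0.5, 2.0]]  # input-to-hidden weights
w2 = [1.0, -1.0]                 # hidden-to-output weights

hidden = [sum(x[i] * w1[i][j] for i in range(2)) for j in range(2)]
output = sum(hidden[j] * w2[j] for j in range(2))
contrib = path_contributions(x, w1, w2)

# Input 0 reaches hidden unit 1 through a NEGATIVE weight, but hidden unit 1
# itself lowers the output, so the actual contribution is POSITIVE (inversion).
# Input 1 reaches hidden unit 1 through a POSITIVE weight, yet contributes negatively.
```

Separating relevance by the sign of the connected weight alone would label `contrib[0][1]` as negative and `contrib[1][1]` as positive, which is the mismatch discussed above.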

IV-B Relative Attributing Propagation

To consider the positive and negative contributions of individual units and additional components simultaneously, we propagate the relevance from a relative perspective that maintains its directionality with respect to zero, so that each neuron receives the relevance corresponding to its actual contribution. Our method divides the relevance propagation process into two flows, the normal and inversion cases, to prevent the offset of the relevance and to consider the mutating effect of additional components. After the relevance is fully propagated in both flows, the results are combined into the final relevance. Figure 3 shows the overall architecture of our method.

IV-B1 The First Propagation of the Output Relevance

In the first step, we propagate the same value according to the sign of connected weight from a prediction node in the final layer to a neuron in the previous layer .


In the perspective of a contribution in the forward pass of the layer to , the weighted sum between the neurons and the weights directly affects the output . Therefore, the sign of the relevance and the actual contribution of each neuron in the layer are same. Because the activation function is not applied to the layer in a DNN, each sign of propagated relevance is same with the sign of . After rule (5) is applied, we separate the relevance into (positive) and (negative) according to the sign, respectively.


This Equation (6) is applied in each layer during the relevance propagation process. In the first propagation of the relevance, we do not consider the effect of bias. Because the relevance is propagated from the prediction node , the bias does not affect the proportion of the relevance in the layer .
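The sign-based separation described for Equation (6) can be sketched as follows. This is only the split of a relevance vector into its positive and negative parts. It does not reproduce rule (5), and the function name is hypothetical.

```python
def split_by_sign(relevance):
    """Split a relevance vector into a positive part and a negative part of equal length."""
    positive = [r if r > 0 else 0.0 for r in relevance]
    negative = [r if r < 0 else 0.0 for r in relevance]
    return positive, negative


r_pos, r_neg = split_by_sign([0.5, -0.2, 0.0, 1.0])
```

The two parts add back to the original vector element by element, so no relevance is created or lost by the split.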

IV-B2 Inversion of Relevance

After the first propagation, RAP is separated into two sets, a normal set and an inversed set. We assume that a neuron in a layer is connected to a neuron in a layer through . The neuron is non-negative when it passes through the ReLU activation function [30]. However, we can also handle the situation in which neurons have negative values. In the forward pass, and convey the positive and negative values to a neuron , respectively. The former (latter) promotes (inhibits) the activation of the neuron . The relevance to a neuron corresponding to the former case is determined as the positive and negative relevance according to the sign of the relevance . The relevance propagation rule for a normal set is as follows.


Here, the relevance is separated into and by the sign of as shown in Equation (6). In this case, the criterion for distinguishing the positive and negative relevances is the sign of . Therefore, the first line computed with denotes positive relevance. The second line presents negative relevance. However, for the latter case, and become the criterion to distinguish the positive and negative relevances to reflect the actual contribution to .

Fig. 4: Visualization of the results generated from normal and inversed relevance propagation. The final result is the addition of the actual values of both cases. Propagating the relevance according to the actual contribution of each neuron shows a reasonable distribution of the relevance.

By Equation (6), and are divided into and , respectively. Because the sum of each set is maintained at zero, also becomes zero. The neurons with propagated relevance are distributed in positive and negative directions while keeping the average at zero.

In the forward pass, the normal and inversed contributions of a neuron are propagated separately through the connected weights to the next layer without interfering with each other. Therefore, the relevance of the normal and inversion cases should be considered individually. For this reason, we do not apply rules (7) and (8) simultaneously. The offset between the normal and inversion cases leads to an increment or decrement of the relevance value and prevents the neurons in the latter part of the procedure from being attributed their actual relevances. Propagating each rule separately allows the neurons to be attributed to the extent that they contribute to the output, without distortion of relevance. After each attributing process is finished, the final relevance is produced by adding the results of the two flows. Equation (6) is preserved even when the two flows are added.


Figure 4 shows heatmaps generated from each process, normal (RAP-Normal) and inversion (RAP-Inversion), and the final output of RAP, which is the addition of both results. Red and blue colors in the heatmap denote the positive and negative attributions, respectively. We confirm that the positive and negative attributions are distributed moderately both in normal and inversion processes. Here, we note that the intensity of the represented color does not indicate the magnitude of the actual value. Since the heatmap represents the normalized version of the result, the intensity of the color depends on the distribution of the attributions. The final result is generated by adding the actual values of both cases before normalization.
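The order of operations described above, adding the raw values of both flows first and normalizing only for display, can be sketched as follows. Scaling by the maximum absolute value is an assumed normalization chosen for illustration; the text does not specify the scheme, and the function name is hypothetical.

```python
def combine_and_normalize(normal_map, inversion_map):
    """Add the raw relevance of both flows, then scale to [-1, 1] for display only."""
    combined = [[a + b for a, b in zip(row_n, row_i)]
                for row_n, row_i in zip(normal_map, inversion_map)]
    peak = max(abs(v) for row in combined for v in row)
    if peak == 0:
        return combined, combined  # nothing to scale
    normalized = [[v / peak for v in row] for row in combined]
    return combined, normalized


# Hypothetical 2x2 relevance maps from the two flows.
normal = [[1.0, -2.0], [0.0, 0.5]]
inversion = [[1.0, 0.0], [-2.0, 0.5]]
combined, display = combine_and_normalize(normal, inversion)
```

Because the display values are divided by the largest magnitude in the map, the same color intensity in two different heatmaps can correspond to very different raw relevance values.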

IV-B3 Handling Bias and Batch Normalization

Bias and batch normalization are important factors for many commonly used DNN models, such as VGG-net, ResNet and Inception net, allowing them to be successfully trained on large datasets. Bias helps in learning weights by shifting the activation function. Batch normalization accelerates learning and prevents vanishing/exploding gradients by stabilizing the training process as a whole instead of utilizing the bias. However, to interpret a fully trained model, it is necessary to focus on their effects on the variation of a neuron's value rather than on their effects on the training procedure. We resolve these effects from the viewpoint of the inversion of contribution and relativity with respect to zero.

After the training procedure is finished, the value of each bias is fixed to a scalar value. In the forward pass, many neurons are activated or inactivated by the addition of the biases in each layer. Suppose that inversion of the contribution has not occurred in the forward pass. Then, we can simply divide the positive and negative contributions of a bias according to its sign. Furthermore, it is easy to accept that a bias associated with an activated neuron should be considered during relevance propagation.

However, we emphasize that a bias connected with an inactivated unit also has to be considered. A bias is added equally to the individual units in the feature map in a convolution layer. That is, a bias increases or decreases the importance of certain feature maps for determining the classification. Therefore, this effect should be subtracted equally from the units in the feature map before being propagated through the weights. Unfortunately, simply subtracting the values of the biases breaks the conservation of relevance and is not reasonable in terms of scale.

To solve these issues, we shift the average of the biases back to the origin by subtracting the mean bias value and then divide them by the size of the corresponding feature map. As explained previously concerning the inversion of contribution, the effect of the bias is also mutated according to the actual contribution to the output. The rule for subtracting the effect of bias from a neuron in a layer is as follows.


Here, we denote the result as bias subtracted relevance. is the value of for the normal and inversion cases, respectively, which switches the effect depending on its actual contribution. The size of the feature maps in a layer is denoted as .
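The preprocessing of the biases described above, centering on the mean and dividing by the feature map size, can be sketched in isolation. This covers only that step and omits the sign switch between the normal and inversion cases; the function name, the example biases and the feature map size are hypothetical.

```python
def centered_bias_share(biases, feature_map_size):
    """Shift the biases so that their mean is zero, then spread each over its feature map."""
    mean_bias = sum(biases) / len(biases)
    return [(b - mean_bias) / feature_map_size for b in biases]


# Hypothetical layer with three feature maps of four units each.
shares = centered_bias_share([1.0, 3.0, 2.0], 4)
```

The centered shares sum to zero across feature maps, which is consistent with the goal of subtracting the bias effect without changing the total relevance.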

To consider the effect of a batch normalization layer, we simply regard the effect as a scalar shift, obtained by subtracting the input from the output of the layer in the forward pass. Because the value subtracted during batch normalization is regarded as the variation applied to each unit in the layer, we compensate for this effect in the input relevance, considering only the non-zero neurons. The rule for subtracting the effect of a batch normalization layer is as follows.


Here, is the difference between output and input of batch normalization. is changed into by multiplying zero to the neurons which are connected with zero value in . We denote the result as batch normalization subtracted relevance. This process also switches contribution by using .

Fig. 5: Comparison of the results of conventional methods and our RAP in VGG-16 network.

V Experimental Evaluations

We extensively verified RAP on large-scale CNNs, including the VGG-16, ResNet-50 and Inception-V3 models, which have achieved impressive performance in several classification tasks. We used the Large Scale Visual Recognition Challenge 2012 (ILSVRC 2012) dataset [31], which is widely employed and easily accessible. We implemented RAP with TensorFlow and Keras and generated explanations visualized as heatmaps. Each heatmap uses the seismic colormap, where red and blue denote positive and negative attributions, respectively.

The result of our method is compared with those of existing attribution methods, including Integrated Gradients, Gradient * Input, pattern attribution, LRP-αβ, LRP-Preset A [8], and LRP-Preset B [32]. The LRP-Preset methods are extended versions of LRP in which the preset configurations apply LRP-ε to dense layers and LRP-αβ to convolution layers. Presets A and B differ in their settings of α and β. For fair comparisons, we follow the implementation for the conventional methods introduced in [33], which is available at repository:

Fig. 6: Comparison of results between LRP and RAP applied in ResNet-50 and Inception-V3.

V-A Qualitative Evaluation of Heatmap

To qualitatively evaluate the positive attributions generated by RAP, we examine how similar the areas in which its positive attributions converge are to those of the other methods. As the existing methods propagate the positive relevance well, we can use them to assess whether our method is consistent in attributing positive relevance. Figure 5 presents the heatmaps generated by the various methods for images predicted by the VGG-16 network. Figure 6 compares LRP-αβ and RAP on ResNet-50 and Inception-V3. More qualitative comparisons on various networks are shown in Figure 9.

To qualitatively evaluate the negative attributions, we regard the attributions allocated to parts that are not related to the prediction as negative relevance. Given that LRP-αβ considers both positive and negative relevance during relevance propagation, there is an intuitive visual difference. While our results clearly distinguish the positive and negative attributions, the attributions from other methods overlap each other and appear purple, as shown in Figures 5 and 6. We qualitatively assessed 5,000 images in the validation set of the ILSVRC 2012 dataset, and most of them showed satisfactory results from a human point of view.

Positive Removal (Shuffling) | 0 | 250 | 500 | 1000 | 2000 | 4000 | All
--- | --- | --- | --- | --- | --- | --- | ---
Accuracy | | 80.2 (84.2) | 70 (78.8) | 58.2 (74) | 46.6 (70.2) | 29.6 (65.4) | 0 (0.2)
Prediction score | | 17.21 (17.85) | 16.33 (17.19) | 15.37 (16.60) | 14.37 (15.99) | 13.02 (15.39) | 5.22 (7.67)
Accuracy | | 92.6 (94.8) | 90.4 (93.2) | 85.4 (90) | 78.4 (85.4) | 70.8 (77.6) | 32.4 (38.4)
Prediction score | | 18.81 (18.68) | 18.69 (18.42) | 18.31 (17.99) | 17.80 (17.27) | 16.91 (16.31) | 11.73 (11.78)
Accuracy | | 86.6 (90.6) | 79.4 (86.2) | 70.2 (76.4) | 57.2 (65.6) | 41.4 (44.2) | 0 (5.23)
Prediction score | | 18.33 (18.15) | 17.61 (17.29) | 16.27 (15.96) | 14.75 (14.30) | 12.90 (12.29) | 0 (7.65)
Accuracy | | 87 (85.6) | 77.6 (77.4) | 65.2 (63.2) | 51 (42.2) | 35.2 (22.2) | 0 (0)
Prediction score | | 17.93 (17.56) | 16.96 (16.30) | 15.88 (14.77) | 14.40 (12.85) | 12.75 (11.12) | 5.22 (7.65)
Accuracy | | 86.8 (85.2) | 78.8 (78) | 65.8 (62) | 49.2 (42.2) | 34.6 (21.8) | 2 (0.2)
Prediction score | | 17.93 (17.40) | 17.04 (16.17) | 15.82 (14.51) | 14.21 (12.39) | 12.57 (10.85) | 9.44 (8.02)
Accuracy | | 84.4 (89.4) | 76.8 (83) | 65.2 (74.4) | 53.4 (57.4) | 35.2 (38.4) | 0 (0.2)
Prediction score | | 17.82 (17.71) | 17.01 (16.83) | 15.66 (15.28) | 13.86 (13.38) | 12.03 (11.42) | 5.93 (8.03)
Accuracy | | 85.6 (89.2) | 75.2 (82.4) | 65 (71.8) | 50.6 (54.8) | 34.2 (34.4) | 0.4 (0)
Prediction score | | 17.82 (17.59) | 16.95 (16.61) | 15.60 (15.10) | 13.48 (13.04) | 11.63 (11.02) | 7.07 (8.13)
Accuracy | | 88.8 (88) | 78.8 (80.4) | 67.6 (68.6) | 55.6 (57.4) | 44.2 (38.6) | 6 (5.4)
Prediction score | | 17.98 (17.79) | 17.12 (16.81) | 16.09 (15.58) | 15.19 (14.19) | 13.82 (12.68) | 9.31 (10.3)

Negative Removal (Shuffling) | 0 | 2000 | 4000 | 6000 | 8000 | 10000 | All
--- | --- | --- | --- | --- | --- | --- | ---
Accuracy | | 32.2 (63.6) | 20 (51.4) | 14.4 (40.8) | 8.4 (32.8) | 5.4 (28.6) | 0.2 (7.6)
Prediction score | | 12.60 (15.66) | 11.62 (14.22) | 10.93 (13.13) | 10.43 (12.37) | 10.09 (11.82) | 9.14 (9.31)
Accuracy | | 50.6 (79.8) | 36.2 (64.4) | 29.8 (55.4) | 26 (50.2) | 22.2 (48.8) | 20.2 (47)
Prediction score | | 14.25 (16.29) | 12.69 (14.58) | 11.99 (13.73) | 11.47 (13.14) | 11.18 (12.79) | 11.05 (12.41)
Accuracy | | 86.8 (91.6) | 82.6 (87.6) | 81.2 (84) | 78.8 (83.4) | 74.4 (81.2) | 54.6 (67.6)
Prediction score | | 17.74 (17.67) | 17.55 (17.29) | 17.32 (16.99) | 17.12 (16.78) | 16.93 (16.53) | 13.76 (14.49)
TABLE I: Variation of accuracy after applying pixel removal/shuffling to correctly predicted images in the VGG-16 network, for assessing the positive and negative attributions. The original prediction performance on the test set is 100%. The numbers in parentheses show the performance when the attributed pixels are shuffled.

V-B Quantitative Assessment of Attributions

It is not trivial to assess the quantitative performance of attribution methods designed for explaining DNN models. The common approach to quantitatively assessing explanation methods uses the region perturbation process, which progressively distorts pixels according to the heatmap. [15] formalized this method as the area over the perturbation curve (AOPC). Pixels are perturbed in most relevant first (MoRF) and least relevant first (LeRF) order to evaluate whether the positive and negative attributions, respectively, have the correct meaning for the result.
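For reference, AOPC is commonly written in the region perturbation literature as shown below. The symbols are not defined elsewhere in this text and are assumptions of this sketch: $L$ is the number of perturbation steps, $x^{(k)}_{\mathrm{MoRF}}$ is the input after $k$ steps in MoRF order, $f$ is the prediction score, and the angle brackets denote the average over the images in the dataset.

```latex
\mathrm{AOPC} = \frac{1}{L+1} \left\langle \sum_{k=0}^{L} f\big(x^{(0)}_{\mathrm{MoRF}}\big) - f\big(x^{(k)}_{\mathrm{MoRF}}\big) \right\rangle_{p(x)}
```

A larger AOPC in MoRF order means that the score drops faster when the pixels ranked most relevant are perturbed first.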

In our experiment, the principle of distorting pixels according to the ranking provided by the attribution maps remains the same. To separate the errors of the model from those of the attribution method, we extract from the dataset a test set of 500 images on which the prediction accuracy is 100%. Furthermore, Table 1 reports results for large amounts of pixel distortion, so that the variation of performance with the attributions can be seen at a glance. We change the pixel values in two ways: 1) the traditional evaluation method, which replaces the pixel value with the minimum value of the input image, and 2) randomly shuffling the values of the positively or negatively attributed pixels. Since a pixel can contribute positively to the activation of a certain class label due to its correlation with the neighboring pixels, we provide an analysis of random shuffling to consider the effect of correlation.
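The two perturbation modes can be sketched on a flattened image as follows. This is a minimal illustration under stated assumptions: the function name, its parameters, the fixed seed, and the choice to perturb only pixels whose attribution has the requested sign are hypothetical, not the evaluation code used for Table 1.

```python
import random


def perturb(pixels, attributions, k, mode="min", positive=True, seed=0):
    """Perturb the k most positively (or most negatively) attributed pixels.

    mode="min": replace the chosen pixels with the minimum value of the image.
    mode="shuffle": randomly permute the values among the chosen pixels.
    """
    order = sorted(range(len(pixels)), key=lambda i: attributions[i], reverse=positive)
    if positive:
        chosen = [i for i in order if attributions[i] > 0][:k]
    else:
        chosen = [i for i in order if attributions[i] < 0][:k]
    out = list(pixels)
    if mode == "min":
        lowest = min(pixels)
        for i in chosen:
            out[i] = lowest
    elif mode == "shuffle":
        values = [pixels[i] for i in chosen]
        random.Random(seed).shuffle(values)  # seeded for repeatability
        for i, v in zip(chosen, values):
            out[i] = v
    else:
        raise ValueError("mode must be 'min' or 'shuffle'")
    return out


# Hypothetical four-pixel image and attribution map.
pixels = [5, 1, 9, 7]
attributions = [0.9, -0.2, 0.5, -0.7]
removed_pos = perturb(pixels, attributions, k=2, mode="min", positive=True)
removed_neg = perturb(pixels, attributions, k=1, mode="min", positive=False)
shuffled_pos = perturb(pixels, attributions, k=2, mode="shuffle", positive=True)
```

Shuffling keeps the set of pixel values intact and only breaks their spatial arrangement, which is why it serves as a probe for the effect of correlation between neighboring pixels.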

V-B1 Interfering Positive Attributions

Removing a pixel with a high relevance has a relatively high impact on the decrement of accuracy. As the conventional methods that focus on positive relevance show verified and reasonable results, we compared our method to these methods to confirm that the positive relevance is precisely attributed by RAP. Table 1 presents the variation of the accuracy when high-ranking pixels were removed.

The numbers of removed pixels corresponding to positive attributions are shown in the top row of Table 1. The result indicates that the accuracy rapidly degrades during the removal of positive attributions for all methods. Although RAP considers the positive and negative relevance simultaneously, there is no prominent difference in the Positive Removal results of Table 1. However, the other methods, except Integrated Gradients, show a larger decrement of accuracy than RAP during the pixel distortion process. To analyze the cause of this phenomenon, we observed the distribution of positive attributions. A detailed analysis is given in Section V-B3, Distributions of Attributions.

V-B2 Interfering Negative Attributions

When a DNN makes a correct prediction, removing negative attributions should not cause a large decrement of the accuracy and relevance value. A small decrement can occur, because removing pixels can distort the shape of the object in the input image. However, it is important to note that removing pixels corresponding to the negative attributions does not always increase the prediction performance, because the negative relevance of an incorrect prediction does not denote the positive relevance of the true label. The Negative Removal part of Table 1 shows the variation of the accuracy when negative attributions are removed.

As the results in Table 1 show, LRP-αβ exhibits a rapid decrement of the accuracy and relevance value, whereas removing the negative attributions of RAP rarely affects the prediction result. Thus, we can confirm that RAP precisely distinguishes the positive and negative relevance without overlap.

(a) Two example images that are misclassified after region perturbation for all methods except RAP, and their masks of the top 4,000 positive attributions (first row) and the top 10,000 negative attributions (second row)
(b) The comparison of the average distances between top 1 and top positive attributions derived from explaining methods
Fig. 7: Distributions of attributed relevances in heatmap

V-B3 Distributions of Attributions

It is hard to judge which method better visualizes the positive attributions by simply comparing the decrement of prediction accuracy, because each method makes different assumptions and is designed for slightly different objectives. To be more specific, depending on the attribution method, the positive attributions could be evenly scattered over the main objects or focused on important features. The latter case leads to less distortion of the shape of objects in an input image during the region perturbation. Figure 7 (a) shows two example images and their masks of the top 4,000 positive attributions (first row) and the top 10,000 negative ones (second row). When we applied the region perturbation to these images, only the images distorted according to RAP were still classified correctly. As shown in the masks, the positive attributions of RAP are concentrated on the unique features of the object, while many positive attributions of LRP and Gradient * Input are distributed over less distinctive ones such as the object body and background. Therefore, the parts of the object preserved after region perturbation help classify the image with the correct label. The negative attributions of RAP are clearly superior in providing intuitive visual explanations and robustness under region perturbation. To generalize this observation and verify how strongly the positive attributions are concentrated on important features, we measure the distance of the positive attributions from the MoRF in our test set. The metric for calculating the average distance between the positive attributions and MoRF is as follows.


Here, the metric is computed over the ordered set of positive relevance values in a heatmap, and it is the average Euclidean distance between the MoRF and the top attributions. Figure 7 (b) compares the distance between positive attributions and the MoRF across the explaining methods. As shown in the figure, the positive attributions of RAP are distributed close to the MoRF. This alone does not establish that RAP propagates positive attributions better than other methods do. The experiment shows, however, that RAP has its own meaningful characteristics: 1) a clear distinction between positive and negative attributions and 2) the concentration of positive attributions on the important features.
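The distance measure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `mean_distance_to_morf`, the `top_k` parameter, the use of the single highest positive pixel as the MoRF, and the choice to exclude that pixel from the average are all assumptions made here, since the original formula is not recoverable from the text.

```python
import math


def mean_distance_to_morf(heatmap, top_k):
    """Average Euclidean distance from the highest positive pixel to the
    other pixels among the top_k positive relevance values.

    heatmap is a 2D list of relevance scores (hypothetical input format).
    """
    # Collect (relevance, row, col) for strictly positive relevance values.
    positives = [
        (value, r, c)
        for r, row in enumerate(heatmap)
        for c, value in enumerate(row)
        if value > 0
    ]
    # Order by relevance, largest first; the first entry stands in for the MoRF.
    positives.sort(key=lambda item: item[0], reverse=True)
    if len(positives) < 2:
        return 0.0
    _, r0, c0 = positives[0]
    # Assumption: the MoRF pixel itself is left out of the average.
    others = positives[1:top_k]
    if not others:
        return 0.0
    return sum(math.hypot(r - r0, c - c0) for _, r, c in others) / len(others)


example = [
    [0.9, 0.0, 0.5],
    [-0.3, 0.0, 0.0],
    [0.0, 0.0, 0.2],
]
print(mean_distance_to_morf(example, top_k=3))
```

A smaller value means the top positive attributions sit closer to the most relevant pixel, which is the sense in which the text calls them concentrated.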

Fig. 8: Application to the medical diagnosis field. We apply RAP to a model trained for classifying lung nodules and it provides the intuitive and attentive explanations.
Fig. 9: The first figure compares the results of conventional methods and our RAP on the VGG-16, ResNet-50 and Inception-V3 networks. The illustrations below it show additional results of RAP on VGG-16.

V-C An Application to Medical Diagnosis

Although the use of DNNs is increasing in the medical field, few studies pair them with explanations of their decisions. As an additional experiment, we applied RAP to a DNN model trained on the LUng Nodule Analysis 2016 (LUNA 2016) dataset [34] to diagnose lung nodules. This dataset is derived from the Lung Image Database Consortium image collection (LIDC-IDRI) [35], which is composed of diagnostic and lung cancer screening thoracic CT scans with annotated lesions. The LUNA 2016 dataset excludes scans from LIDC-IDRI with a slice thickness greater than 2.5 mm. In total, 888 CT scans are included and 754,975 candidates are provided.

From the given candidate information, we cropped these scans into cubes for training. Nodule candidates were extracted at positions randomly shifted from the center of the tumor to increase the diversity of the data. Because the ratio of false positives to true positives in the training dataset is highly imbalanced (753,418:1,557), we applied data augmentation to increase the number of positive samples. The positive samples are expanded through flips along the x, y, and z axes, random rotation, shear, and multi-scale transformation. The dataset was divided into training, validation, and test sets in a ratio of 8:1:1. To increase the reproducibility of the model, we applied 10-fold cross validation to the provided dataset. We trained a 3D convolutional neural network on the LUNA 2016 dataset to capture more discriminative features and utilize the full range of context information of the candidates. The layer configuration of the 3D network follows the layer order of VGG-16, with batch normalization layers used instead of biases.
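As an illustration of the flip part of the augmentation, the sketch below mirrors a cube along each of the three axes. It is only a sketch under assumptions: the names `flip_volume` and `augment_with_flips`, the nested-list representation, and the `[z][y][x]` index order are hypothetical, and the rotation, shear, and multi-scale steps are not covered.

```python
def flip_volume(volume, axis):
    """Return a mirrored copy of a 3D nested-list volume indexed [z][y][x]."""
    if axis == "z":
        # Reverse the order of the planes.
        return [[row[:] for row in plane] for plane in reversed(volume)]
    if axis == "y":
        # Reverse the order of the rows inside each plane.
        return [[row[:] for row in reversed(plane)] for plane in volume]
    if axis == "x":
        # Reverse the voxels inside each row.
        return [[row[::-1] for row in plane] for plane in volume]
    raise ValueError("axis must be 'x', 'y' or 'z'")


def augment_with_flips(volume):
    """Produce one extra positive sample per axis by mirroring the cube."""
    return [flip_volume(volume, axis) for axis in ("x", "y", "z")]


cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
for flipped in augment_with_flips(cube):
    print(flipped)
```

Each positive candidate yields three additional samples this way; on real CT cubes an array library's flip routine would serve the same purpose.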

Before interpreting the results of the model, its performance must be evaluated. To verify the reliability of the model, we used 10-fold cross validation and evaluated the average performance using five assessment metrics: 1) accuracy (ACC), 2) precision (PRE), 3) recall (REC), 4) specificity (SPEC), and 5) F1 score (F1). In each cross-validation round, the partition into training and testing data is randomly drawn from the dataset. The trained model achieves 95.1% ACC, 89.1% PRE, 94.4% REC, 95.48% SPEC, and 91.74% F1 on the test dataset. Figure 8 shows the visualization of positive and negative attributions for the classification. For the visualization, we took the central slice of the 3D input and output. RAP presents more intuitive and attentive visualizations for interpreting a DNN on high-dimensional data, without confusing positive and negative attributions.
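The five assessment metrics follow from the confusion counts of a binary classifier using their standard definitions. The sketch below shows those definitions and a per-fold average; the function names and the example counts are hypothetical and are not taken from the experiments reported here.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from true/false positive/negative counts."""

    def ratio(numerator, denominator):
        # Guard against empty classes so a degenerate fold does not crash.
        return numerator / denominator if denominator else 0.0

    pre = ratio(tp, tp + fp)
    rec = ratio(tp, tp + fn)
    return {
        "ACC": ratio(tp + tn, tp + fp + tn + fn),
        "PRE": pre,
        "REC": rec,
        "SPEC": ratio(tn, tn + fp),
        "F1": ratio(2 * pre * rec, pre + rec),
    }


def average_over_folds(fold_counts):
    """Average each metric over folds given as (tp, fp, tn, fn) tuples."""
    per_fold = [classification_metrics(*counts) for counts in fold_counts]
    return {
        name: sum(m[name] for m in per_fold) / len(per_fold)
        for name in per_fold[0]
    }


# Made-up counts for two folds, for illustration only.
print(average_over_folds([(40, 10, 45, 5), (50, 0, 50, 0)]))
```

Because each metric is averaged per fold, the averaged F1 need not equal the F1 computed from the averaged precision and recall.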

VI Conclusion

There have been two streams of methods for interpreting a deep neural network: (1) understanding the internal patterns the network forms to solve a classification problem and (2) understanding how the network relates the individual units to the classification output. Methods in the former stream reconstruct the visual patterns learned by the neural network. Methods in the latter stream include sensitivity analysis, deconvolution, and relevance-based attribution methods such as layer-wise relevance propagation (LRP). In this paper, we propose RAP, an explaining method that attributes contributions to the predictions of DNNs by assigning relevance scores according to the actual contributions, which are divided into normal and inversion cases. Furthermore, our method makes it possible to distinguish positive and negative attributions by accounting for the inversion of a contribution caused by a negative value during relevance propagation. We evaluate our method quantitatively and qualitatively to verify that the attributions correctly account for their meaning. To assess the attributions, we apply region perturbation by removing and shuffling the corresponding pixels. We show that the negative attributions propagated by RAP are robust to region perturbation, while the positive attributions are severely affected by distortion, as in other methods. Furthermore, by calculating the distances between the positive attributions and the MoRF, we confirm that RAP focuses its positive attributions on important features more strongly than other methods do. We also apply RAP to lung nodule diagnosis to visualize the reasons behind the decision of a DNN. We provide a clear visualization of the important factors by separating the positive and negative attributions in complex and noisy lung tumor data.
As the use of deep learning increases not only in the medical domain but also in other practical fields, future work will aim to develop explainable artificial intelligence that can be used in various fields, including computer vision.


This work was supported by Institute for Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2017-0-01779, A machine learning and statistical inference framework for explainable artificial intelligence).


  • [1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Proceedings of the Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
  • [2] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
  • [3] C. Szegedy, A. Toshev, and D. Erhan, “Deep neural networks for object detection,” in Proceedings of the Advances in Neural Information Processing Systems, 2013, pp. 2553–2561.
  • [4] S. Ji, W. Xu, M. Yang, and K. Yu, “3d convolutional neural networks for human action recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221–231, 2013.
  • [5] J.-Z. Cheng, D. Ni, Y.-H. Chou, J. Qin, C.-M. Tiu, Y.-C. Chang, C.-S. Huang, D. Shen, and C.-M. Chen, “Computer-aided diagnosis with deep learning architecture: applications to breast lesions in us images and pulmonary nodules in ct scans,” Scientific reports, vol. 6, p. 24454, 2016.
  • [6] S. Liu, S. Liu, W. Cai, S. Pujol, R. Kikinis, and D. Feng, “Early diagnosis of alzheimer’s disease with deep learning,” in Proceedings of the IEEE International Symposium on Biomedical Imaging.    IEEE, 2014, pp. 1015–1018.
  • [7] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek, “On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation,” PloS one, vol. 10, no. 7, p. e0130140, 2015.
  • [8] G. Montavon, S. Lapuschkin, A. Binder, W. Samek, and K.-R. Müller, “Explaining nonlinear classification decisions with deep taylor decomposition,” Pattern Recognition, vol. 65, pp. 211–222, 2017.
  • [9] P.-J. Kindermans, K. T. Schütt, M. Alber, K.-R. Müller, and S. Dähne, “Patternnet and patternlrp–improving the interpretability of neural networks,” arXiv preprint arXiv:1705.05598, 2017.
  • [10] A. Shrikumar, P. Greenside, and A. Kundaje, “Learning important features through propagating activation differences,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70.    JMLR. org, 2017, pp. 3145–3153.
  • [11] G. Montavon, W. Samek, and K.-R. Müller, “Methods for interpreting and understanding deep neural networks,” Digital Signal Processing, vol. 73, pp. 1–15, 2018.
  • [12] S. Lapuschkin, S. Wäldchen, A. Binder, G. Montavon, W. Samek, and K.-R. Müller, “Unmasking clever hans predictors and assessing what machines really learn,” Nature Communications, vol. 10, p. 1096, 2019.
  • [13] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
  • [14] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2818–2826.
  • [15] W. Samek, A. Binder, G. Montavon, S. Lapuschkin, and K.-R. Müller, “Evaluating the visualization of what a deep neural network has learned,” IEEE transactions on neural networks and learning systems, vol. 28, no. 11, pp. 2660–2673, 2017.
  • [16] D. Erhan, Y. Bengio, A. Courville, and P. Vincent, “Visualizing higher-layer features of a deep network,” University of Montreal, vol. 1341, no. 3, p. 1, 2009.
  • [17] P. Dabkowski and Y. Gal, “Real time image saliency for black box classifiers,” in Advances in Neural Information Processing Systems, 2017, pp. 6970–6979.
  • [18] K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep inside convolutional networks: Visualising image classification models and saliency maps,” arXiv preprint arXiv:1312.6034, 2013.
  • [19] A. Mahendran and A. Vedaldi, “Visualizing deep convolutional neural networks using natural pre-images,” International Journal of Computer Vision, vol. 120, no. 3, pp. 233–255, 2016.
  • [20] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learning deep features for discriminative localization,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2921–2929.
  • [21] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in European conference on computer vision.    Springer, 2014, pp. 818–833.
  • [22] B. Zhou, D. Bau, A. Oliva, and A. Torralba, “Interpreting deep visual representations via network dissection,” IEEE transactions on pattern analysis and machine intelligence, 2018.
  • [23] J. N. Foerster, J. Gilmer, J. Sohl-Dickstein, J. Chorowski, and D. Sussillo, “Input switched affine networks: An rnn architecture designed for interpretability,” in Proceedings of the International Conference on Machine Learning, 2017, pp. 1136–1145.
  • [24] P. W. Koh and P. Liang, “Understanding black-box predictions via influence functions,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70.    JMLR. org, 2017, pp. 1885–1894.
  • [25] M. T. Ribeiro, S. Singh, and C. Guestrin, “Why should i trust you?: Explaining the predictions of any classifier,” in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining.    ACM, 2016, pp. 1135–1144.
  • [26] D. Baehrens, T. Schroeter, S. Harmeling, M. Kawanabe, K. Hansen, and K.-R. Müller, “How to explain individual classification decisions,” Journal of Machine Learning Research, vol. 11, no. Jun, pp. 1803–1831, 2010.
  • [27] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, “Striving for simplicity: The all convolutional net,” arXiv preprint arXiv:1412.6806, 2014.
  • [28] M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attribution for deep networks,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70.    JMLR. org, 2017, pp. 3319–3328.
  • [29] M. Ancona, E. Ceolini, C. Oztireli, and M. Gross, “Towards better understanding of gradient-based attribution methods for deep neural networks,” in Proceedings of the International Conference on Learning Representations, 2018.
  • [30] V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in Proceedings of the 27th international conference on machine learning (ICML-10), 2010, pp. 807–814.
  • [31] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet Large Scale Visual Recognition Challenge,” International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
  • [32] S. Lapuschkin, A. Binder, K.-R. Müller, and W. Samek, “Understanding and comparing deep neural networks for age and gender classification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1629–1638.
  • [33] M. Alber, S. Lapuschkin, P. Seegerer, M. Hägele, K. T. Schütt, G. Montavon, W. Samek, K.-R. Müller, S. Dähne, and P.-J. Kindermans, “innvestigate neural networks!” arXiv preprint arXiv:1808.04260, 2018.
  • [34] A. A. A. Setio, A. Traverso, T. De Bel, M. S. Berens, C. van den Bogaard, P. Cerello, H. Chen, Q. Dou, M. E. Fantacci, B. Geurts et al., “Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the luna16 challenge,” Medical image analysis, vol. 42, pp. 1–13, 2017.
  • [35] S. G. Armato, G. McLennan, L. Bidaut, M. F. McNitt-Gray, C. R. Meyer, A. P. Reeves, B. Zhao, D. R. Aberle, C. I. Henschke, E. A. Hoffman et al., “The lung image database consortium (lidc) and image database resource initiative (idri): a completed reference database of lung nodules on ct scans,” Medical physics, vol. 38, no. 2, pp. 915–931, 2011.