
Gradient Hedging for Intensively Exploring Salient Interpretation beyond Neuron Activation

Hedging is a strategy for reducing the potential risks in various types of investments by adopting an opposite position in a related asset. Motivated by this financial technique, we introduce a method for decomposing output predictions into intensive salient attributions by hedging the evidence for a decision. We analyze the conventional approach to the evidence for a decision and discuss the paradox of the conservation rule. Subsequently, we define the evidence as the gap between the positive and negative influences in the gradient-derived initial contribution maps, and we propagate the elements antagonistic to the evidence as suppressors, following a criterion for the degree of positive attribution defined by user preference. In addition, we reflect the severed or sparse contribution of inactivated neurons, which are mostly irrelevant to a decision, resulting in more robust interpretability. We conduct the following assessments in a verified experimental environment: the pointing game, most relevant first region insertion, the outside-inside relevance ratio, and mean average precision on the PASCAL VOC 2007, MS COCO 2014, and ImageNet datasets. The results demonstrate that our method outperforms existing attribution methods, providing distinctive, intensive, and intuitive visualizations with robustness and applicability to general models.





1 Introduction

With the advancement of deep neural networks (DNNs) in various fields of computer science, many studies have been conducted to improve the transparency of network decisions. As the most common explanation approach, the assignment of attribution (also called relevance) aims to understand the decisions derived from the complex inner structure of a network by highlighting the most relevant factors in the input and characterizing them as a supporting basis. Such theoretical methods increase the confidence in a model decision and lead to intuitive understanding even in fields that require specialized domain knowledge, such as medicine and chemistry. In addition, they are already commonly utilized in other computer vision fields, e.g., weakly supervised segmentation and detection, as a method to obtain initial seeds.

To identify the relevant parts in an input for network prediction, many studies based on a modified backpropagation algorithm  

[bach2015pixel, kindermans2017patternnet, montavon2017explaining, zhang2018top, nam2020relative, ancona2018towards, gur2021visualization, lee2021relevance]

have attempted to resolve the importance of neurons in layers by propagating the output logits in a backward manner. With each method's own approach, it is possible to clarify the significant parts as the basis for the predictions by assigning the relevance to the activated neurons according to their contribution. Although these methods have advantages in localization and notable visualization, some suffer from class-agnostic behavior, scattered attribution to irrelevant factors, and ambiguous criteria for interpretation that mostly rely on a human view.

In recognizing these issues, we carefully rethink the evidence for a decision. The conservation rule, derived from layer-wise relevance propagation [bach2015pixel], maintains the output values, as the evidence for a decision, during backward propagation, and it is the main motivation of many attribution-based methods. However, most methods are based on normalization among the activated neurons in each layer; therefore, variations in the output values do not affect the attributions unless the sign of the output changes. From the perspective of the evidence for a decision, we compare the propagation of relevance with a hedging strategy, a risk-handling method for the volatile stock market. Although this is a completely different field, we can newly utilize its concepts for the evidence and a propagation rule. As an intuitive example, this strategy is used to securely manage assets in the cryptocurrency market. In the transfer process of cryptocurrencies from wallets to exchanges, investors may be exposed to the risk of price fluctuations. To avoid this potential risk, an asset can be hedged by taking a short position in the futures market with respect to the value of the goods being transferred, so that the futures position offsets the spot exposure.

Fig. 1: Comparison of conventional attribution methods. Our method aims to interpret the most essential attributions with intensive visualizations while overcoming shortcomings in detail, class-discriminativeness, and clear objectness.

This concept could be applied to the assignment of attribution as a new approach for evidencing network outputs and interpreting the opposing relevance. More formally, let $f(x) \in \mathbb{R}^{C}$ be the output of a DNN with input $x$ and $C$ output neurons. The initial relevance is $R^{init}$, which has the same shape as the output tensor but retains only the target ($t$-th) output logit, $f_t(x)$. Assume that $f_t(x)$ is fully decomposed into the input features as follows: $f_t(x) = \sum_i R_i$, and consider the opposite case, i.e., $-f_t(x)$. Other attribution methods are based on finding features that contribute negatively to a decision, resulting in the same final attribution but with an opposite sign compared to the original case, $-f_t(x) = \sum_i (-R_i)$. However, our approach aims to maintain the evidence for a decision as positive while preserving the direction of the contribution of each neuron, and to negatively alter the other irrelevant parts by considering contradictory evidence.
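The sign-flip behavior described above can be illustrated with a toy linear model. This is only a sketch of the idea, not the paper's propagation rule, and `lam` is a hypothetical preference parameter introduced for illustration:

```python
import numpy as np

# Toy illustration: a linear "network" f_t(x) = w . x decomposes exactly
# into per-feature attributions R_i = w_i * x_i.
w = np.array([2.0, -1.0, 0.5])
x = np.array([1.0, 3.0, 4.0])
R = w * x                                  # attributions [2.0, -3.0, 2.0]
assert np.isclose(R.sum(), w @ x)          # full decomposition of the logit

# Conventional view: evidencing the opposite decision -f_t(x) merely
# flips every attribution's sign.
assert np.allclose((-w) * x, -R)

# Hedged view (sketch): keep each neuron's direction, but rescale the
# negative (contradictory) side so positive evidence dominates by a
# user-chosen margin lam.
lam = 0.5
pos, neg = np.clip(R, 0, None), np.clip(R, None, 0)
hedged = pos + lam * neg * (pos.sum() / np.abs(neg.sum()))
assert np.clip(hedged, 0, None).sum() > np.abs(np.clip(hedged, None, 0).sum())
```

In the conventional view both rows of attributions are mirror images; in the hedged view the positive evidence stays fixed while the degree of contradictory evidence is tunable.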

In this paper, we propose a new perspective on the evidence for a network decision, allowing exploration of the most salient input features in a scenario of restricted allocation. Fig. 1 illustrates examples that summarize the characteristics of our method. The attributions in the input are fully decomposed from the network output with the characteristics of i) being limited to a minimal scope, denoting the parts most correlated with the network output, ii) clear localization to the detailed parts of an object, and iii) class discriminativeness with intuitive visualization. The main contributions of this study are as follows:

  • We establish a novel approach for decomposing network predictions using the hedging concept, to preserve the top contributing properties in a scenario of varying evidence scope. We define the evidence for a decision from the positively overwhelming attributions in the initial contribution maps derived from gradients and activations. Moreover, we properly allocate the relevance scores to preserve the salient properties while assigning negative scores to the other features related to the contradictory evidence.

  • We carefully address the unpredictable behavior of a DNN when assessing attributions using region perturbation-based metrics. We show that DNNs are more vulnerable to scattered noise or small attacks over a wide range than to perturbations of a cohesive area. As an alternative, we propose most relevant first (MoRF) region insertion, which measures whether an assigned attribution could be utilized as a decision basis for a DNN.

  • As additional evaluations, we analyze the results of the pointing game, a sanity check with model sensitivity, and the outside–inside ratio to assess the quality of the attributions. To confirm the efficacy of understanding the models, we examine the performance in two scenarios of model decision (either only accurate labels or all labels). In line with our assumption, the results demonstrate that our method provides objectness, class specificity, and intensive and intuitive descriptions of salient input features.

2 Related Studies

Recently, many studies have aimed at increasing the interpretability of DNNs. As methods to analyze a DNN model itself, salient features are visualized by maximizing the activated neurons [erhan2009visualizing] in intermediate layers or by generating saliency maps [simonyan2013deep, zeiler2014visualizing, mahendran2016visualizing, zhou2016learning, dabkowski2017real, zhou2018interpreting]. [ribeiro2016should] proposed a model-agnostic method called LIME, which explains black-box models by locally approximating the decision boundary with simpler linear models.

A perturbation-based approach examines decision variations under gradual distortions of the network input. A large effect of input perturbations on the network output can lead to pathological perturbations, which induce adversarial effects [fong2017interpretable]. [zeiler2014visualizing, petsiuk2018rise, Petsiuk_2021_CVPR] estimated the variations in the network output by applying occlusions with specified patterns such as random masking. [fong2019understanding] established the concept of an extremal perturbation to comprehend the network decision with theoretical masking.

From the viewpoint of assigning attributions [lim2021building, lee2021relevance, chockler2021explanations, singla2021understanding, jung2021towards], [bach2015pixel] first provided the concept of relevance and conservation for the evidence for a decision using several types of layer-wise relevance propagation (LRP) rules. [montavon2017explaining] proposed deep Taylor decomposition (DTD) based on Taylor expansion in intermediate layers with a theoretical foundation. DTD decomposes the network decision into contributions of the input features. DeepLIFT [shrikumar2017learning] focused on clarifying the differences in the contribution scores of neurons and their reference activation. [ancona2018towards] studied attributions from a theoretical standpoint, formally proving the conditions of equivalence of previous methods. [lundberg2017unified] introduced the Shapley value to unify and approximate existing explanation methods. By modeling with the probabilistic winner-take-all process, [zhang2018top] proposed excitation backprop with top-down attention. [nam2020relative] first presented an influence perspective to resolve the overlapping phenomenon of positive and negative contributions, resulting in a clear separation of a target object and its irrelevant background. [lapuschkin-ncomm19] examined misleading correlations among the objects in a network input and highlighted the necessity of understanding a network decision to reveal the “clever Hans” phenomena. However, methods based on LRP have limitations originating from the class-agnostic issue of DNN models, which results in the same visualizations even when starting from different classes. [zhang2018top] explored the cause of this issue as “winner always wins,” a phenomenon by which highly activated neurons receive the majority of the relevance during backward propagation.
As an alternative,  [zhang2018top, gu2018understanding] presented contrastive concepts by propagating the relevance from the target in contrast to all other classes.

Some attribution approaches utilize the gradients with respect to the target based on the chain rule. Gradient-weighted class activation map (Grad-CAM) is the most well-known and commonly used method owing to its easy applicability and high performance in localizing primary objects. It generates class-discriminative activation maps by computing gradients with respect to target activation neurons in the feature extraction stage. Guided BackProp [springenberg2014striving] is based on gradient backpropagation considering only positive values and aims to clarify specific input features. Integrated gradients [sundararajan2017axiomatic] utilize average partial derivatives of a network output to resolve the gradient saturation problem. SmoothGrad [smilkov2017smoothgrad] is a method for visualizing mean gradients with the addition of random Gaussian noise to an image. FullGrad [srinivas2019full] computes the gradients of the input and the bias of each layer and sums over the complete network to visualize an entire saliency map.
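Among the methods above, SmoothGrad admits a compact sketch. In the example below the network's backward pass is replaced by the analytic gradient of a toy function `f(x) = sum(x**2)`, so it only illustrates the noise-averaging scheme, not a real model:

```python
import numpy as np

# Minimal SmoothGrad sketch: average the (here, analytic) gradient of a
# toy scalar function f(x) = sum(x**2) over noisy copies of the input.
def grad_f(x):
    return 2.0 * x  # stand-in for a backward pass through a real network

def smoothgrad(x, n_samples=50, sigma=0.15, seed=0):
    rng = np.random.default_rng(seed)
    grads = [grad_f(x + rng.normal(0.0, sigma, size=x.shape))
             for _ in range(n_samples)]
    return np.mean(grads, axis=0)   # mean gradient over noisy inputs

x = np.array([0.5, -1.0, 2.0])
sg = smoothgrad(x)
# With enough samples the injected noise averages out toward the clean gradient.
assert np.allclose(sg, grad_f(x), atol=0.2)
```

In practice the same averaging is applied to saliency maps of an image classifier, which visibly reduces high-frequency speckle in the gradients.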

As methods integrating relevance and gradients, AGF [gur2021visualization] introduces attribution-guided factorization to extract class-specific attributions derived from input features and gradients. RSP [nam2021interpreting] analyzes the characteristics of “winner always wins” in the assignment of attributions and provides class-specific and intuitive visualizations by distinguishing hostile activations with respect to the target class.

3 Evidence for Decision

In this section, we introduce the mechanisms of the general propagation rules and address different views on the evidence for a decision.

3.1 Background

[bach2015pixel] first introduced LRP, which finds the highly contributing parts in an input by decomposing output predictions in a backward manner. It is based on the conservation principle, which treats the output of a network as the evidence for a decision and maintains it in each layer during propagation. More formally, assuming that $R_i^{(l)}$ denotes the relevance of a neuron $i$ in a layer $l$ and that $R_j^{(l+1)}$ is the relevance assigned by the propagation rule in layer $l+1$, the above-mentioned conservation takes the form,

$$\sum_i R_i^{(l)} = \sum_j R_j^{(l+1)} = \cdots = f_t(x). \qquad (1)$$
The forward process between layers $l$ and $l+1$, through which attributions are propagated, is denoted as $z_{ij} = x_i w_{ij}$, where $x_i$ and $w_{ij}$ represent a neuron and its weight, respectively. Here, we define the generic relevance propagation rule [montavon2017methods] that operates in a backward manner with a normalization process based on the contributions of the neurons as follows:

$$R_i^{(l)} = \sum_j \frac{z_{ij}}{\sum_{i'} z_{i'j}} R_j^{(l+1)}. \qquad (2)$$
Here, $i$ and $j$ represent the indices of a neuron in layers $l$ and $l+1$, respectively, and Eq. 2 preserves the conservation rule in Eq. 1, which maintains the total relevance as constant during propagation. To consider the positive and negative contributions of neurons, [bach2015pixel] introduced a rule, LRP-$\alpha\beta$, that enforces the conservation principle. To maintain the total relevance, the parameters are chosen such that $\alpha - \beta = 1$. Each part of the relevance with positive and negative weights, marked as $z_{ij}^{+}$ and $z_{ij}^{-}$, respectively, is multiplied by $\alpha$ ($\beta$) and related to positive (negative) activations as positive (negative) relevance.

$$R_i^{(l)} = \sum_j \left( \alpha \frac{z_{ij}^{+}}{\sum_{i'} z_{i'j}^{+}} - \beta \frac{z_{ij}^{-}}{\sum_{i'} z_{i'j}^{-}} \right) R_j^{(l+1)}. \qquad (3)$$
The obtained attributions are mapped to the pixels of the input image and shown as a heatmap after relevance propagation is completed from the last to the start layer.
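The two rules above can be sketched for a single dense layer. This is a minimal NumPy illustration assuming non-negative (ReLU) inputs and no bias terms, not a full LRP implementation:

```python
import numpy as np

# Sketch of the generic propagation rule (Eq. 2) and the LRP-alpha-beta
# rule (Eq. 3) for one dense layer with ReLU inputs x and no biases.
def lrp_generic(x, W, R_next, eps=1e-9):
    z = x[:, None] * W                               # z_ij = x_i * w_ij
    denom = z.sum(axis=0)
    denom = denom + eps * np.where(denom >= 0, 1.0, -1.0)
    return (z / denom) @ R_next                      # Eq. 2

def lrp_alpha_beta(x, W, R_next, alpha=2.0, beta=1.0, eps=1e-9):
    z = x[:, None] * W
    zp, zn = np.clip(z, 0, None), np.clip(z, None, 0)
    Rp = (zp / (zp.sum(axis=0) + eps)) @ R_next      # positive-weight part
    Rn = (zn / (zn.sum(axis=0) - eps)) @ R_next      # negative-weight part
    return alpha * Rp - beta * Rn                    # alpha - beta = 1

x = np.array([1.0, 2.0, 0.5])                        # activations in layer l
W = np.array([[ 1.0, -0.5],
              [ 0.5,  1.0],
              [-1.0,  0.5]])
R_next = np.maximum(x @ W, 0)                        # relevance in layer l+1
R_ab = lrp_alpha_beta(x, W, R_next)
# Conservation (Eq. 1): the total relevance is preserved by both rules.
assert np.isclose(lrp_generic(x, W, R_next).sum(), R_next.sum())
assert np.isclose(R_ab.sum(), R_next.sum())
```

The per-column normalization is what makes the attributions invariant to the scale of the output logit, which is exactly the behavior questioned in Section 3.2.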

Fig. 2: Images in the first row represent the input, the initial attribution, and the neurons to be assigned the relevance, respectively. The second row shows the issue when propagating the relevance according to the actual contributed value. The third row shows our view of influence on propagation and the criterion of evidence that overwhelms the negative influence. A detailed explanation is given in Section 3.2.

3.2 Paradox of Conservation

The first issue concerns the consistency of the assigned attributions under variations in the evidence. In previous studies, the output logit before the final classifier layer, e.g., softmax, was regarded as the evidence and used as the input relevance. However, because the propagation rule is based on normalization, an increase or decrease in the output logit does not affect the positive or negative direction of an attribution itself. When the logit value is negative owing to an incorrect prediction by the DNN or an adjustment by a user, only a sign reversal occurs, and the strength of all final attributions is preserved. It is important to rationalize these effects to clearly interpret a DNN model.

We also present the paradox of the conservation rule in Eq. 1 when simultaneously considering positive and negative attributions. Fig. 2 illustrates motivational examples for an intuitive understanding. The input image containing a dog and a cat is correctly classified by the DNN. We present the initial attribution, obtained by utilizing the gradient and activated neurons with respect to the dog label, and visualize it with a channel-wise summation. Because the initial attribution is normalized, the positive and negative attributions have the same total absolute values. It is assumed that this relevance is propagated to the neurons in the next layer by the general propagation rule in Eq. 2. The second row in the figure shows the partial results of the two terms in Eq. 3 before subtraction. In general, because a value propagated with a negative weight in a forward pass is a negative contribution to a decision, it is easy to consider that the neurons corresponding to the dog should be assigned negative relevance. However, owing to the dependence on the neuron activation values, the results differ insignificantly from those based on positive weights. When they are subtracted from each other to follow Eq. 3, the conservation rule is preserved with differences within the attributions having the same direction. If the two partitions are added, although the directions are maintained, degeneracy of the relevance is inevitable.

We first raised this problem in our previous study [nam2020relative] and approached it from the perspective of a neuron's influence. This perspective indicates that a highly activated neuron plays an important role in the network decision in both positive and negative ways, and should therefore receive the lion's share of the relevance. Further refined here, we intend to propagate the input relevance separately, to maintain the positive and negative directions [nam2021interpreting]. The third row in the figure illustrates our view with respect to the evidence for the dog. The first and second images visualize the positive and negative attributions from each section of the initial attribution, respectively. Although the cat-related features barely contribute to those of the target dog, they negatively influence the primary object through negative weights. We consider the gaps where positive attributions outweigh the negative ones, in the case of equally limited assignments, as the evidence for the network decision. Users should be able to observe variations in the final assigned attributions by increasing or decreasing this evidence as desired. In addition, we mainly present the following three factors as important points of interpretation: i) strong objectness, which refers to the capability of separating the main objects from the irrelevant background, ii) class-specific attributions, which provide localization of the target class, and iii) detailed description of salient input features for intuitive understanding.

Fig. 1 shows the results of prominent explanation methods based on a modified backpropagation algorithm, together with their purposes. Although Grad-CAM is well known for its remarkable localization performance with respect to a target, much pixel-level detail is lost because the interpretation of the feature extraction stages is skipped. Other methods can fully decompose a prediction in a backward manner, including the feature extraction stage. We exclude some methods, such as integrated gradients, for which it is difficult to intuitively appraise the quality of interpretation in a human view, owing to the scattered and overlapping positive and negative attributions. Although Guided BackProp and LRP represent attributions at a pixel level, there is no visual difference in the attributions per class. Methods with a contrastive perspective, including Relevance-CAM, contrast the relevance for one class with those of all other classes and provide class-specific attributions. However, positive or negative relevance is disseminated in the background or in other sections unrelated to its origin. Because this approach yields a relatively larger activated area than other classes, irrelevant areas could receive positive or negative relevance scores.

In this study, we mainly explore the most salient attributions corresponding to a target. Although relative sectional propagation (RSP) and attribution-guided factorization (AGF) provide reasonable explanations of a decision with intuitive visualization, we aim to identify the intensive input features that largely influence a network, instead of targeting semantic segmentation.

4 Proposed Method

For backward decomposition of the network output, our method has two main stages: (i) acquiring initial contribution maps from the gradients of the target class and (ii) hedging the salient attributions with respect to the evidence while maintaining the importance in line with the influence.

Fig. 3: Difference between channel attributions of intermediate layers with/without considering activation properties. First row shows global input features of intermediate layers. Each row below visualizes channel-wise sum of attributions based on variants.

4.1 Initial Contribution Maps

The gradient activation maps of a network output can be obtained by several methods. In this section, we introduce a method to acquire them for a target class. It should be noted that there are other techniques for obtaining the overall activation map that may perform better than ours, such as Score-CAM [wang2020score] and Grad-CAM++ [chattopadhay2018grad]. However, they make no significant difference as the initial input attributions, so we utilize the simple, efficient, and robust approach.

We notate the layers sequentially from the last intermediate layer, $l_n$, to the input layer, $l_1$. Let $y$ and $y_t$ be the values of the network output and the target class node, respectively. Based on the chain rule, the gradients of the input features $A^{(l)}$ in intermediate layers can be obtained as follows:

$$\frac{\partial y_t}{\partial A^{(l)}} = \frac{\partial y_t}{\partial A^{(l_n)}} \frac{\partial A^{(l_n)}}{\partial A^{(l_{n-1})}} \cdots \frac{\partial A^{(l+1)}}{\partial A^{(l)}}. \qquad (4)$$
We backpropagate the gradient of $y_t$ until a user-selected intermediate layer (the input features prior to the average pooling layer in our study) using the chain rule. This gradient is utilized as a channel-wise neuron importance:

$$R^{init}_{c} = \frac{1}{N} \left( \frac{1}{Z} \sum_{p} \frac{\partial y_t}{\partial A_{c,p}} \right) A_{c}, \qquad (5)$$

where $A_c$ represents the feature map of channel $c$, a matrix of size $Z = h \times w$ indexed by pixel $p$. The propagated gradients are global average-pooled to serve as neuron importance weights with respect to the activated neurons. $N$ denotes a normalization factor that divides the gradient activation map by the absolute value of its total sum for computational efficiency. Eq. 5, used for computing a gradient activation map, is similar to Grad-CAM except for the elimination of the negative values and the last linear combination between the feature map and the partial linearization.

In our gradient activation map, positive and negative values exist based on the gradient with respect to the target class, whereas unrelated objects and the background are assigned values close to zero or negative values. Here, assuming that the absolute sums of the positive and negative attributions are equal, we define the specific positive attributions that dominate the negative attributions as the evidence for a decision.
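Assuming the gradients of the target logit with respect to a chosen feature map are already available from the chain rule, the initial contribution map can be sketched as follows. The array shapes and the placement of the normalization are our illustrative assumptions:

```python
import numpy as np

# Sketch of an initial contribution map: global-average-pool the target
# gradients into channel weights, weight each channel's feature map, and
# normalize by the absolute value of the total sum. Unlike Grad-CAM, no
# ReLU is applied, so signed values survive.
def initial_contribution_map(A, grads):
    weights = grads.mean(axis=(1, 2))            # one GAP weight per channel
    R = weights[:, None, None] * A               # weight each channel map
    N = np.abs(R.sum()) + 1e-12                  # normalization factor
    return R / N

rng = np.random.default_rng(0)
A = np.maximum(rng.normal(size=(8, 7, 7)), 0)    # ReLU feature maps (C, H, W)
grads = rng.normal(size=(8, 7, 7))               # stand-in target gradients
R_init = initial_contribution_map(A, grads)
assert np.isclose(np.abs(R_init.sum()), 1.0)     # |total| normalized to one
```

After this normalization, the positive and negative parts of `R_init` can be compared on an equal footing, which is the premise of the evidence definition above.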

4.2 Propagation Rules for Hedging Evidence

Recall that our concept is motivated by a hedging strategy. Assuming that the criterion of the evidence in the initial attributions implies a pure positive contribution, the negative influence of irrelevant objects or the background is contradictory evidence for the target. Before backward propagating the initial attributions of neuron $j$ in layer $l+1$ to neuron $i$ in layer $l$, we modulate the sums of all positive and all negative neuron values to the same absolute value without changing the sign of each neuron. We denote the positive and negative sections of the initial attributions as $R^{+}$ and $R^{-}$, and modify them as follows:

$$\hat{R}^{+} = \frac{R^{+}}{\sum_k R^{+}_k}, \qquad \hat{R}^{-} = \lambda \cdot \frac{R^{-}}{\left| \sum_k R^{-}_k \right|},$$

where $\lambda$ represents a user preference parameter that decides the degree of the evidence to preserve during propagation. From this attribution, we compute the contributions of the neurons in layer $l$ to each section of the positive and negative attributions from an influencing perspective [nam2020relative]. More specifically, the influencing perspective holds that activated neurons with higher values are more critical in both positive and negative manners, resulting in more dependence on their values. Therefore, for these neurons, the relevance should be allocated depending on the degree of influence, regardless of the positive or negative direction. We utilize both positive and negative weights by casting their absolute values. For the positive and negative sections, each contribution is computed as follows:

$$C_{ij}^{\pm} = \frac{x_i \left| w_{ij} \right|}{\sum_{i'} x_{i'} \left| w_{i'j} \right|}.$$

1:Input , Model: , Target: , Neurons in layer : ,   starts from end to start layer
2: Forward pass
3:Onehot One-hot encoding
4:for layer in Classification stages do
5:      Gradient propagation by chain rule
6:     if layer in Average Pooling then
7:         GAP
8:          Initial contribution map
9:     end if
10:end for
11:for layer in Feature Extraction stages do
12:     Let , weights,
17:      Update
18:end for
Algorithm 1 Propagation with Gradient Hedging
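The modulation of the positive and negative sections can be sketched as follows. This is a minimal NumPy illustration under the assumption that both sections are rescaled to unit absolute sum and the negative side is then scaled by the preference parameter `lam`; the exact formulation in the propagation rule may differ:

```python
import numpy as np

# Sketch of evidence modulation: rescale the positive and negative parts
# of the initial attributions to comparable absolute sums, then shrink the
# negative (contradictory) side by the user preference parameter lam.
def hedge_evidence(R, lam=1.0):
    pos = np.clip(R, 0, None)
    neg = np.clip(R, None, 0)
    pos = pos / pos.sum()                    # positive evidence sums to 1
    neg = lam * neg / np.abs(neg.sum())      # |negative evidence| sums to lam
    return pos + neg                         # per-neuron signs are unchanged

R = np.array([0.6, -0.2, 0.4, -0.8])
hedged = hedge_evidence(R, lam=0.5)
assert np.isclose(np.clip(hedged, 0, None).sum(), 1.0)
assert np.isclose(np.clip(hedged, None, 0).sum(), -0.5)
```

With `lam = 1` the two sides balance exactly; decreasing `lam` widens the gap by which positive evidence overwhelms the contradictory evidence.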
Fig. 4: Comparison of conventional and proposed attribution methods applied to VGG-16 on PASCAL VOC dataset. Class names on left side represent predicted labels of input image. Upper and bottom groups show attributions for predictions of single- and multi-label images. Red and blue colors represent positive and negative values, respectively.

Each positive and negative attribution is backpropagated to an activated neuron in the previous layer based on the neuron contribution, maintaining the overall relevance sum. We revisit two points that affect the allocation of precise attribution: i) the gap between the focused features of the first and last layers and ii) the role of an inactivated neuron. It is well known that the extracted features differ across layers: deeper layers encode more abstract features. Fig. 3 shows examples of the variations in the input features in intermediate layers. Because attribution methods depend on the value of an activated neuron, even if accurate relevance is initially assigned at the end stage of the network, the relevance could become biased in the initial stage owing to the presence of highly activated neurons in irrelevant areas such as edges and watermarks. Therefore, it is necessary to guide propagation to reflect the initial properties, including feature importance, which are directly related to network decisions. First, we define the mask of the neurons in each layer based on their activation as follows:

$$M_i^{(l)} = \begin{cases} 1 & \text{if } x_i^{(l)} > 0, \\ 0 & \text{otherwise.} \end{cases}$$
Subsequently, we design the propagation so that the positive attributions with respect to the target are assigned to the activated neurons. The mask of the activated neurons receives positive relevance corresponding to the influence, regardless of the values of the neurons:

$$R_i^{+(l)} = \sum_j \frac{M_i \left| w_{ij} \right|}{\sum_{i'} M_{i'} \left| w_{i'j} \right|} \hat{R}_j^{+(l+1)}.$$
This propagation reduces the sensitivity of the attributions to the local texture, resulting in increased robustness of the salient attributions and preventing misguidance toward local features such as edges and watermarks.

As the third variant, we consider the role of the inactivated neurons. In terms of the contribution in a forward pass, the neurons most irrelevant to the output are the inactivated ones, owing to the disconnection of the value transfer. Therefore, we assign the negative attributions, which did not contribute to the decision, to the inactivated neurons by the following rule:

$$R_i^{-(l)} = \sum_j \frac{(\mathbb{1} - M)_i \left| w_{ij} \right|}{\sum_{i'} (\mathbb{1} - M)_{i'} \left| w_{i'j} \right|} \hat{R}_j^{-(l+1)}.$$
The all-ones tensor corresponding to an inactivated neuron receives negative relevance corresponding to the influence of the weight. In our previous studies [nam2020relative, nam2021interpreting], uniform shifting presented excellent performance in separating the foreground and the background by changing the irrelevant attributions with relevance scores near zero into negative attributions. This allows the salient attributions to remain positive, with the irrelevant and hostile attributions becoming negative. We uniformly divide the absolute sum of the initial gradient activation maps, $\left| \sum_k R_k^{init} \right|$, by the number of activated neurons and subtract it from the summation of the variant equations as expressed below; for each iteration, the total uniform shifting value is consistent.

$$R_i^{(l)} = R_i^{+(l)} + R_i^{-(l)} - \frac{\left| \sum_k R_k^{init} \right|}{\sum_k M_k^{(l)}} M_i^{(l)}.$$
Relatively unimportant attributions, which are approximately zero, are converted into negative attributions during the propagation procedure. When progressing to the next layer, neurons that contributed positively to these negative attributions are assigned the corresponding negative relevance scores, so that the irrelevant attributions, e.g., the background, have negative relevance scores in the final output. The sum of the attributions is maintained in each iteration, except for the convolutional layers that modulate the number of channels; in this case, we do not utilize uniform shifting, to prevent distortions along the channel map. With the restricted relevance scores that assign contradictory evidence as negative, the propagated relevance explores the salient attributions by repeating the process in Eqs. 6–13 in each layer. This procedure is repeated until the first layer of the model. For the final propagation to the input layer, we utilize the rule of [bach2015pixel] that is commonly used for the last propagation, resulting in clear visualization without interference of the attribution priorities. Algorithm 1 describes the complete propagation rule.
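The uniform shifting step can be illustrated with a small sketch; the shift total and the mask handling here are simplified assumptions for one layer:

```python
import numpy as np

# Sketch of uniform shifting: subtract an equal share of the total
# evidence from each activated neuron's attribution, so near-zero
# (irrelevant) attributions turn negative while salient ones stay positive.
def uniform_shift(R, activation_mask, total_evidence):
    n_active = activation_mask.sum()
    shift = total_evidence / n_active        # equal share per active neuron
    return R - shift * activation_mask       # inactivated neurons untouched

R = np.array([0.50, 0.02, 0.38, 0.06])       # attributions in one layer
mask = np.array([1.0, 1.0, 1.0, 1.0])        # all neurons activated
shifted = uniform_shift(R, mask, total_evidence=0.4)
# The weakly supported neurons (0.02, 0.06) become negative.
assert (shifted < 0).sum() == 2
```

Because every active neuron loses the same amount, the ordering of the attributions is preserved; only the sign of the weakly supported ones flips, which is what separates foreground from background in the final map.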

Fig. 5: Applications of our method to various models: AlexNet, VGG-16, ResNet-50, Inception-V3, and DenseNet-121 on ImageNet validation dataset.

5 Experimental Evaluation

5.1 Implementation Details

We mainly utilize commonly adopted CNN architectures, VGG-16 and ResNet-50, and confirm that our method also performs properly on the extended version of each network, i.e., VGG-19 and ResNet-101. Each model is trained on the publicly accessible Pascal VOC 2007 [everingham2010pascal] and MS COCO 2014 [lin2014microsoft] datasets. For reasonable experiments, we adopt the trained models available online with the TorchRay package [fong2019understanding]. We also present evaluations of additional CNN models (DenseNet-121 [huang2017densely], Inception-V3 [szegedy2016rethinking], and AlexNet [krizhevsky2012imagenet]), which are trained on the ImageNet 2012 dataset [ILSVRC15] and publicly accessible. The attributions obtained using each method are visualized in seismic colors, with red and blue denoting positive and negative values, respectively. We follow the official implementations of the other explanation methods to prevent erroneous reports. All experimental conditions are the same, except that the target saliency layer is set according to the original design of each method. Each type of evaluation is described in the subsequent sections.

PASCAL VOC 2007 COCO 2014 [b]
VGG-16 ResNet-50 VGG-16 ResNet-50 [b]
Grad-CAM L .866 .698 .740 .677 .903 .651 .823 .641 .542 .711 .490 .688 .573 .677 .523 .668 [t]
P .945 .772 .924 .733 .953 .736 .932 .715 .727 .741 .689 .711 .705 .723 .674 .702 [b]
Guided L .758 .530 .771 .594 .365 .288 .410 .340 [t]
BackProp P .880 .784 .857 .756 .600 .536 .573 .519 [b]
Excitation L .735 .520 .785 .623 .377 .304 .437 .374 [t]
BackProp P .856 .742 .864 .768 .573 .505 .582 .533 [b]
c*Excitation L .766 .474 .634 .468 .857 .404 .741 .411 .472 .466 .417 .453 .536 .387 .485 .397 [t]
BackProp P .856 .456 .784 .434 .945 .398 .887 .407 .659 .458 .620 .433 .671 .382 .636 .388 [b]
c*LRP L .719 .461 .521 .464 .852 .341 .779 .333 .434 .454 .388 .451 .533 .361 .479 .367 [t]
P .851 .451 .773 .418 .903 .323 .871 .311 .643 .447 .591 .441 .638 .358 .611 .354 [b]
Relevance L .640 .474 .510 .450 .746 .476 .569 .497 .395 .479 .328 .480 .434 .502 .369 .508 [t]
CAM P .849 .430 .765 .389 .834 .470 .690 .478 .616 .458 .570 .440 .574 .498 .521 .503 [b]
RSP L .849 .364 .712 .363 .859 .431 .749 .415 .540 .343 .479 .348 .558 .441 .504 .421 [t]
P .946 .348 .903 .324 .909 .437 .836 .410 .725 .337 .680 .331 .688 .438 .654 .431 [t]
AGF L .824 .279 .672 .274 .645 .364 .507 .369 .492 .286 .430 .283 .395 .371 .353 .370 [t]
P .925 .268 .882 .255 .717 .368 .679 .364 .703 .281 .661 .269 .553 .367 .543 .362 [t]
Gradient L .878 .125 .758 .118 .897 .191 .797 .178 .553 .124 .498 .122 .578 .187 .527 .183 [t]
Hedge P .958 .124 .923 .117 .961 .197 .937 .184 .747 .124 .707 .120 .708 .195 .674 .192 [t]
TABLE I: Performance of the pointing game on the Pascal VOC 2007 test set and the COCO 2014 validation set. Each method is tested in two cases: P: only predicted classes; L: all labels. ALL and DIF represent the full data and the subset of difficult images, respectively. PR represents the positive ratio of attributions with respect to the entire size of the input image; – denotes methods whose attributions are all positive in every pixel.

5.2 Qualitative Assessment

5.2.1 Illustrative Example

To qualitatively evaluate the attributions obtained from each approach, we examine the visual differences to compare how the pixels with high relevance scores are gathered on the target object. The consistency of the positive relevance among the approaches can be compared because all attribution methods share the same objective: to find the most relevant components. In the qualitative evaluation, which is highly dependent on the human view, we focus on comparing the following: i) class-specific attributions: whether an appropriate visualization can be produced for a desired class; ii) detailed description of the neuron activations that are closely related to the human view and intuitive understanding; and iii) intensive positive attributions and objectness, which play the central role in the rationale for a decision, with few false positives in irrelevant parts.

Fig. 4 shows a comparison of the Grad-CAM, guided backprop, contrastive excitation backprop (c*EB), contrastive LRP (c*LRP), attribution-guided factorization (AGF), Relevance CAM, and relative sectional propagation (RSP) results with the predictions of VGG-16 on the Pascal dataset. To clarify class-specific interpretability, we select images containing two clear objects and propagate the attribution from each label. To verify robustness and generalization, Fig. 5 illustrates the heatmaps of our method for the output predictions of various models—AlexNet, InceptionNet, and DenseNet—on the ImageNet validation dataset. All attributions in the visualization are derived from the propagation rule in Eq. 13. Compared with the attributions from the other methods, those from our method are intensively distributed on the key parts of the target objects (e.g., face and wheel), and there are clear separations between the target and other objects (including the background). These properties result in a highly condensed visualization of the salient interpretation, in line with our assumption.

Fig. 6: Sanity check: assessing the sensitivity to the gradual re-initialization of the VGG-16 model weights from the last layer to the first. The first and second rows show the final attributions for the Dog and Person labels, respectively.

5.2.2 Sanity Check

The sanity check [adebayo2018sanity] addresses the insensitivity of some attribution methods when the parameters of the DNN are randomized in a cascading manner. It is critical to demonstrate that our explanation and the model decision are mutually dependent. Fig. 6 shows the variations in the attributions as the weights of certain layers are progressively randomized. The attributions for each label are severely altered by the distortion of the parameters.
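As a minimal sketch of this cascading-randomization idea (using a hypothetical two-layer ReLU network and gradient×input saliency instead of the full VGG-16 pipeline), the following illustrates that a gradient-derived attribution changes once a layer's weights are re-initialized:

```python
import numpy as np

rng = np.random.default_rng(0)

def saliency(x, W1, w2):
    """Gradient x input attribution for f(x) = w2 . relu(W1 @ x)."""
    h = np.maximum(W1 @ x, 0.0)          # hidden ReLU activations
    grad = W1.T @ (w2 * (h > 0))         # analytic gradient df/dx
    return grad * x                       # gradient x input map

x = rng.normal(size=8)                    # toy "input image"
W1 = rng.normal(size=(16, 8))             # first-layer weights
w2 = rng.normal(size=16)                  # output-layer weights

base = saliency(x, W1, w2)
# Cascading randomization: re-initialize the last layer's weights first.
randomized = saliency(x, W1, rng.normal(size=16))
# A faithful attribution should change noticeably after randomization.
```

A method that produced near-identical `base` and `randomized` maps here would fail the sanity check, since its explanation would not depend on the learned parameters.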

5.3 Quality Evaluations of Attributions

5.3.1 Pointing Game

Pointing game [zhang2018top] is a metric for evaluating attributions by determining the matching scores of the localization between the highest-relevance point and the semantic annotations, e.g., the bounding box, of the object categories in an image. For each category, the localization accuracy is calculated as Acc = #Hits / (#Hits + #Misses), where a hit is scored if the pixel with the highest relevance score lies within the object. However, because attributions are inevitably affected by the predictive performance of the model, decomposition on labels not identified by the model could lead to misinterpretation. Therefore, we assess both cases: (i) P: from predictions only, and (ii) L: from all labels; the results are listed in Tab. I. In addition, we report the ratio of pixels with positive relevance scores, denoted PR, to the total number of pixels in the image.
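The hit/miss criterion above can be sketched as follows; this is a simplified illustration with hypothetical inputs (boolean object masks standing in for the semantic annotations), not the TorchRay benchmark implementation:

```python
import numpy as np

def pointing_game(relevance_maps, object_masks):
    """Pointing game accuracy: Acc = #Hits / (#Hits + #Misses).

    A hit is scored when the single most relevant pixel of an
    attribution map falls inside the annotated object region."""
    hits = 0
    for rel, mask in zip(relevance_maps, object_masks):
        # Location of the pixel with the highest relevance score.
        y, x = np.unravel_index(np.argmax(rel), rel.shape)
        hits += int(mask[y, x])
    return hits / len(relevance_maps)
```

For example, an attribution whose maximum lies inside the box counts as a hit, and one whose maximum lies in the background counts as a miss, regardless of how the rest of the relevance mass is distributed.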

VGGNet: 87.2 / 87.4 / 87.6 / 87.6 / 87.8
ResNet: 88.5 / 88.6 / 88.8 / 89.1 / 89.7
Fig. 9: Variation in pointing game performance over five settings of the preference parameter. The evidence for a decision can be increased or decreased according to user preference.

Methods with the notation c* are variants whose propagation rules adopt a contrastive perspective, setting comparative classes as targets to resolve the class-discriminability issue. We exclude the results of methods that are class-agnostic, e.g., LRP, relative attributing propagation (RAP), and FullGrad. As shown in the table, our method substantially outperforms the other attribution methods at finding the target object in both cases, despite occupying an extremely small positive area.

Fig. 9 shows the variation in the pointing game performance with the preference parameter, which defines the amount of positive evidence allocated. As the parameter decreases, the allocation of attributions is expected to concentrate in a tighter area, enhancing the localization performance through the removal of contradictory evidence. As an ablation study, we examine different combinations of the components of the propagation rule in Eq. 13; the results are summarized in Tab. II. The first group of components contains our base concept, which maintains the evidence for a decision while converting elements corresponding to the opposite position, i.e., irrelevant objects and the background, to negative values. The remaining components increase the robustness of the propagation by reducing the sensitivity to locally activated features in the shallow layers.

| Components (Eq. 13) | VGG (ALL / DIF) | ResNet (ALL / DIF) |
| – – – | 83.9 / 69.6 | 85.6 / 72.6 |
| – –   | 87.2 / 73.2 | 88.5 / 76.9 |
| – –   | 84.3 / 71.8 | 81.2 / 68.4 |
| –     | 83.3 / 71.1 | 86.4 / 74.1 |
| all   | 87.2 / 74.8 | 88.5 / 77.1 |
TABLE II: Ablation study on variants of the propagation rule in Eq. 13. Each row reports the pointing game performance (ALL / DIF) with different components enabled; – marks a disabled component, and the last row enables all components. To prevent divergence due to Eq. 6, the corresponding parameter is set to 2.
Fig. 10: Intuitive examples addressing the issues of region-perturbation metrics. The left side illustrates the problem of perturbation-based metrics arising from the vulnerability of DNNs to scattered-noise attacks. As an alternative, we evaluate the increase in accuracy as high-relevance pixels are inserted. A detailed explanation is given in Section 5.3.2.
Fig. 11: Variation in model accuracy with incremental MoRF insertion: from 1% to 20% of the total pixels in increments of 1%. The first and second graphs compare attribution methods on VGG-16 and ResNet-50, respectively. The third graph compares our method with the baseline (Grad-CAM) on various models.

5.3.2 Incremental Insertion of Positive Attributions

Intuitively, when evaluating the assigned attributions, removing high-relevance pixels should significantly reduce the prediction accuracy. [samek2017evaluating] introduced a method for quantitatively assessing explanation methods using a region-perturbation process, which progressively distorts pixels in the order of the most relevant first (MoRF) or the least relevant first (LeRF), and formalized it as the area over the perturbation curve. However, in empirical experiments, we found that the prediction of a DNN is distorted more by scattered noise than by fine perturbation of a detailed region. This issue is related to the field of adversarial attacks [goodfellow2014explaining], which addresses the vulnerability of a DNN to adversarial perturbations. Fig. 10 shows samples that elucidate these perturbation issues. An image containing a “guenon” is correctly classified by the VGG-16 model. In the figure, the first row shows the attributions from each method, the second row shows the mask corresponding to the top 4,000 pixels in order of relevance value for each method, and the third row shows the perturbed images in which the area corresponding to the mask is erased. RAP and Grad-hedge show clear results, clustered at the center of the face, and yet the predicted class does not change when these pixels are erased. Conversely, LRP and random noise (particularly random noise, which cannot be considered an explanation from the human view) change the prediction results to “Stingray” and “Shower cap”, respectively. In addition, many phenomena run contrary to the design of such experiments: misidentification by the DNN is caused by a color change, or the boundary of a distorted shape becomes an element of other features; e.g., perturbing the territory of a “bee eater” causes it to be mistaken for a “grey owl”.

To address this phenomenon, we propose an evaluation framework called region insertion, which works in the opposite direction to the existing perturbation. We rebuild the ImageNet validation dataset by extracting the pixels corresponding to the MoRF at a predetermined percentage and inserting them into a blank image to which a Gaussian filter is applied. The first and second graphs in Fig. 11 show the results of applying MoRF insertion to VGG-16 and ResNet-50, respectively. The third graph presents the results for various models: VGG-16, ResNet-50, DenseNet-121, Inception-V3, and AlexNet, with Grad-CAM as the baseline. For a fair comparison in the third graph, the accuracy of each model is divided by the original model performance. At each step, we insert the 1% of pixels corresponding to the MoRF, up to a total of 20% of the pixels of each image in increments of 1%. The results show that our method yields a more rapid increase in accuracy than the existing methods. Thus, our method is best at finding the most essential parts of the input as a condensed region within a limited positive relevance ratio.
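A minimal sketch of one region-insertion step is given below, assuming a single-channel image and a plain blank baseline (the experiments above use RGB ImageNet images and apply a Gaussian filter to the baseline canvas):

```python
import numpy as np

def morf_insertion(image, relevance, fraction):
    """Keep only the top `fraction` most-relevant pixels (MoRF order),
    inserting them into a blank baseline canvas."""
    flat = relevance.ravel()
    k = int(round(fraction * flat.size))
    order = np.argsort(flat)[::-1][:k]      # indices of the MoRF pixels
    mask = np.zeros(flat.size, dtype=bool)
    mask[order] = True
    mask = mask.reshape(relevance.shape)
    out = np.zeros_like(image)              # blank baseline (no blur here)
    out[mask] = image[mask]                 # reveal only the MoRF region
    return out, mask
```

Sweeping `fraction` from 0.01 to 0.20 and re-classifying each partially revealed image traces the accuracy curves of Fig. 11: the faster the accuracy rises, the more essential the revealed pixels are.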

| Model | Metric | c*LRP | c*EB | RSP | RCAM | AGF | Ours |
| VGG | mAP | 0.356 | 0.353 | 0.603 | 0.393 | 0.598 | 0.549 |
|     | Out/In | 0.979 | 0.983 | 0.670 | 0.965 | 0.678 | 0.456 |
| ResNet | mAP | 0.516 | 0.428 | 0.608 | 0.505 | 0.622 | 0.559 |
|        | Out/In | 0.926 | 0.883 | 0.785 | 0.682 | 0.767 | 0.490 |
TABLE III: Performance of mAP and the outside–inside ratio on the ImageNet segmentation dataset. Class-discriminative attribution methods are compared. A low Out/In ratio implies that attributions are intensively distributed within the semantic mask.

5.3.3 Objectness of Attributions

In the field of weakly supervised segmentation (image-label level supervision), many studies[ahn2019weakly, Lee_2019_CVPR, huang2018weakly] have obtained the initial seeds by interpreting predictions, which include broad localization information. Although assigning attributions and segmentation can be closely associated in terms of seeking pixels corresponding to a predicted object, it is improbable that all areas of the object have equally significant effects on the decision. It is challenging to evaluate the distribution of attributions in salient parts of an object because there is no ground truth for the importance of annotations. However, it could be inferred to some extent from the density of the relevance score.

In this regard, [lapuschkin2016analyzing] introduced a metric called the outside–inside ratio to evaluate the concentration of relevance on the target object by comparing the relevance scores inside and outside the annotations, e.g., the bounding box or the segmentation mask. We use this metric to determine the degree to which attributions are centrally distributed within the mask of an object. We mainly compare against the conventional methods that address the class-agnostic issue and provide detailed descriptions of the input features.

Out/In = ( (1/|P_out|) Σ_{p∈P_out} N(R_p) ) / ( (1/|P_in|) Σ_{p∈P_in} N(R_p) )

Here, |·| denotes the cardinality operator, which measures the size of each pixel set, P_in inside and P_out outside of the mask, and N(·) represents normalization into the scope of [0, 1]. Because the compared methods produce both positive and negative relevance scores, and there can be cases in which there is no positive value inside the mask, the ratio can diverge. To prevent this, we add normalization to the original metric, making all relevance scores positive while maintaining their relative degree of contribution.
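Under these definitions, the ratio can be sketched as follows, with N(·) implemented as min–max normalization (one plausible choice consistent with the [0, 1] scope described above):

```python
import numpy as np

def outside_inside_ratio(relevance, mask, eps=1e-12):
    """Outside-inside relevance ratio with [0, 1] normalization.

    `mask` is a boolean array marking the pixels inside the annotation
    (P_in); min-max normalization keeps every score non-negative so the
    ratio cannot diverge when negative relevance dominates the inside."""
    r = relevance.astype(float)
    r = (r - r.min()) / (r.max() - r.min() + eps)  # N(.): scope [0, 1]
    mean_out = r[~mask].mean() if (~mask).any() else 0.0  # over P_out
    mean_in = r[mask].mean() if mask.any() else eps       # over P_in
    return mean_out / (mean_in + eps)
```

An attribution map whose mass sits inside the segmentation mask yields a ratio near zero, while one that spills into the background yields a ratio above one.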

We utilize the ImageNet segmentation dataset [Guillaumin2014ImageNetAW], which consists of 4,276 images with segmentation masks. Tab. III compares the attribution methods c*LRP, c*EB, RSP, RCAM, and AGF based on the mean average precision (mAP) and the outside–inside ratio. When positive relevance is distributed outside the segmentation mask, the ratio increases; by contrast, when high-priority attributions accumulate inside the mask, the ratio decreases. Based on Tab. III, RSP and AGF show impressive segmentation performance without any additional supervision. In comparison, our method performs slightly lower in terms of mAP, whereas it shows superior performance to the other methods in terms of the outside–inside ratio, indicating that the high-priority attributions are intensively distributed inside the segmentation mask, i.e., on the input features most relevant to the decision.

6 Conclusion

In this paper, we propose a novel method for intensively assigning attributions to salient input features with a new perspective on the evidence for a network decision. We carefully address the ambiguity in the criteria of the existing concepts of evidence and clearly define the evidence that should be preserved, while remaining antagonistic to the contradictory evidence. Consequently, relevance scores can be allocated to the core neuron activations associated with the evidence, resulting in an intuitive, distinguishable, and attentive decomposition. We evaluate our proposed method quantitatively and qualitatively to confirm the quality of the attributions. The results demonstrate that the attributions from our method provide intensive salient input features, class-specific explanations, and detailed descriptions of neuron activations.

To the best of our knowledge, there is still no complete elucidation of the complex inner mechanisms of DNNs beyond their structural and conceptual design. It is therefore crucial to provide the best possible explanations of a decision, owing to the possibility of unexpected phenomena such as the “clever Hans” effect [lapuschkin-ncomm19]. Methods that approximate the objectness of a network well, e.g., RSP, AGF, and the proposed method, offer many potential advantages in the fields of weakly supervised segmentation and detection. Specifically, the application of DNNs in medical fields such as lung tumor and EEG analysis is expanding, and our method is expected to find the most salient evidence within broad or difficult-to-understand input data.

Future studies will extend the interpretation techniques to various computer science fields including natural language, few-shot learning, and speech analysis. The results are expected to reduce the potential risk of an unpredictable phenomenon and increase the reliability of the decision of a network.


This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2017-0-01779, A machine learning and statistical inference framework for explainable artificial intelligence (XAI); No. 2019-0-00079, Artificial Intelligence Graduate School Program (Korea University)).