Robust saliency maps with decoy-enhanced saliency score

02/03/2020 · by Yang Lu, et al.

Saliency methods help make deep neural network predictions more interpretable by identifying particular features, such as pixels in an image, that contribute most strongly to the network's prediction. Unfortunately, recent evidence suggests that many saliency methods perform poorly when gradients are saturated or in the presence of strong inter-feature dependence or noise injected by an adversarial attack. In this work, we propose to infer robust saliency scores by integrating the saliency scores of a set of decoys with a novel decoy-enhanced saliency score, in which the decoys are generated either by solving an optimization problem or by blurring the original input. We show theoretically that our method compensates for gradient saturation and accounts for the joint activation patterns of pixels. We also apply our method to three different CNNs (VGGNet, AlexNet, and ResNet) trained on the ImageNet dataset. The empirical results show, both qualitatively and quantitatively, that our method outperforms the raw scores produced by three existing saliency methods, even in the presence of adversarial attacks.


1 Introduction

Deep neural networks (DNNs) deliver remarkable performance in an increasingly wide range of application domains, but they often do so in an inscrutable fashion, delivering predictions without accompanying explanations. In a practical setting such as the automated analysis of pathology images, if a patient sample is classified as malignant, then the physician will want to know which parts of the image contribute to this diagnosis. Thus, in general, a DNN that delivers interpretations alongside its predictions will enhance the credibility and utility of its predictions for end users (lipton2016mythos).

To address this lack of transparency, existing research has proposed a variety of explanation methods, which can be categorized into counterfactual-based methods and saliency methods (ancona2017towards). In this paper, we focus on saliency methods, which aim to find input features that strongly influence the network predictions (simonyan2013deep; selvaraju2016grad; binder2016layer; shrikumar2017learning; smilkov2017smoothgrad; sundararajan2017axiomatic; ancona2017towards; levine2019certifiably). Vanilla saliency methods (simonyan2013deep; selvaraju2016grad; binder2016layer) directly use the partial derivative of the class score to assign importance scores to features, under the assumption that important features in general correspond to large gradients. However, this assumption is often violated in the presence of the gradient saturation problem (sundararajan2016gradients; shrikumar2017learning; smilkov2017smoothgrad): the gradients of important features may have small magnitudes. This issue is triggered when the output of a neural network flattens in the vicinity of important features. As a result, saliency methods fail to highlight important features that have tiny gradients. More advanced saliency methods alleviate gradient saturation by repeatedly perturbing the input and integrating the saliency maps of the perturbed inputs (shrikumar2017learning; sundararajan2017axiomatic; smilkov2017smoothgrad).

Despite mitigating the gradient saturation problem to some extent and yielding interpretations that are meaningful to humans, these advanced saliency methods still exhibit several limitations. First, these methods are built upon the assumption that slightly perturbing the input, such as by adding noise (smilkov2017smoothgrad) or rescaling (sundararajan2017axiomatic), does not affect the network output (sturmfels2020visualizing). However, recent research (burns2019interpreting; carlini2017towards) shows that even a tiny perturbation may lead to an unexpected change in the network prediction, giving rise to unpredictable behavior in the resulting explanations (burns2019interpreting). Second, pixel-wise saliency methods evaluate the importance of each feature in an isolated fashion, implicitly assuming that the other features are fixed (singla2019understanding). Without taking inter-dependencies among features into consideration, the insight gained from individual features may be problematic. Finally, Ghorbani et al. (ghorbani2017interpretation) and Kindermans et al. (kindermans2017reliability) systematically revealed the fragility of widely used saliency methods by showing that even an imperceptible random perturbation or a simple shift transformation of the input data can lead to a large change in the resulting saliency scores. In light of this, for any saliency method, a key challenge is ensuring that the saliency scores are immune to gradient saturation (sundararajan2016gradients) without triggering ill-defined network behaviors, reflective of the joint effects of other dependent features (singla2019understanding), and robust to adversarial perturbations (ghorbani2017interpretation).

In this paper, we aim to tackle all of these challenges. At a high level, our method follows the well-adopted integrating-over-perturbation paradigm (sturmfels2020visualizing), yet differs in a novel perturbation and integration regime. Specifically, given an image of interest, our method first generates a population of images, referred to as decoy images, which are plausibly constructed to preserve the intermediate representation of the original input and thus avoid triggering ill-defined behaviors of the prediction. These decoys can be rigorously generated by solving an efficient optimization problem or approximated by partially blurring the input image. After creating the decoys, we integrate the saliency maps of these decoys via a proposed decoy-enhanced saliency score. The closed form of the proposed score derived in Section 2.5 suggests that our method not only highlights important features even in the presence of gradient saturation, but also reflects feature importance jointly derived from other locally dependent features. Last but not least, we further show in Section 3.1 that these joint effects contribute to the robustness of our method against adversarial perturbations.

We apply our decoy-enhanced saliency score to the ImageNet dataset (russakovsky2015imagenet), in conjunction with three standard saliency methods. We demonstrate empirically that our proposed method performs better than the original saliency methods, both qualitatively and quantitatively, even in the presence of various adversarial perturbations to the image.

Figure 1: Workflow for creating decoy-enhanced saliency maps.
Figure 2: An illustrative example of the swap operator swapping image patches between original and decoy images.

2 Methods

2.1 Problem setup

Consider a multi-label classification task in which a pre-trained neural network model implements a function $f: \mathbb{R}^d \to \mathbb{R}^C$ that maps a given input $\mathbf{x} \in \mathbb{R}^d$ to $C$ predicted classes. The score for class $c$ is $f_c(\mathbf{x})$, and the predicted class is the one with the maximum score, i.e., $c = \arg\max_{c'} f_{c'}(\mathbf{x})$. A common instance of this setting is image classification, in which case the input corresponds to the pixels of an image. A saliency method assigns each feature a saliency score, encoded in an explanation map (i.e., saliency map) $E(\mathbf{x}) \in \mathbb{R}^d$, in which the pixels with higher scores have higher "importance" to the final prediction.

In this setting, we are given three inputs: a pre-trained neural network model $f$ with $L$ layers, an image of interest $\mathbf{x}$, and a saliency method $E$ such that $E(\mathbf{x})$ is a saliency map of the same dimensions as $\mathbf{x}$. A variety of saliency methods have been proposed in the literature. Some, such as edge or other generic feature detectors (adebayo2018sanity), are independent of the predictive model. In this paper, instead of exhaustively evaluating all saliency methods, we highlight how our method applies to three that do depend on the predictor.

The vanilla gradient method (simonyan2013deep) simply calculates the gradient of the class score with respect to the input $\mathbf{x}$, defined as $E(\mathbf{x}) = \partial f_c(\mathbf{x}) / \partial \mathbf{x}$.

The SmoothGrad method (smilkov2017smoothgrad) seeks to reduce noise in the saliency map by averaging over explanations of noisy copies of an input, defined as $E_{\mathrm{sg}}(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^{n} E(\mathbf{x} + \boldsymbol{\epsilon}_i)$, with noise vectors $\boldsymbol{\epsilon}_i \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$.

The integrated gradients method (sundararajan2017axiomatic) aims to tackle the problem of gradient saturation. The method starts from a baseline input $\mathbf{x}_0$ and sums over the gradients with respect to scaled versions of the input, ranging from the baseline to the observed input, defined as $E_{\mathrm{ig}}(\mathbf{x}) = (\mathbf{x} - \mathbf{x}_0) \odot \int_0^1 \frac{\partial f_c(\mathbf{x}_0 + \alpha (\mathbf{x} - \mathbf{x}_0))}{\partial \mathbf{x}} \, d\alpha$, where the integral is approximated by a finite sum in practice.
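To make the three baselines concrete, the following is a minimal PyTorch sketch, assuming a classifier `model` that takes a batched input and returns class scores; the function names and default hyperparameters (`n`, `sigma`, `steps`) are illustrative choices, not the authors' implementation.

```python
import torch

def vanilla_gradient(model, x, c):
    """Gradient of the class-c score with respect to the input x."""
    x = x.clone().detach().requires_grad_(True)
    model(x.unsqueeze(0))[0, c].backward()
    return x.grad.detach()

def smoothgrad(model, x, c, n=50, sigma=0.15):
    """Average the vanilla gradients of n noisy copies of x."""
    total = torch.zeros_like(x)
    for _ in range(n):
        total += vanilla_gradient(model, x + sigma * torch.randn_like(x), c)
    return total / n

def integrated_gradients(model, x, c, x0=None, steps=50):
    """Sum gradients along the straight path from a baseline x0 to x."""
    x0 = torch.zeros_like(x) if x0 is None else x0
    total = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        total += vanilla_gradient(model, x0 + alpha * (x - x0), c)
    return (x - x0) * total / steps
```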

As illustrated in Figure 1, we are interested in obtaining more robust saliency maps via decoys, constructed as follows.

2.2 Definition of decoys

Say that $F_l$ is the function instantiated by the first $l$ layers of the given network, which maps an input image to its intermediate representation at layer $l$. At a specified layer $l$, the random vector $\tilde{\mathbf{x}} \in \mathbb{R}^d$ is said to be a decoy of $\mathbf{x}$ if the following swappable condition is satisfied:

$F_l(\mathrm{swap}(\mathbf{x}, \tilde{\mathbf{x}}; \mathcal{P})) = F_l(\mathbf{x})$ for every valid patch $\mathcal{P}$.    (1)

The operation $\mathrm{swap}(\mathbf{x}, \tilde{\mathbf{x}}; \mathcal{P})$ swaps the image patch $\mathcal{P}$ between $\mathbf{x}$ and $\tilde{\mathbf{x}}$ (Figure 2). Hence, if $\mathcal{P}$ is a valid patch, then $\mathrm{swap}(\mathbf{x}, \tilde{\mathbf{x}}; \mathcal{P})$ is a new image that is almost equivalent to $\mathbf{x}$, except that the patch $\mathcal{P}$ is exchanged from $\tilde{\mathbf{x}}$. A "valid" patch is a local region receptive to the convolutional filter of a given CNN, for example, the $3 \times 3$ area in VGGNet (simonyan2014very). The swappable condition ensures that the original image and its decoy are indistinguishable in terms of the intermediate representation at layer $l$. Note in particular that the construction of decoys relies solely on the first $l$ layers of the neural network and is independent of the succeeding $L - l$ layers, satisfying

$\tilde{\mathbf{x}} \perp y \mid \mathbf{x},$    (2)

where $y$ denotes the class label.

In other words, $\tilde{\mathbf{x}}$ is conditionally independent of the classification task given the input $\mathbf{x}$. See Supplementary for an explanation of why decoys exist. It is worth mentioning that, unlike perturbing the input images by nullifying certain subregions or adding random noise, which could potentially produce out-of-distribution inputs, decoys are plausibly constructed to preserve the input distribution in the sense that their intermediate representations are indistinguishable from those of the original input data by design.
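As a concrete illustration of the swap operator in Equation 1, the following sketch (hypothetical, using NumPy array slices to denote patches) builds the swapped image from $\mathbf{x}$ and a candidate decoy:

```python
import numpy as np

def swap(x, x_tilde, patch):
    """Return a copy of x whose patch is replaced by the same patch of x_tilde.

    patch: a tuple of slices marking a valid local region, e.g. a 3x3 area.
    """
    swapped = x.copy()
    swapped[patch] = x_tilde[patch]
    return swapped

# Example: exchange the 3x3 patch at the top-left corner of an HxWxC image.
# swapped = swap(x, x_tilde, (slice(0, 3), slice(0, 3)))
```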

2.3 Generation of decoys

Given an image of interest $\mathbf{x}$ and a patch mask $M$, our goal is to construct a decoy image $\tilde{\mathbf{x}}$, with respect to a specified neural network layer $l$, such that $\tilde{\mathbf{x}}$ is as different as possible from $\mathbf{x}$ while preserving the swappable condition in Equation 1. In this paper, we first propose an optimization-based method that finds decoys efficiently. In addition, we also suggest a practitioners' alternative that finds approximate decoys and works well in practice without extra computational overhead.

2.3.1 Generating decoys by optimization

Based on the definition and the required properties of decoy mentioned above, we propose to generate decoys by optimizing the following objective function:

$\max_{\tilde{\mathbf{x}}} \; t \cdot \mathbf{1}^{\top} (\tilde{\mathbf{x}} - \mathbf{x})$    (3)
s.t. $\|F_l(\tilde{\mathbf{x}}) - F_l(\mathbf{x})\|_{\infty} = 0$, $\tilde{\mathbf{x}} \odot (\mathbf{1} - M) = \mathbf{x} \odot (\mathbf{1} - M)$, $\tilde{\mathbf{x}} \in [0, 1]^d$,

where $t = +1$ or $t = -1$ depending on whether we intend to investigate the upward or downward limit of deviating from $\mathbf{x}$. The operator $\|\cdot\|_{\infty}$ denotes the $\ell_\infty$ norm, and the first constraint enforces the swappable condition at layer $l$. The second constraint, in which $\odot$ denotes entry-wise multiplication, ensures that $\tilde{\mathbf{x}}$ and $\mathbf{x}$ differ only in the area marked by the patch mask $M$. The final constraint ensures that the values in $\tilde{\mathbf{x}}$ fall into an appropriate range for image pixels, $[0, 1]$ in the normalized-image case. In principle, the formulation in Equation 3 is applicable to any convolutional neural network architecture, ranging from simple convolutional networks to more sophisticated ones (e.g., AlexNet (krizhevsky2012imagenet), the Inception network (szegedy2016rethinking), and ResNet (he2016deep)).

Equation 3 is non-differentiable and very difficult to solve; hence, we instead solve an alternative formulation with the help of the following tricks. Firstly, we introduce a Lagrange multiplier $\lambda$ and augment the first constraint in Equation 3 as a penalty in the objective function. Secondly, we use the change-of-variable trick (carlini2017towards) (other transformations are also possible but were not explored in this paper) to eliminate the pixel-value constraint (i.e., $\tilde{\mathbf{x}} \in [0, 1]^d$). Instead of optimizing over $\tilde{\mathbf{x}}$, we introduce $\mathbf{w}$ satisfying $\tilde{x}_i = \frac{1}{2}(\tanh(w_i) + 1)$ for all $i$. Because $-1 \le \tanh(w_i) \le 1$ implies $0 \le \tilde{x}_i \le 1$, any solution over $\mathbf{w}$ is naturally valid. Thirdly, we use projected gradient descent during the optimization to eliminate the patch mask constraint (i.e., $\tilde{\mathbf{x}} \odot (\mathbf{1} - M) = \mathbf{x} \odot (\mathbf{1} - M)$). Specifically, after each standard gradient descent step, we enforce this equality by resetting the pixels outside the mask. Finally, putting these ideas together, we minimize the following objective function:

$\min_{\mathbf{w}} \; -t \cdot \mathbf{1}^{\top} (\tilde{\mathbf{x}} - \mathbf{x}) + \lambda \, \|F_l(\tilde{\mathbf{x}}) - F_l(\mathbf{x})\|_{\infty}, \quad \text{with } \tilde{x}_i = \tfrac{1}{2}(\tanh(w_i) + 1),$    (4)

where $\lambda$ is initialized to a small value and repeatedly doubled until the optimization succeeds. See Section 3.3 for the sensitivity of our method to the choice of the initial $\lambda$. Because the $\ell_\infty$ norm is not fully differentiable, we adopt the trick introduced by (carlini2017towards). Specifically, we solve the following formulation repeatedly:

$\min_{\mathbf{w}} \; -t \cdot \mathbf{1}^{\top} (\tilde{\mathbf{x}} - \mathbf{x}) + \lambda \sum_i \max\big( |F_l(\tilde{\mathbf{x}})_i - F_l(\mathbf{x})_i| - \tau, \, 0 \big),$    (5)

where $\tau$ is the introduced hyperparameter. In this paper, we follow the suggestion in (carlini2017towards) for its initialization. After each iteration, if the second term is zero, we reduce $\tau$ by a constant factor and repeat; otherwise, we terminate the optimization.

Equation 5 can be solved efficiently by any first-order optimization method without introducing much computational overhead. In fact, the average run time of solving it is shorter than that of the fastest baseline, the vanilla gradient method (see Supplementary for more details on the run time comparison).
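The inner optimization can be sketched as follows; this is a simplified single-$\lambda$, single-$\tau$ PyTorch illustration of Equation 5 (the outer loops that double $\lambda$ and shrink $\tau$ are omitted), in which `model_prefix`, the step count, and the learning rate are assumptions rather than the authors' settings.

```python
import torch

def generate_decoy(model_prefix, x, mask, t=1.0, lam=0.1, tau=1.0,
                   steps=200, lr=0.01):
    """One inner pass of optimization-based decoy generation (Equation 5).

    model_prefix: callable computing the layer-l representation F_l.
    mask:         binary tensor; 1 marks the patch allowed to change.
    t:            +1 / -1 for the upward / downward deviation limit.
    """
    target = model_prefix(x).detach()
    # Change of variable: x_tilde = (tanh(w) + 1) / 2 always lies in [0, 1].
    w_init = torch.atanh((2 * x - 1).clamp(-0.999, 0.999))
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_tilde = (torch.tanh(w) + 1) / 2
        diff = (model_prefix(x_tilde) - target).abs()
        # Penalized objective: push pixel values up (or down) while keeping
        # the layer-l representation within tau, entry-wise.
        loss = -t * (x_tilde - x).sum() \
               + lam * torch.clamp(diff - tau, min=0).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Projection step: pixels outside the patch mask keep their values.
        with torch.no_grad():
            w.data = mask * w.data + (1 - mask) * w_init
    return ((torch.tanh(w) + 1) / 2).detach()
```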

2.3.2 Approximating decoys by blurring

From the practitioners' perspective, we also consider an obvious decoy proxy: blurring the corresponding image patches. Formally, given an image of interest $\mathbf{x}$ and a patch mask $M$, if we designate $\mathbf{b}(\mathbf{x})$ as the blurred version of $\mathbf{x}$ returned by a box blur kernel that averages the values of neighboring pixels, the resulting decoy satisfies:

$\tilde{\mathbf{x}} = M \odot \mathbf{b}(\mathbf{x}) + (\mathbf{1} - M) \odot \mathbf{x}.$    (6)

Although blurred images do not strictly satisfy the decoy swappable condition defined in Equation 1, the empirical validity of using image blurring is explained in Section 3.2. It is worth mentioning that filling the image patches by inpainting (yu2018generative) could serve as another alternative decoy proxy.
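A minimal sketch of this proxy, assuming a (C, H, W) image in [0, 1] and an (H, W) binary mask; the kernel size is an illustrative default, since the paper's exact value is not reproduced here:

```python
import torch
import torch.nn.functional as F

def blur_decoy(x, mask, kernel_size=3):
    """Approximate a decoy by box-blurring the masked patch of x (Equation 6)."""
    c = x.shape[0]
    # Depthwise box-blur kernel averaging a kernel_size x kernel_size window.
    kernel = torch.ones(c, 1, kernel_size, kernel_size) / kernel_size ** 2
    blurred = F.conv2d(x.unsqueeze(0), kernel,
                       padding=kernel_size // 2, groups=c).squeeze(0)
    # Blur only inside the patch mask; keep the rest of the image intact.
    return mask * blurred + (1 - mask) * x
```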

Figure 3: Visualization of saliency maps under adversarial attacks. More examples can be found in Supplementary.
Figure 4: Visualization of saliency maps comparing three state-of-the-art saliency methods with and without decoy images generated from three techniques (i.e., our decoy generation method, blurring, and inpainting). More examples can be found in Supplementary.
Figure 5: Relative difference between the intermediate representations of the original images and the decoy/blurred images.

2.4 Decoy-enhanced saliency scores

As illustrated in Figure 1, we can construct a population of independently sampled patch masks $M^{(1)}, \ldots, M^{(n)}$, so that each possible image patch is covered by at least one patch mask. Thus we can construct a population of decoy images $\tilde{\mathbf{x}}^{(1)}, \ldots, \tilde{\mathbf{x}}^{(n)}$ with respect to the different patch masks. Note that the input image $\mathbf{x}$ is itself a trivially valid decoy, by definition; i.e., it satisfies Equation 1 and the conditional independence requirement in Equation 2. After that, by applying the specified saliency method $E$ to these decoy images, we obtain a corresponding population of decoy saliency maps $E(\tilde{\mathbf{x}}^{(1)}), \ldots, E(\tilde{\mathbf{x}}^{(n)})$.

Now the pixel-wise variation of each feature $j$ can be characterized by the population of feature values $\tilde{x}_j^{(1)}, \ldots, \tilde{x}_j^{(n)}$. Accordingly, instead of merely quantifying the saliency of feature $j$ via the single value $E_j(\mathbf{x})$, we can now characterize the uncertainty of the feature's saliency via the corresponding population of saliency scores $E_j(\tilde{\mathbf{x}}^{(1)}), \ldots, E_j(\tilde{\mathbf{x}}^{(n)})$. Finally, the decoy-enhanced saliency score for feature $j$ is defined as:

$Z_j = \max_{1 \le k \le n} E_j(\tilde{\mathbf{x}}^{(k)}) - \min_{1 \le k \le n} E_j(\tilde{\mathbf{x}}^{(k)}).$    (7)

That is, $Z_j$ is determined by the empirical variation of the decoys' saliency scores.
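Reading Equation 7 as the empirical range of each pixel's score across the decoy population, a compact sketch (with hypothetical helper names) is:

```python
import torch

def decoy_enhanced_score(saliency_fn, decoys):
    """Range (max minus min) of saliency scores across a decoy population.

    saliency_fn: maps one image to a saliency map of the same shape.
    decoys:      tensor of shape (n, C, H, W); the original image may be
                 included as a trivially valid decoy.
    """
    maps = torch.stack([saliency_fn(d) for d in decoys])  # (n, C, H, W)
    return maps.max(dim=0).values - maps.min(dim=0).values
```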

2.5 Theoretical analysis

Consider a single-layer convolutional neural network with decoy swappable patch size $p \times p$ and convolutional filter size $k \times k$. The input of this CNN is $\mathbf{x} \in \mathbb{R}^d$, unrolled from a grayscale image matrix. Similarly, we have a convolutional filter unrolled into a weight vector $\mathbf{w}$, whose entries are indexed by the shift, in matrix form, from the top-left to the bottom-right pixel of the filter window. We denote by $h_j(\mathbf{x})$ the output of the convolution operation on the input $\mathbf{x}$ at position $j$, where $h_j(\mathbf{x}) = \sum_u w_u x_{j+u}$. For simplicity, we further assume that there are no pathological cases such as $h_j(\mathbf{x}) = 0$. Such a neural network can be represented as:

$f(\mathbf{x}) = \mathrm{softmax}(\mathbf{z}), \quad \mathbf{z} = \mathbf{V} \, \sigma(\mathbf{h}(\mathbf{x})) + \mathbf{b},$    (8)

where $\sigma(\cdot)$ is the entry-wise ReLU operator (glorot2011deep), $\mathbf{V}$ represents the combined weights of the neural network, and $\mathbf{b}$ represents the biases. The terms $\mathbf{z}$ and $f(\mathbf{x})$ are the logits and the predicted class probabilities, respectively. The entry-wise softmax operator for target class $t$ is defined as $\mathrm{softmax}(\mathbf{z})_t = e^{z_t} / \sum_{t'} e^{z_{t'}}$. Without loss of generality, we fix the target class $t$.

In this case, we have the following result:

Proposition 1. The decoy-enhanced saliency score $Z_j$ of pixel $j$ is bounded by:

$Z_j \le \gamma_1 \, n_j \, \big( \tilde{x}_j^{\max} - \tilde{x}_j^{\min} \big) + \gamma_2,$    (9)

where $\tilde{\mathbf{x}}^{\max}$ (resp. $\tilde{\mathbf{x}}^{\min}$) is the decoy that maximizes (resp. minimizes) the value of pixel $j$, $n_j$ is the number of neurons jointly activated by pixel $j$ and its neighbors, and $\gamma_1$ and $\gamma_2$ are bounded constants. See Supplementary for the full proof.

Proposition 1 indicates that the saliency statistic $Z_j$ is determined by two factors: the range of values associated with pixel $j$ among the decoys ($\tilde{x}_j^{\max} - \tilde{x}_j^{\min}$) and the number of neurons jointly activated by pixel $j$ and its neighbors ($n_j$). The former explains the gradient saturation problem (sundararajan2016gradients) in the sense that important features may have more room to fluctuate without influencing the joint effect on the prediction. The latter compensates for the way in which saliency maps treat each pixel independently (singla2019understanding), in the sense that the importance of a pixel is jointly determined by the surrounding pixels (i.e., the localized region in our setting), potentially capturing meaningful patterns such as edges and texture. In this way, the proposed saliency statistic not only compensates for gradient saturation but also takes joint activation patterns into consideration. Indeed, the bound provided by Proposition 1 can be considered a motivation for defining the decoy-enhanced saliency score.

3 Experiments

To evaluate the effectiveness of our proposed method, we perform extensive experiments on deep learning models that target image classification tasks. The performance of our approach is assessed both qualitatively and quantitatively. The results show that our proposed method identifies intuitively more coherent saliency maps than the state-of-the-art saliency methods alone. The method also achieves quantitatively better alignment with truly important features and demonstrates stronger robustness to adversarial manipulation.

Our experiments primarily use VGG16 (simonyan2014very) pretrained on the ImageNet dataset (russakovsky2015imagenet). We also demonstrate the applicability of our method to other well-studied CNNs such as AlexNet (krizhevsky2012imagenet) and ResNet (he2016deep). The settings for all experiments are reported in Supplementary.

3.1 Benchmark and evaluation criteria

Benchmark and visualization methodologies. To benchmark the performance of our proposed approach, we consider the three baseline saliency methods introduced in Section 2.1. In addition, to evaluate the effectiveness of the decoy generation methods proposed in Section 2.3, we also compare these generation methods with another potential decoy proxy: inpainting.

A heatmap is a typical way of displaying a saliency map. Our strategy for projecting a saliency map onto a heatmap is as follows. Typically, saliency methods produce signed values for input features (pixels in our experiments), where the sign of the saliency value indicates the direction of influence of the corresponding feature on the predicted class. First, to obtain a single importance score for each pixel, we take the maximum absolute saliency score across all color channels, as suggested by (simonyan2013deep; smilkov2017smoothgrad). Then, to avoid outlier pixels with extremely high saliency values producing an almost entirely black heatmap, we winsorize those outlier saliency values at a relatively high value (e.g., the 99th percentile), as suggested by (smilkov2017smoothgrad), to achieve better visibility of the heatmap. Finally, we linearly rescale the saliency values to the range [0, 1].
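This projection is straightforward to express in code; a minimal NumPy sketch, assuming an (H, W, C) signed saliency map:

```python
import numpy as np

def to_heatmap(saliency, percentile=99):
    """Project a signed (H, W, C) saliency map onto a [0, 1] heatmap."""
    heat = np.abs(saliency).max(axis=-1)        # max |score| over channels
    cap = np.percentile(heat, percentile)       # winsorize outlier values
    heat = np.minimum(heat, cap)
    return (heat - heat.min()) / (heat.max() - heat.min() + 1e-12)
```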

Evaluation criteria. To comprehensively evaluate our proposed approach against the baselines mentioned above, we focus on the following three aspects.

First, we consider the visual coherence of the identified saliency map. Intuitively, we prefer a saliency method that produces a saliency map aligning cleanly with the object of interest.

Second, we use the fidelity metric (dabkowski2017real) to quantify the correctness of the saliency maps produced by the corresponding saliency method, defined as:

$\mathrm{Fidelity}(\mathbf{x}, E) = -f_c\big( \mathbf{x} \odot ( E(\mathbf{x}) - \bar{E}(\mathbf{x}) ) \big),$    (10)

where $c$ indicates the predicted class of the input $\mathbf{x}$, $E(\mathbf{x})$ is the normalized saliency map described above, and $\bar{E}(\mathbf{x})$ is its mean. The operator $\odot$ performs entry-wise multiplication between $\mathbf{x}$ and the centered saliency map, encoding the overlap between the object of interest and the concentration of the saliency map. The rationale behind this metric is as follows. Viewing the saliency score of a feature as its contribution to the predicted class, a good saliency method will weight important features more highly than less important ones, and thus give rise to higher predicted class scores and lower metric values. It is worth mentioning that we subtract the mean saliency to eliminate the influence of bias in $E(\mathbf{x})$ and to exclude trivial cases such as a constant saliency map.
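Under the reconstructed form of Equation 10 above (which should be treated as an assumption about the exact formula), the metric could be computed as:

```python
import torch

def fidelity(model, x, saliency):
    """Fidelity metric sketch: lower values indicate a better saliency map."""
    c = model(x.unsqueeze(0)).argmax()      # predicted class of the input
    centered = saliency - saliency.mean()   # remove the saliency bias
    reweighted = (x * centered).unsqueeze(0)
    return -model(reweighted)[0, c].item()
```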

Finally, we investigate the robustness of our method to adversarial manipulations. In particular, we focus on three popular adversarial attacks (ghorbani2017interpretation): (1) the top-k attack, which seeks to decrease the importance scores of the top-k most important features; (2) the target attack, which aims to increase the feature importance of a pre-specified region in the input image; and (3) the mass-center attack, which aims to spatially shift the center of mass of the original saliency map. Here, we specify the bottom-right region of the original image for the target attack and set k = 5000 in the top-k attack.

In this paper, we use the sensitivity metric proposed by (alvarez2018towards) to quantify the robustness of a saliency method to adversarial attack, defined as:

$\mathrm{Sensitivity}(\mathbf{x}, E) = \dfrac{\| E(\mathbf{x}) - E(\mathbf{x}') \|_2}{\| \mathbf{x} - \mathbf{x}' \|_2},$    (11)

where $\mathbf{x}'$ is the perturbed version of $\mathbf{x}$. A small sensitivity value means that similar inputs do not lead to substantially different saliency maps.
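A direct sketch of this metric, assuming `saliency_fn` wraps any of the saliency methods above:

```python
import torch

def sensitivity(saliency_fn, x, x_adv):
    """Relative change of the saliency map under a perturbation of x."""
    num = torch.norm(saliency_fn(x) - saliency_fn(x_adv))
    den = torch.norm(x - x_adv)
    return (num / den).item()
```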

Figure 6: Generating decoys by preserving intermediate representations with respect to different network layers.
Figure 7: Visualization of saliency maps optimized using different values of the initial hyperparameter $\lambda$.
(a) Saliency maps generated on AlexNet.
(b) Saliency maps generated on ResNet.
Figure 8: Visualization of saliency maps under different CNN architectures.

3.2 Experimental results

Before comparing the fidelity and robustness of our method with those of existing methods, we first conducted a sanity check (adebayo2018sanity) of our method on the ImageNet dataset using the VGG16 model; the results show that our method successfully passes the sanity check (see Supplementary for detailed results).

Fidelity. We applied our decoy-enhanced saliency score to randomly sampled images from the ImageNet dataset, in conjunction with three standard saliency methods. A side-by-side comparison of the resulting saliency maps (Figure 4) suggests that decoys (in the following, we use "decoys" to denote the decoy images generated by solving the optimization problem in Equation 5) consistently help to reduce noise and produce more visually coherent saliency maps. For example, the original integrated gradients method highlights the region of the dog's head in a scattered fashion. In contrast, the decoy-enhanced integrated gradients method not only highlights the missing body but also identifies the dog's head in greater detail, including the ears, cheek, and nose. The visual coherence is also quantitatively supported by the saliency fidelity scores.

As also shown in Figure 4, using blurred images as a decoy proxy achieves empirically comparable performance to decoys, even though blurred images violate the design of decoys in the sense that they cannot preserve the intermediate representations. To understand the rationale behind these results, we compared the relative difference of the intermediate representations between the original images and the decoy/blurred images. Here, the relative difference is defined as the norm of the difference between two intermediate representations divided by the maximum absolute value of any intermediate representation. By design, the relative difference of decoys is expected to be small, whereas the relative difference of blurred images may be arbitrary.

Figure 5 depicts the relative difference caused by decoys and blurring in two different layers of the VGG16 model. As shown in Figure 5, in the first pooling layer (i.e., a layer near the bottom of the network), the relative difference for blurred images is substantially larger than that for decoy images. In the last fully-connected layer (i.e., a layer near the top of the network), as expected, the relative difference for decoy images remains trivial, while the relative difference for blurred images becomes much smaller. As we can observe from this experiment, even though blurred images violate the constraint that defines decoys, the violation is mitigated in the deeper layers of the network, making blurred images a practically acceptable alternative to decoys. It should still be noted, however, that optimization-based decoy images remain necessary to justify the theoretical soundness of the decoy-enhanced saliency score.

Finally, both decoys and blurred images outperform inpainted images on all three baselines in terms of quantitative and qualitative performance, indicating that inpainting is not an ideal choice of decoy proxy.

Robustness. We applied our decoy-enhanced saliency score to images subjected to the three adversarial attacks described in Section 3.1, with the results shown in Figure 3. As illustrated in Figure 3, though not fully resistant to adversarial attacks, the decoy-enhanced saliency methods consistently mitigate the impact of adversarial manipulations, leading to visually more coherent saliency maps as well as lower sensitivity scores.

The robustness of the decoy-enhanced saliency methods can be explained as follows. In a normal situation (when the image is not adversarially attacked), an important pixel is rarely important in an isolated fashion. Instead, an important pixel tends to contribute a strong joint effect in conjunction with its neighboring pixels, potentially capturing meaningful patterns such as edges and texture. In light of this observation, such a pixel has more room to fluctuate without influencing the joint effect on the prediction. In this case, some elements of the population of decoy saliency scores will be high and others low, contributing a large $Z_j$. In the unusual situation when an isolated important pixel is indeed observed (i.e., a pixel is consistently important for all decoy images and has no room to fluctuate), we tend to believe that the pixel has been attacked. In this case, all elements of the population will be high, and $Z_j$ will be extremely low. As a result, our decoy statistic successfully filters out pixels that are deemed important by the baseline saliency methods only as a result of adversarial attacks.

3.3 Sensitivity to hyperparameters

We also conduct experiments to understand the impact of hyperparameter choices on the performance of our optimization-based decoy generation method. Specifically, we focus on two hyperparameters: the network layer $l$ and the initial Lagrange multiplier $\lambda$. Accordingly, we first varied the value of $l$ for VGG16 and compared the differences among the decoy saliency maps generated by the three aforementioned saliency methods. In particular, we let $l$ range from the first convolutional layer to the last pooling layer and show the generated decoy saliency maps in Figure 6 and Supplementary. Note that, by our design, only the convolutional layers and the pooling layers can be used to generate decoy images. For each saliency method, Figure 6 demonstrates that the decoy saliency maps generated from different layers for the same image are of similar quality. We also compute the fidelity score for each saliency map and report the mean and standard deviation of the fidelity scores for each saliency method. The mean and standard deviation, respectively, for the gradient, integrated gradients, and SmoothGrad methods are as follows: (-0.11, 0.02), (-0.18, 0.02), (-0.21, 0.01). These quantitative results also support the conclusion that our approach is not sensitive to the choice of layer. This is likely because, as previous research has shown (chan2015pcanet; saxe2011random), the final classification results of a DNN are not strongly tied to particular hidden representations. As a result, generating decoy saliency maps for the same sample with the same label from different layers should yield similar results.

We also varied the initial Lagrange multiplier $\lambda$ and compared the differences among the generated decoy saliency maps. Figure 7 depicts the quantitative and qualitative comparison results. As shown in the figure, the different choices of the initial $\lambda$ all produce similar saliency maps, indicating that it has a negligible influence on our method. The results in Figures 6 and 7 indicate that we can expect stable decoy saliency maps even if the hyperparameters are subtly varied. This is a critical characteristic, because users do not need to worry about setting very precise hyperparameters to obtain a desirable saliency map.

3.4 Applicability to other neural architectures

In addition to the VGG16 model, we generated saliency maps for other neural architectures: AlexNet (krizhevsky2012imagenet) and ResNet (he2016deep). We visualize their saliency maps in Figure 8 and Supplementary. We observe that our method consistently outperforms the baseline methods, both quantitatively and qualitatively. This result indicates that we can apply our decoy approach to various neural architectures and expect consistent performance. It should be noted that, in comparison with the saliency maps derived from the other neural architectures, the maps tied to AlexNet are relatively vague. We hypothesize that this is due to the relative simplicity of the AlexNet architecture.

4 Related Work

As mentioned in Section 1, besides saliency methods, another major category of interpretation methods is counterfactual-based methods. These methods perturb (i.e., nullify, blur, add noise to, or inpaint) small regions of the image and identify the important regions by observing the changes in the classifier predictions (ribeiro2016should; lundberg2017unified; chen2018learning; fong2017interpretable; dabkowski2017real; chang2017interpreting; change2019xplaining; yousefzadeh2019interpreting; goyal2019counterfactual). Despite identifying meaningful regions in practice, these methods exhibit several limitations. First, counterfactual-based methods implicitly assume that the regions containing the object are the ones that contribute most to the prediction (fan2017adversarial). However, (moosavi2017universal) revealed that counterfactual-based methods are also vulnerable to adversarial attacks, which force these explanation methods to output unrelated background regions rather than the meaningful objects as important subregions. In addition, the counterfactual images may be far from the distribution from which the training samples were drawn (burns2019hypothesis), rendering the classifier's behavior ill-defined.

Beyond these limitations, counterfactual-based methods and our method are fundamentally different in three respects. First, unlike the former, which seeks the minimum set of features to exclude so as to minimize the prediction score, or to include so as to maximize it (fong2017interpretable), the latter aims to characterize the influence of each feature on the prediction score. Second, counterfactual-based methods involve comparing against the closest image on the other side of the decision boundary, thereby explicitly considering the decision boundary. In contrast, the proposed method considers decision boundaries implicitly, by calculating variants of the gradient with respect to the label. Finally, unlike counterfactual images, which could potentially be out-of-distribution due to added noise (smilkov2017smoothgrad), rescaling (sundararajan2017axiomatic), blurring (fong2017interpretable), or inpainting (chang2017interpreting), decoys are plausibly constructed in the sense that their intermediate representations are indistinguishable from those of the original input data by design. In consideration of these limitations and differences, we do not compare our method with counterfactual-based methods in this paper.

5 Discussion

Choice of decoy generation. We have considered one rigorous decoy generation method and two proxies: blurring and inpainting. As shown in Section 3.2, inpainting is less satisfactory due to its inferior performance and the extra cost of training an inpainting model. In comparison, from a practitioner's perspective, blurring works well without extra computational overhead. However, optimization-based decoys are still necessary to justify the theoretical soundness of the decoy-enhanced saliency score. In addition, given that our optimization-based decoy generation method is robust to hyperparameters, from a more stringent perspective, we still suggest that users select this method, which, by design, satisfies the requirements of the decoy statistics and can thus be applied to any network architecture.

Scalability. The computational cost of our method lies mainly in generating multiple decoys and computing the corresponding saliency maps. Specifically, as discussed in Section 2.3, our optimization-based decoy generation introduces only a small overhead. Using blurred images further reduces the cost of decoy generation to a negligible level. In addition, as with other ensemble methods (e.g., integrated gradients and SmoothGrad), the saliency maps can be computed in parallel in batch mode, and GPUs can significantly accelerate the computation. As a result, in comparison with existing saliency methods, our method introduces only negligible computational overhead, which does not hinder its use in the real world.

6 Conclusion and Future Work

In this work, we have used distribution-preserving decoys as a way to produce more robust saliency scores. First, we formulate decoy generation as an optimization problem that is, in principle, applicable to any network architecture. We demonstrate the superior performance of our method relative to three standard saliency methods, both qualitatively and quantitatively, even in the presence of various adversarial perturbations to the image. Second, by deriving a closed-form formula for the decoy-enhanced saliency score, we show that our saliency scores compensate for the frequently violated assumption in saliency methods that important features in general correspond to large gradients.

Although we have shown that decoys can be used to produce more accurate and robust saliency scores, there remain some interesting directions for future work. One possible extension is to explore other possible proxies for decoys. Another promising direction for future work could be reframing interpretability as hypothesis testing and using decoys to deliver a set of salient pixels, subject to false discovery rate control at some pre-specified level (burns2019hypothesis; lu2018deeppink).

References