Python code for perturbation-based saliency map
Feature attribution methods, or saliency maps, are one of the most popular approaches for explaining the decisions of complex machine learning models such as deep neural networks. In this study, we propose a stochastic optimization approach for the perturbation-based feature attribution method. While the original optimization problem of the perturbation-based feature attribution is difficult to solve because of the complex constraints, we propose to reformulate the problem as the maximization of a differentiable function, which can be solved using gradient-based algorithms. In particular, stochastic optimization is well-suited for the proposed reformulation, and we can solve the problem using popular algorithms such as SGD, RMSProp, and Adam. The experiment on the image classification with VGG16 shows that the proposed method could identify relevant parts of the images effectively.
Feature attribution methods [1, 2, 3, 4, 5, 6, 7], or saliency maps, are one of the most popular approaches for explaining the decisions of complex machine learning models such as deep neural networks. For each given instance, these methods score how relevant each feature is to the model’s decision: features relevant to the decision receive large values, and irrelevant features receive small values. For example, in image recognition, they highlight which pixels the model has focused on by scoring the relevance of each pixel [1, 2, 3, 4, 5], and in text classification, they detect a set of words or sentences relevant to the model’s decision by scoring each word or sentence [8, 9]. With feature attribution methods, one can obtain relevant features, such as pixels or words, as explanations of why the model made a certain decision, which also helps users inspect whether the model is reliable. Major approaches for feature attribution are based on modified gradients [1, 2, 3, 4, 5, 6] and feature masking.
Recently, Hara et al. proposed to separate the definition of the feature attribution scores from the algorithms used to compute them. In most previous studies, the scores are defined by their computation algorithms themselves, except for some axiomatic approaches [4, 11]. Separating the definition from the algorithm allows us to consider each aspect independently. For example, if the definition is not appropriate, improving the algorithm does not help, and we need to reconsider the definition instead.
As one possible definition of the feature attribution score, Hara et al. proposed to measure the irrelevance of each feature to the model’s decision by the maximum size of data perturbation that does not change the decision. Specifically, they defined the score as a solution to the optimization problem that maximizes the size of the data perturbation under the constraint that the perturbed data remains inside the model’s decision boundary. They then proposed an algorithm that solves the optimization problem approximately: they approximated the constraint with linear functions and reformulated the problem as linear programming. The linear programming formulation was found to be useful, with several flexible extensions such as relaxing constraints and sharing scores among features. The drawback of the linear approximation, however, is that it cannot strictly force the perturbed data to stay inside the model’s decision boundary. Therefore, the solution to the linear program can violate the definition.
This study improves the algorithm for the score defined by Hara et al. Specifically, to reduce the risk of violating the definition in the method of Hara et al., we propose an algorithm that solves the problem without linear approximation. We rewrite the constraint as a differentiable penalty function in the objective function to be maximized. With this reformulation, the problem is expressed as the maximization of a differentiable function, which can be solved using gradient-based algorithms. In particular, stochastic optimization is well-suited for the proposed reformulation, and we can solve the problem using popular algorithms such as SGD, RMSProp, and Adam.
In this paper, we consider a classification model $f$ for $K$ categories that returns an output $y \in \mathbb{R}^K$ for a given input $x \in \mathbb{R}^d$, i.e., $y = f(x)$. The classification result is determined as $c = \arg\max_k y_k$, where $y_k$ is the $k$-th element of the output. We assume that the model $f$ is differentiable with respect to the input $x$: the target models therefore include linear models, kernel models with differentiable kernels, and deep neural networks. We assume that the model $f$ and the target input $x$ to be explained are given and fixed.
In this section, we briefly review the definition of the problem introduced by Hara et al. As an explanation of the input $x$, we seek the maximally invariant data perturbation that does not change the model’s decision.
We start by introducing the invariant perturbation set. We say that a set $R \subseteq \mathbb{R}^d$ is an invariant perturbation set if the model’s decision is invariant for all $r \in R$, i.e., $\arg\max_k f_k(x + r) = c$ for all $r \in R$.
In this study, for ease of computation, we restrict our attention to a box-shaped invariant perturbation set $B(w) = \{r : -w \le r \le w\}$ for a parameter $w \ge 0$. (Footnote 1: In Hara et al., the lower and upper boundaries of the box are parametrized by two different parameters. Here, we use a common parameter $w$ for ease of computation.) From this definition of $B(w)$, the size of the invariant perturbation of each feature $i$ is proportional to $w_i$. The idea here is that, if the invariant perturbation $w_i$ is small, a change of the feature can highly impact the model’s decision, which indicates that the feature is relevant to the decision. On the other hand, if $w_i$ is large, the feature has only a minor impact on the model’s decision, and thus it is less relevant.
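To make the notion of an invariant box concrete, the following is a minimal sketch that samples perturbations from a box of half-widths `w` around an input `x` and checks whether the predicted class ever changes. The two-class linear `toy_model` and the helper `looks_invariant` are hypothetical names introduced only for this illustration. Sampling can refute invariance but cannot prove it, so this is a sanity check rather than a certificate.

```python
import numpy as np

def toy_model(x):
    """Hypothetical 2-class linear model: returns one score per class."""
    W = np.array([[1.0, 0.0], [0.0, 1.0]])
    return W @ x

def looks_invariant(model, x, w, n_samples=1000, seed=0):
    """Sample r uniformly from the box [-w, w] and report whether the
    predicted class stayed the same for every sampled perturbation."""
    rng = np.random.default_rng(seed)
    c = int(np.argmax(model(x)))                      # decision on the clean input
    u = rng.uniform(-1.0, 1.0, size=(n_samples, x.size))
    for r in u * w:                                   # r = u * w lies in the box
        if int(np.argmax(model(x + r))) != c:
            return False                              # found a violating perturbation
    return True

x = np.array([2.0, 0.0])       # class 0 wins on the clean input
small = np.array([0.5, 0.5])   # box small enough that class 0 always wins
large = np.array([3.0, 3.0])   # box large enough to cross the decision boundary

print(looks_invariant(toy_model, x, small))
print(looks_invariant(toy_model, x, large))
```

For this toy model the decision is unchanged exactly when the two half-widths sum to at most 2, so the small box passes the check and the large box fails it.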
To obtain an invariant perturbation set appropriate for feature attribution, Hara et al. proposed to maximize the side lengths of the box so that they are sufficiently large for irrelevant features.
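Find the invariant perturbation set $B(w^*)$, where
$$w^* = \arg\max_{w} \sum_{i=1}^{d} w_i \quad \text{s.t.} \quad B(w) \text{ is an invariant perturbation set}, \quad 0 \le w \le \delta. \tag{2.1}$$
Here, $\delta$ is the upper bound of the perturbation. The value of $\delta$ can usually be determined from the nature of the data. For example, if the data is an image, the value of each pixel is usually restricted to a fixed range. In this case, that range gives the natural choice of the upper bound $\delta$.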
We now turn to our proposed method to solve the problem (2.1). The difficulty in solving the problem (2.1) is that it requires the constraint to hold for all possible perturbations in the set. In the proposed method, we rewrite the constraint by using the expectation over random perturbations drawn from the set. In this way, we can reformulate the problem as the maximization of a differentiable function, which can be solved using gradient-based algorithms.
We reformulate the problem (2.1) into a penalty-based expression so that gradient-based optimization algorithms become applicable. Recall that the invariance constraint is equivalent to a set of constraints, one for each perturbation in the set. Here, we rewrite the constraint into the following equivalent expression using expectation.
The constraint is equivalent to
where $\mathbb{E}_u$ denotes the expectation over uniformly random $u \in [-1, 1]^d$, and $\odot$ denotes an element-wise product.
We first note that $u \odot w$ is a uniform random variable in the box $B(w)$. Here, recall that the penalty inside the expectation is non-negative. Therefore, the inequality holds if and only if the penalty is zero almost everywhere in the box. Hence, for the inequality to hold, the measure of the points that violate the constraint must be zero. ∎
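As an illustration of this kind of reformulation, one possible instance is written out below. The hinge-type penalty $\ell$, the weight $\lambda$, and the symbols $f$, $x$, $c$, $w$, $u$, $\delta$ are assumptions made for this sketch, and the exact expressions used in the paper may differ. The hinge is also differentiable only almost everywhere, so a smoothed variant may be preferable in practice.

```latex
% Assumed hinge-type penalty: zero when class c still attains the maximum score.
\ell(w; u) = \max\Bigl(0,\ \max_{k \neq c} f_k(x + u \odot w) - f_c(x + u \odot w)\Bigr) \ge 0

% Because \ell is non-negative, its expectation vanishes only if \ell vanishes almost everywhere.
\mathbb{E}_{u}\bigl[\ell(w; u)\bigr] \le 0
\;\Longleftrightarrow\;
\ell(w; u) = 0 \ \text{for almost every } u \in [-1, 1]^d

% Penalized problem: maximize the box size while paying \lambda for violations.
\max_{0 \le w \le \delta}\ \sum_{i=1}^{d} w_i \;-\; \lambda\, \mathbb{E}_{u}\bigl[\ell(w; u)\bigr]
```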
Thanks to the formulation (3.2), the objective is differentiable with respect to the parameter $w$. Hence, we can apply gradient-based optimization methods. Here, we note that stochastic optimization algorithms, such as SGD, RMSProp, and Adam, are particularly suited for the problem. In each iteration of the algorithm, we randomly generate a batch of perturbations and approximate the objective function using the sample average over that batch.
The pseudo code of the stochastic optimization is shown in Algorithm 1. Line 6 is added to the algorithm so that the parameter stays between the lower bound zero and the perturbation upper bound.
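The procedure can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' Algorithm 1. It uses plain projected SGD instead of Adam, a linear model `f(z) = W z` so that the gradient can be written by hand, and a hinge penalty with weight `lam`. The names `penalty_and_grad` and `maximal_box` and all default values are hypothetical.

```python
import numpy as np

def penalty_and_grad(W, x, c, w, u):
    """Batch-averaged hinge penalty max(0, max_{k != c} f_k - f_c) for the
    linear model f(z) = W z at z = x + u * w, and its gradient w.r.t. w."""
    z = x + u * w                           # (n, d) perturbed inputs
    scores = z @ W.T                        # (n, K) class scores
    margins = scores - scores[:, [c]]       # f_k - f_c for every class k
    margins[:, c] = -np.inf                 # exclude k = c from the max
    k_star = np.argmax(margins, axis=1)     # strongest competing class
    gap = margins[np.arange(len(u)), k_star]
    active = gap > 0                        # samples where the decision flips
    grad_z = (W[k_star] - W[c]) * active[:, None]
    grad_w = (grad_z * u).mean(axis=0)      # chain rule: dz/dw = u
    return np.maximum(gap, 0.0).mean(), grad_w

def maximal_box(W, x, delta, lam=100.0, lr=0.01, n_steps=2000, batch=64, seed=0):
    """Projected stochastic gradient ascent on sum(w) - lam * E[penalty]."""
    rng = np.random.default_rng(seed)
    c = int(np.argmax(W @ x))               # decision on the clean input
    w = np.zeros_like(x)
    for _ in range(n_steps):
        u = rng.uniform(-1.0, 1.0, size=(batch, x.size))
        _, g_pen = penalty_and_grad(W, x, c, w, u)
        grad = 1.0 - lam * g_pen            # gradient of the penalized objective
        w = np.clip(w + lr * grad, 0.0, delta)  # keep w inside [0, delta]
    return w

# Toy model: only the first feature affects the decision.
W = np.array([[1.0, 0.0], [-1.0, 0.0]])
x = np.array([1.0, 0.5])
w = maximal_box(W, x, delta=3.0)
print("box half-widths:", w)
print("scores (-w):", -w)
```

In this toy problem the decision flips only when the first feature drops below zero, so the first half-width should settle near 1 while the second grows to the upper bound. Because `lam` is finite, the penalized solution can sit slightly outside the exactly invariant box.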
distributed at the TensorFlow repository. As the target data to be explained, we used the COCO-animal dataset (cs231n.stanford.edu/coco-animals.zip). Specifically, for the experiment, we used 200 images in the validation set.
In the proposed method, we set and where the data dimension is (a sample code is available at https://github.com/sato9hara/PertMap). As the optimization algorithm, we used Adam with the step size set to and the remaining parameters set to their default values. To measure the relevance of each feature to the model’s decision, we used the negative of the perturbation size as the score. A large score, or a small perturbation size, means that a change of the feature can highly impact the model’s decision, which indicates that the feature is relevant to the decision. On the other hand, if the score is small, or the perturbation size is large, the feature has only a minor impact on the model’s decision, and thus it is less relevant.
As the baseline methods, we used Gradient, GuidedBP, SmoothGrad, IntGrad, LRP, DeepLIFT, Occlusion, and the LP-based methods (LP and LP(Smooth)). Gradient, GuidedBP, SmoothGrad, and IntGrad are implemented using saliency (https://github.com/PAIR-code/saliency) with default settings, and LRP, DeepLIFT, and Occlusion are implemented using DeepExplain (https://github.com/marcoancona/DeepExplain), where we set the mask size for Occlusion as . For the LP-based methods, we used the same settings as the ones used in Hara et al. ( and ). We note that all the methods, including the proposed method, return the scores of size .
We evaluated the effectiveness of each method by masking pixels of the images. Specifically, we mask low-score pixels with gray and observe whether the model’s classification result is resistant to the flipping. We expect good feature attribution methods to identify relevant pixels with high scores. Therefore, with good attribution methods, the model’s classification result will be kept unchanged even if we flip many low-score pixels to gray, as the relevant parts of the images remain unflipped. On the other hand, if the attribution methods fail to identify relevant pixels with high scores, the classification result can change even with a small number of flips.
We conducted the experiment as follows.
Flip pixels with scores smaller than the threshold quantile, i.e., replace the selected pixels with gray pixels. (Footnote 6: We also conducted experiments by replacing with zero (black) or one (white). The results were similar, and thus omitted.)
Observe the ratio of images whose classification result changes among the 200 images: a change in fewer images indicates that the feature attribution method successfully identified the relevant parts of the images.
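The evaluation steps above can be sketched as follows. This is a minimal illustration that assumes pixel values in [0, 1] (so gray is 0.5) and score maps with the same shape as the images. The helpers `flip_low_scores` and `change_ratio` and the tiny `predict` function are hypothetical and stand in for the actual classifier.

```python
import numpy as np

def flip_low_scores(image, scores, quantile, fill=0.5):
    """Replace pixels whose score is below the given quantile with `fill`."""
    threshold = np.quantile(scores, quantile)
    flipped = image.copy()
    flipped[scores < threshold] = fill      # gray out the low-score pixels
    return flipped

def change_ratio(predict, images, score_maps, quantile):
    """Fraction of images whose predicted class changes after flipping."""
    changed = 0
    for image, scores in zip(images, score_maps):
        before = predict(image)
        after = predict(flip_low_scores(image, scores, quantile))
        changed += int(before != after)
    return changed / len(images)

def predict(image):
    """Toy classifier: the decision depends only on the top-left pixel."""
    return int(image[0, 0] > 0.5)

image = np.array([[0.9, 0.1], [0.2, 0.3]])
good_scores = np.array([[1.0, 0.0], [0.1, 0.2]])   # ranks the relevant pixel highest
bad_scores = np.array([[0.0, 1.0], [0.9, 0.8]])    # ranks the relevant pixel lowest

print(change_ratio(predict, [image], [good_scores], quantile=0.5))
print(change_ratio(predict, [image], [bad_scores], quantile=0.5))
```

With the good score map the relevant pixel survives the flip and the prediction is unchanged, while the bad score map grays it out and changes the prediction.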
We varied the threshold quantile from 0% (no flip) to 100% (all flip), and summarized the result in Figure 1. It is clear that the proposed method is the most resistant to the masking: the changes in the classification results are kept almost zero even if 50% of the pixels are masked. This indicates that the proposed method successfully identified the relevant parts of the images. Moreover, the proposed method consistently outperformed the LP-based methods. We conjecture that the LP-based methods can violate the constraint in the problem because of the linear approximation, which led to a less accurate estimate of the solution to the problem (2.1). By contrast, the proposed method solves the problem without the linear approximation. Hence, the derived solution is more accurate than the ones of the LP-based methods.
Figures 2, 3, and 4 show examples of the computed scores with each feature attribution method. The figures clearly show that the proposed method attained very high S/N ratios compared to the existing methods, where the scores tend to be noisy. We conjecture that this was because the proposed method could provide high-quality solutions to the problem (2.1) without linear approximation.
In this study, as a novel feature attribution method, we proposed a stochastic optimization method for finding maximally invariant data perturbation. In the proposed approach, we reformulated the problem as the maximization of a differentiable function, which can be solved using gradient-based algorithms. In particular, stochastic optimization is well-suited for the proposed reformulation, and we can solve the problem using popular algorithms such as SGD, RMSProp, and Adam. The experimental result on the image classification with VGG16 shows that the proposed method could identify relevant parts of the images effectively.
This work was supported by JSPS KAKENHI Grant Number JP18K18106.
On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE, 10(7):e0130140, 2015.
Visualizing and understanding neural machine translation. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1150–1159, 2017.
Proceedings of the IEEE International Conference on Computer Vision, pages 3449–3457, 2017.