1 Introduction
Artificial Intelligence (AI) [1][3][7] has gone from a science-fiction dream to a critical part of our everyday lives. Notably, deep learning has achieved superior performance in image classification and other perception tasks. Despite its outstanding contribution to the progress of AI, deep learning models remain mostly black boxes that are extremely weak at explaining their reasoning process and prediction results. Yet many real-world applications are mission-critical, and users are concerned about how an AI solution arrives at its decisions and insights. Model transparency and explainability are therefore essential to ensure AI’s broad adoption in various vertical domains.
There has been a recent surge in the development of explainable AI techniques [4][5][12]. Among them, post hoc techniques that explain black-box models in a human-understandable manner have received much attention in the research community [9][8][2]. These methods are model-agnostic: they generate perturbed samples of a given instance in the feature space and observe the effect of these samples on the output of the black-box classifier. In [9], the authors proposed Local Interpretable Model-agnostic Explanations (LIME), which explains the predictions of any classifier faithfully by fitting a linear regression model locally around the prediction. The sampling operation in LIME draws from a uniform distribution, which is straightforward but defective, as it ignores the correlation between features. A proper sampling operation is especially essential in natural image recognition, because the visual features of natural objects exhibit a strong correlation in the spatial neighborhood rather than following a uniform distribution. When most uniformly generated samples are unrealistic with respect to the actual distribution, these false contributions lead to a poor fit of the local explanation model.
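The uniform perturbation described above can be sketched as follows (an illustrative simplification, not the actual LIME implementation; the function name is ours). Each superpixel is switched on or off independently of every other superpixel, which is exactly what discards the spatial correlation:

```python
import numpy as np

def lime_uniform_samples(d_prime, num_samples, seed=0):
    """Draw LIME-style perturbed samples: each entry of the binary vector z'
    over d_prime superpixels is activated uniformly at random, independently
    of all other superpixels."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=(num_samples, d_prime))
```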
In this paper, we propose a Modified Perturbed Sampling method for LIME (MPS-LIME), which takes the correlation between features fully into account. We convert the superpixel image into an undirected graph, and the perturbed sampling operation is then formalized as a clique-set construction problem. We perform various experiments on explaining Google’s pre-trained Inception neural network [11]. The experimental results show that the MPS-LIME explanation of the black-box model achieves much better performance than LIME in terms of understandability, fidelity, and efficiency.

2 MPS-LIME Explanation
In this section, we first introduce the interpretable image representation and the modified perturbed sampling for local exploration. Then we present the explanation system of MPS-LIME.
2.1 Interpretable Image Representation
An interpretable representation should be understandable to observers, regardless of the underlying features used by the model. Most image classification tasks represent an image as a tensor with three color channels per pixel. Considering the poor interpretability and high computational complexity of a pixel-based representation, we adopt a superpixel-based interpretable representation. Each superpixel, as the primary processing unit, is a group of connected pixels with similar colors or gray levels. Superpixel segmentation divides an image into non-overlapping superpixels. More specifically, we denote by x ∈ ℝ^d the original representation of an image, and by the binary vector x′ ∈ {0, 1}^{d′} its interpretable representation, where x′_i = 1 indicates the presence of superpixel i and x′_i = 0 indicates its absence.

2.2 A Modified Perturbed Sampling for Local Exploration
In order to learn the local behavior of the image classifier f, we generate a group of perturbed samples of a given instance x by activating a subset of the superpixels in x′. For images, especially natural images, superpixel segments often correspond to the coherent regions of visual objects and show strong correlation in a spatial neighborhood. If the activated superpixels come from an independent sampling process, we may lose much useful information for learning the local explanation models. The perturbed sampling operation in the standard implementation of LIME draws the non-zero elements of x′ uniformly at random. This approach is at risk of ruining the learning of local explanation models, since the generated samples may ignore the correlation between superpixels.
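The mapping between a binary vector and a perturbed image can be sketched as follows (a minimal numpy sketch assuming a superpixel label map, e.g. from a segmentation method such as SLIC, is already available; function names and the grey-out-by-mean convention are illustrative assumptions, not the authors' code):

```python
import numpy as np

def interpretable_representation(segments):
    """Given a superpixel label map (one integer id per pixel), return d'
    (the number of superpixels) and the all-ones binary vector x' that
    represents the unperturbed image: x'_i = 1 means superpixel i is present."""
    d_prime = int(segments.max()) + 1
    return d_prime, np.ones(d_prime, dtype=int)

def recover_image(image, segments, z):
    """Recover the perturbed image for a binary vector z: superpixels with
    z_i = 0 are replaced by the mean colour of the image (greyed out)."""
    out = image.astype(float).copy()
    out[z[segments] == 0] = image.mean(axis=(0, 1))
    return out
```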
In this section, we propose a modified perturbed sampling method that takes the correlation among superpixels fully into account. First, we convert the superpixel segments into an undirected graph. Specifically, as shown in Figure 1, the superpixel segments are represented as vertices of a graph whose edges connect only adjacent segments. Consider a graph G = (V, E), where V and E are the sets of vertices and undirected edges, with cardinalities |V| = n and |E| = m. A subset of V can be represented by a binary vector z ∈ {0, 1}^n, where z_i = 1 indicates that vertex i is in the subset.
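The graph construction step can be sketched as follows (a minimal numpy sketch under the assumption of 4-connected pixel adjacency; the function name is illustrative):

```python
import numpy as np

def segments_to_graph(segments):
    """Build the undirected graph described above: one vertex per superpixel,
    and an edge between two superpixels whenever they are spatially adjacent,
    i.e. share a horizontal or vertical pixel boundary."""
    edges = set()
    for a, b in [(segments[:, :-1], segments[:, 1:]),   # horizontal neighbours
                 (segments[:-1, :], segments[1:, :])]:  # vertical neighbours
        mask = a != b
        for u, v in zip(a[mask], b[mask]):
            edges.add((int(min(u, v)), int(max(u, v))))
    n = int(segments.max()) + 1
    return n, sorted(edges)
```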
The modified perturbed sampling operation is formalized as finding the clique set C, in which every two vertices of a clique are adjacent. Since the cardinality of the maximum clique of the constructed graph is 3, the clique set C consists of three subsets C1, C2, C3: C1 contains the cliques with a single vertex; C2 contains the cliques with two vertices connected by an edge; and C3 contains the cliques with three pairwise-adjacent vertices (Figure 2). In this paper, we use Depth-First Search (DFS) to construct the clique set C. Algorithm 1 shows a simplified workflow diagram.
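A DFS-style enumeration of C1, C2 and C3 might look like the following (an illustrative sketch of the idea behind Algorithm 1, not the authors' code):

```python
def cliques_up_to_three(n, edges):
    """Enumerate the clique set C = C1 + C2 + C3 of an undirected graph by
    depth-first search: singletons, connected pairs, and triangles."""
    adj = {u: set() for u in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    cliques = []

    def dfs(clique, candidates):
        cliques.append(clique)
        if len(clique) == 3:            # maximum clique size considered
            return
        for w in sorted(candidates):
            # keep only candidates adjacent to w, with a larger id,
            # so that each clique is produced exactly once
            dfs(clique + [w], {x for x in candidates & adj[w] if x > w})

    for v in range(n):
        dfs([v], {x for x in adj[v] if x > v})
    return cliques
```

On a triangle graph this yields three singletons, three edges and one 3-clique, i.e. seven perturbed samples, far fewer than the thousands of uniform samples LIME typically draws.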
Since there is a strong correlation between adjacent superpixel segments, the clique-set construction takes the various types of neighborhood correlation fully into account. Moreover, the number of perturbed samples in MPS-LIME is much smaller than in the current implementation of LIME, which significantly reduces the runtime.
2.3 Explanation System of MPS-LIME
The goal of the explanation system is to identify an interpretable model over the interpretable representation that is locally faithful to the classifier. We denote the original image classification model being explained by f and the interpretable model by g. This can be formalized as an optimization problem:
ξ(x) = argmin_{g ∈ G} L(f, g, π_x) + Ω(g)    (1)
where the locality fidelity loss L is calculated by the locally weighted square loss:
L(f, g, π_x) = Σ_{z, z′ ∈ Z} π_x(z) (f(z) − g(z′))²    (2)
The dataset Z is composed of perturbed samples that are sampled around x′ by the method described in Section 2.2. Given a perturbed sample z′ ∈ {0, 1}^{d′}, we recover the sample in the original representation z ∈ ℝ^d and obtain f(z). Moreover, π_x(z) is the distance function used to capture locality.
Algorithm 2 shows a simplified workflow diagram of MPS-LIME. Firstly, MPS-LIME obtains the superpixel image by using a segmentation method. Then it converts the superpixel segments into an undirected graph. The dataset Z is constructed by finding the clique set of the undirected graph, which is solved by the DFS method. Finally, MPS-LIME obtains the explanation by using the K-LASSO method, as in LIME [9].
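Given the binary samples z′, the black-box outputs f(z) and the kernel weights π_x(z), the fitting step of Eqs. (1)–(2) reduces to a weighted linear regression. A minimal sketch (using a ridge-regularised weighted least-squares solve in place of the K-LASSO feature selection; the function name is ours):

```python
import numpy as np

def fit_local_model(Z, f_vals, weights, reg=1e-6):
    """Minimise sum_i pi_x(z_i) * (f(z_i) - g(z'_i))^2 for a linear model
    g(z') = w . z' + b via weighted, ridge-regularised least squares."""
    X = np.hstack([Z, np.ones((len(Z), 1))])        # append intercept column
    WX = X * np.asarray(weights)[:, None]           # row-wise kernel weighting
    A = X.T @ WX + reg * np.eye(X.shape[1])         # weighted normal equations
    coef = np.linalg.solve(A, WX.T @ np.asarray(f_vals))
    return coef[:-1], coef[-1]                      # superpixel weights, bias
```

The learned per-superpixel weights are then ranked to pick the top-K segments shown in the explanation.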
3 Experimental Results
In this section, we perform various experiments on explaining the predictions of Google’s pre-trained Inception neural network [11]. We compare the experimental results of LIME and MPS-LIME in terms of understandability, fidelity, and efficiency.
3.1 Measurement criterion of interpretability
Fidelity, understandability, and efficiency are three important goals for interpretability [6][10]. An explainable model with good interpretability should be faithful to the original model, understandable to the observer, and graspable in a short time so that the end-user can make wise decisions. Mean Absolute Error (MAE) and the coefficient of determination (R²) are two important measures of fidelity. MAE is the mean absolute error between the predicted value and the true value, which reflects predictive accuracy well:
MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|    (3)
R² is calculated from the Total Sum of Squares (SST) and the Error Sum of Squares (SSE):
R² = 1 − SSE/SST = 1 − [Σ_{i=1}^{n} (y_i − ŷ_i)²] / [Σ_{i=1}^{n} (y_i − ȳ)²]    (4)
where y_i is the true value, ŷ_i is the predicted value, and ȳ is the mean of the true values. The best possible R² is 1; the closer the score is to 1, the better the fidelity of the explainer.
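Both criteria are straightforward to compute; for concreteness, a minimal numpy sketch of Eqs. (3)–(4):

```python
import numpy as np

def mae(y_true, y_pred):
    """Eq. (3): mean absolute error between true and predicted values."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def r2(y_true, y_pred):
    """Eq. (4): coefficient of determination, R^2 = 1 - SSE / SST."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - sse / sst
```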
3.2 Google’s Inception Neural Network on the ImageNet Database
We explain image classification predictions made by Google’s pre-trained Inception neural network [11]. The first row in Figure 3 shows six original images. The second and third rows show the superpixel explanations by LIME and MPS-LIME, respectively. The explanations highlight the top superpixel segments with the largest positive weights towards the predictions (K = 5).
Table 1 lists the MAE of LIME and MPS-LIME. We find that some of the predicted probability values of LIME are bigger than 1. This is because LIME adopts a sparse linear model to fit the perturbed samples, with no constraint forcing the predicted probability values to lie between 0 and 1. Compared to LIME, MPS-LIME provides better predictive accuracy. Besides, the R² values of LIME and MPS-LIME are listed in Table 1; the closer the score is to 1, the better the fidelity of the explainer. The R² of MPS-LIME is around 0.9, which is much bigger than that of LIME. By comparing the MAE and R² of the two algorithms, we conclude that MPS-LIME has better fidelity than LIME.

Efficiency is highly related to the time necessary for a user to grasp the explanation. The runtimes of LIME and MPS-LIME are shown in Table 2: the runtime of MPS-LIME is nearly half that of LIME. We conclude from the above results that MPS-LIME not only has higher fidelity but also takes less time than LIME.
Table 1: Predicted probability (pred prob), MAE and R² of LIME and MPS-LIME on six test images.

                pred prob  MAE     R²
img1  LIME      1.6285     0.6350  0.6885
      MPS-LIME  0.9783     0.0152  0.8944
img2  LIME      0.8291     0.2214  0.4531
      MPS-LIME  0.5973     0.0104  0.9825
img3  LIME      1.6668     0.7237  0.6360
      MPS-LIME  0.9284     0.0147  0.9222
img4  LIME      0.4822     0.0819  0.6958
      MPS-LIME  0.5568     0.0073  0.9304
img5  LIME      1.2854     0.3498  0.7407
      MPS-LIME  0.9203     0.0153  0.9925
img6  LIME      1.0166     0.2520  0.3535
      MPS-LIME  0.7531     0.0115  0.9155
Table 2: Runtime of LIME and MPS-LIME on six test images.

          img1    img2    img3    img4    img5    img6
LIME      232.20  230.45  245.36  264.51  223.79  226.58
MPS-LIME  91.02   113.85  109.29  154.57  117.21  152.84
4 Conclusion and Future Work
The sampling operation for local exploration in the current implementation of LIME is random uniform sampling, which can generate unrealistic samples that ruin the learning of local explanation models. In this paper, we propose a modified perturbed sampling method, MPS-LIME, which takes the correlation between features fully into account. We convert the superpixel image into an undirected graph, and the perturbed sampling operation is then formalized as a clique-set construction problem. We perform various experiments on explaining a random-forest classifier and Google’s pre-trained Inception neural network. The experimental results show that the MPS-LIME explanation of multiple black-box models achieves much better performance in terms of understandability, fidelity, and efficiency.
There are several avenues of future work that we would like to explore. This paper only describes the modified perturbed sampling method for image classification. We will apply similar ideas to text processing and structured data analytics. Besides, we will improve other post hoc explanation techniques that rely on input perturbations, such as SHAP, and propose a general optimization scheme.
References
 [1] I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep Learning. MIT Press. http://www.deeplearningbook.org. Cited by: §1.
 [2] I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, and R. Garnett (Eds.) (2017) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4–9 December 2017, Long Beach, CA, USA. Cited by: §1.
 [3] T. Hastie, R. Tibshirani, and J. Friedman (2009) The Elements of Statistical Learning. Springer. www.web.stanford.edu/~hastie/ElemStatLearn. Cited by: §1.
 [4] Z. Hu, X. Ma, Z. Liu, E. Hovy, and E. P. Xing (2016) Harnessing deep neural networks with logic rules. In Proceedings of ACL 2016. Cited by: §1.
 [5] Y. Lou, R. Caruana, and J. Gehrke (2012) Intelligible models for classification and regression. In Proceedings of KDD 2012, pp. 150–158. Cited by: §1.
 [6] C. Molnar (2019) Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Lulu, 1st edition. Cited by: §3.1.
 [7] S. Ren, K. He, R. Girshick, and J. Sun (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), pp. 1137–1149. Cited by: §1.
 [8] M. T. Ribeiro, S. Singh, and C. Guestrin (2016) Model-agnostic interpretability of machine learning. CoRR abs/1606.05386. Cited by: §1.
 [9] M. T. Ribeiro, S. Singh, and C. Guestrin (2016) “Why should I trust you?”: explaining the predictions of any classifier. In Proceedings of KDD 2016, pp. 1135–1144. Cited by: §1, §2.3.
 [10] S. Rüping (2006) Learning Interpretable Models. PhD thesis, Technical University of Dortmund. Cited by: §3.1.
 [11] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich (2015) Going deeper with convolutions. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9. Cited by: §1, §3, §3.2.
 [12] Q. Zhang, Y. N. Wu, and S.-C. Zhu (2018) Interpretable convolutional neural networks. In Proceedings of CVPR 2018, pp. 8827–8836. Cited by: §1.