Artificial Intelligence (AI) has gone from a science-fiction dream to a critical part of our everyday lives. Notably, deep learning has achieved superior performance in image classification and other perception tasks. Despite its outstanding contribution to the progress of AI, deep learning models remain mostly black boxes, which are extremely weak at explaining their reasoning process and prediction results. Yet many real-world applications are mission-critical, and users are concerned about how an AI solution arrives at its decisions and insights. Therefore, model transparency and explainability are essential to ensure AI's broad adoption across vertical domains.
There has been a recent surge in the development of explainable AI techniques. Among them, post hoc techniques that explain black-box models in a human-understandable manner have received much attention in the research community. These methods are model-agnostic: they generate perturbed samples of a given instance in the feature space and observe the effect of these perturbed samples on the output of the black-box classifier. Ribeiro et al. proposed the Local Interpretable Model-agnostic Explanation (LIME), which explains the predictions of any classifier faithfully by fitting a linear regression model locally around the prediction. LIME draws its perturbed samples from a uniform random distribution, which is straightforward but flawed: it ignores the correlation between features. A proper sampling operation is especially important in natural image recognition, because the visual features of natural objects exhibit strong correlation in a spatial neighborhood rather than a completely uniform distribution. When most uniformly generated samples are unrealistic with respect to the actual distribution, these false contributors lead to a poor fit of the local explanation model.
In this paper, we propose a Modified Perturbed Sampling method for LIME (MPS-LIME), which takes the correlation between features fully into account. We convert the superpixel image into an undirected graph, and the perturbed sampling operation is then formalized as a clique-set construction problem. We perform various experiments on explaining Google's pre-trained Inception neural network. The experimental results show that the MPS-LIME explanation of the black-box model achieves much better performance than LIME in terms of understandability, fidelity, and efficiency.
2 MPS-LIME Explanation
In this section, we first introduce the interpretable image representation and the modified perturbed sampling for local exploration. Then we present the explanation system of MPS-LIME.
2.1 Interpretable Image Representation
An interpretable representation should be understandable to observers, regardless of the underlying features used by the model. Most image classification tasks represent an image as a tensor with three color channels per pixel. Considering the poor interpretability and high computational complexity of the pixel-based representation, we adopt a superpixel-based interpretable representation. Each superpixel, as the primary processing unit, is a group of connected pixels with similar colors or gray levels. Superpixel segmentation divides an image into non-overlapping superpixels. More specifically, we denote by $x \in \mathbb{R}^d$ the original representation of an image and by the binary vector $x' \in \{0, 1\}^{d'}$ its interpretable representation, where $x'_i = 1$ indicates the presence of the $i$-th original superpixel and $x'_i = 0$ indicates its absence.
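As a minimal sketch of how the binary interpretable vector maps back to a perturbed image, consider the toy example below (plain NumPy on a 4×4 "image" with four block superpixels; the function name and the choice of zeroing out absent superpixels are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

def recover_from_interpretable(image, segments, z):
    """Recover a perturbed image from a binary interpretable vector z.

    segments[i, j] gives the superpixel index of pixel (i, j);
    z[k] == 1 keeps superpixel k, z[k] == 0 masks it out (zeroed here).
    """
    out = image.copy()
    for k, keep in enumerate(z):
        if not keep:
            out[segments == k] = 0  # one common masking choice
    return out

# Toy 4x4 image with 4 superpixels laid out as 2x2 blocks.
image = np.arange(16, dtype=float).reshape(4, 4)
segments = np.repeat(np.repeat(np.arange(4).reshape(2, 2), 2, axis=0), 2, axis=1)
z = [1, 0, 0, 1]  # keep superpixels 0 and 3, drop 1 and 2
perturbed = recover_from_interpretable(image, segments, z)
```

In practice the segmentation mask would come from a superpixel algorithm such as quickshift or SLIC rather than a hand-built block layout.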
2.2 A Modified Perturbed Sampling for Local Exploration
In order to learn the local behavior of the image classifier $f$, we generate a group of perturbed samples of a given instance $x$ by activating a subset of the superpixels in $x$. For images, especially natural images, superpixel segments often correspond to the coherent regions of visual objects and show strong correlation in a spatial neighborhood. If the activated superpixels come from an independent sampling process, we may lose much useful information for learning the local explanation models. The perturbed sampling operation in the standard implementation of LIME draws the nonzero elements of $x'$ uniformly at random. This approach risks ruining the learning of local explanation models, since the generated samples ignore the correlation between superpixels.
In this section, we propose a modified perturbed sampling method that takes the correlation among superpixels fully into account. First, we convert the superpixel segments into an undirected graph. Specifically, as shown in Figure 1, the superpixel segments are represented as vertices of a graph whose edges connect only adjacent segments. Considering a graph $G = (V, E)$, where $V$ and $E$ are the sets of vertices and undirected edges with cardinalities $|V| = n$ and $|E| = m$, a subset of $V$ can be represented by a binary vector $z \in \{0, 1\}^n$, where $z_i = 1$ indicates that vertex $i$ is in the subset.
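A minimal sketch of the graph construction (4-connected pixel adjacency over a toy label map; the function name is illustrative):

```python
import numpy as np

def superpixel_graph(segments):
    """Build an undirected graph over superpixel segments.

    segments is a 2-D array of superpixel labels; two superpixels are
    connected iff some pair of their pixels is horizontally or
    vertically adjacent.
    """
    edges = set()
    h, w = segments.shape
    for i in range(h):
        for j in range(w):
            for di, dj in ((0, 1), (1, 0)):  # right and down neighbours
                ni, nj = i + di, j + dj
                if ni < h and nj < w and segments[i, j] != segments[ni, nj]:
                    a, b = sorted((int(segments[i, j]), int(segments[ni, nj])))
                    edges.add((a, b))
    vertices = sorted(int(v) for v in np.unique(segments))
    return vertices, edges

# 2x2 block layout: 0|1 over 2|3, so 0-1, 0-2, 1-3, 2-3 are adjacent,
# while the diagonal pairs 0-3 and 1-2 are not.
segments = np.repeat(np.repeat(np.arange(4).reshape(2, 2), 2, axis=0), 2, axis=1)
V, E = superpixel_graph(segments)
```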
The modified perturbed sampling operation is formalized as finding the clique set $Q$, in which every two vertices of each clique are adjacent. Since the cardinality of the maximum clique of the constructed graph is 3, the clique set consists of three subsets $Q_1$, $Q_2$, and $Q_3$: $Q_1$ contains the cliques with a single vertex; $Q_2$ contains the cliques of two vertices connected by an edge; and $Q_3$ contains the cliques of three pairwise-adjacent vertices (Figure 2). In this paper, we use the Depth-First Search (DFS) method to obtain the clique set $Q$. Algorithm 1 shows a simplified workflow.
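A minimal sketch of the clique-set construction (plain Python; simple edge-based enumeration stands in here for the DFS traversal described above, and the names are illustrative):

```python
def clique_set(vertices, edges):
    """Enumerate all cliques of sizes 1, 2, and 3:
    Q1 (singletons), Q2 (edges), and Q3 (triangles)."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    q1 = [(v,) for v in vertices]
    q2 = sorted(tuple(sorted(e)) for e in edges)
    # A triangle is an edge (a, b) plus a common neighbour c; requiring
    # c > b keeps each triangle in canonical order, emitted exactly once.
    q3 = [(a, b, c) for a, b in q2 for c in sorted(adj[a] & adj[b]) if c > b]
    return q1 + q2 + q3

# 4-cycle (a square of superpixels): 4 singletons + 4 edges, no triangles.
square = clique_set([0, 1, 2, 3], {(0, 1), (0, 2), (1, 3), (2, 3)})
# Fully connected 3-vertex graph: 3 singletons + 3 edges + 1 triangle.
triangle = clique_set([0, 1, 2], {(0, 1), (0, 2), (1, 2)})
```

Each clique then becomes one perturbed sample: the superpixels in the clique are activated and the rest are masked, so every sample respects spatial adjacency.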
Since there is a strong correlation between the adjacent superpixel image segments, the clique set construction can take into full account the various types of neighborhood correlation. Moreover, the number of perturbed samples of MPS-LIME is much smaller than that in the current implementation of LIME, which significantly reduces the runtime.
2.3 Explanation System of MPS-LIME
The goal of the explanation system is to identify an interpretable model over the interpretable representation that is locally faithful to the classifier. We denote the original image classification model being explained by $f$ and the interpretable model by $g$. This problem can be formalized as an optimization problem:
$$\xi(x) = \operatorname*{argmin}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g),$$
where the locality fidelity loss $\mathcal{L}$ is calculated by the locally weighted square loss:
$$\mathcal{L}(f, g, \pi_x) = \sum_{z, z' \in \mathcal{Z}} \pi_x(z) \left( f(z) - g(z') \right)^2.$$
The database $\mathcal{Z}$ is composed of perturbed samples $z' \in \{0, 1\}^{d'}$, which are sampled around $x'$ by the method described in Section 2.2. Given a perturbed sample $z'$, we recover the sample in the original representation $z \in \mathbb{R}^d$ and obtain $f(z)$. Moreover, $\pi_x(z)$ is a distance function that captures locality.
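When $g$ is linear, the weighted loss above has a closed-form minimizer. A minimal sketch (NumPy; the function name is illustrative, a tiny ridge term is an added assumption for numerical stability, and LIME's K-LASSO feature selection is omitted here):

```python
import numpy as np

def fit_local_model(Z, fz, pi):
    """Fit the interpretable linear model g by locally weighted least squares.

    Z  : (n, d') binary perturbed samples in interpretable space
    fz : (n,) black-box predictions f(z) for the recovered samples
    pi : (n,) proximity weights pi_x(z)
    Solves argmin_w  sum_i pi_i * (fz_i - w . Z_i)^2  in closed form.
    """
    W = np.diag(pi)
    A = Z.T @ W @ Z + 1e-8 * np.eye(Z.shape[1])  # tiny ridge for stability
    b = Z.T @ W @ fz
    return np.linalg.solve(A, b)

# Sanity check: if f is itself linear in the interpretable features,
# the weighted fit should recover those coefficients.
rng = np.random.default_rng(0)
Z = rng.integers(0, 2, size=(50, 4)).astype(float)
true_w = np.array([0.5, -1.0, 2.0, 0.0])
fz = Z @ true_w
pi = np.exp(-np.sum(1 - Z, axis=1) / 4.0)  # samples closer to x' weigh more
w = fit_local_model(Z, fz, pi)
```

The recovered coefficients $w$ then rank the superpixels by their contribution to the prediction.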
Algorithm 2 shows a simplified workflow of MPS-LIME. First, MPS-LIME obtains the superpixel segments by a segmentation method. Then it converts the superpixel segments into an undirected graph. The database $\mathcal{Z}$ is constructed by finding the cliques of the undirected graph, which is solved by the DFS method. Finally, MPS-LIME obtains the explanation $\xi(x)$ by the K-LASSO method, the same as in LIME.
3 Experimental Results
In this section, we perform various experiments on explaining the predictions of Google’s pre-trained Inception neural network. We compare the experimental results between LIME and MPS-LIME in terms of understandability, fidelity, and efficiency.
3.1 Measurement criterion of interpretability
Fidelity, understandability, and efficiency are three important goals of interpretability. An explainable model with good interpretability should be faithful to the original model, understandable to the observer, and graspable in a short time so that the end-user can make wise decisions. Mean Absolute Error (MAE) and the coefficient of determination $R^2$ are two important measures of fidelity. MAE is the mean absolute error between the predicted values and the true values, which reflects predictive accuracy well:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|.$$
$R^2$ is calculated from the Total Sum of Squares (SST) and the Error Sum of Squares (SSE):
$$R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},$$
where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the true values. The best possible $R^2$ is $1$; the closer the $R^2$ score is to $1$, the more faithful the explainer.
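The two measures above can be computed directly from their definitions. A minimal sketch in plain Python (function names are illustrative; the sample values are made up for the check and are not the paper's data):

```python
def mae(y_true, y_pred):
    """Mean absolute error between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SSE / SST."""
    mean = sum(y_true) / len(y_true)
    sst = sum((t - mean) ** 2 for t in y_true)          # total sum of squares
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # error sum of squares
    return 1.0 - sse / sst

# Toy probabilities: a faithful explainer tracks the black-box outputs closely.
y_true = [0.9, 0.8, 0.7, 0.95]
y_pred = [0.85, 0.8, 0.75, 0.9]
```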
3.2 Google’s Inception neural network on Image-net database
We explain image classification predictions made by Google's pre-trained Inception neural network. The first row in Figure 3 shows six original images. The second and third rows are the superpixel explanations by LIME and MPS-LIME, respectively. The explanations highlight the top superpixel segments with the largest positive weights towards the predictions (K = 5).
Table 1 lists the MAE of LIME and MPS-LIME. We find that some of the predicted probability values of LIME are greater than $1$. This is because LIME fits a sparse linear model to the perturbed samples with no constraint that the predicted probabilities lie between $0$ and $1$. Compared with LIME, MPS-LIME provides better predictive accuracy. Besides, the $R^2$ values of LIME and MPS-LIME are also listed in Table 1. The closer the $R^2$ score is to $1$, the more faithful the explainer; the $R^2$ of MPS-LIME is much higher than that of LIME. Comparing the MAE and $R^2$ of the two algorithms, we conclude that MPS-LIME has better fidelity than LIME.
Efficiency is highly related to the time a user needs to grasp the explanation. The runtimes of LIME and MPS-LIME are shown in Table 2, which shows that the runtime of MPS-LIME is nearly half that of LIME. We can conclude from the above results that MPS-LIME not only has higher fidelity but also takes less time than LIME.
Table 1: true prob (Inception), pred prob, and MAE of LIME and MPS-LIME.
4 Conclusion and Future Work
The sampling operation for local exploration in the current implementation of LIME is random uniform sampling, which can generate unrealistic samples that ruin the learning of local explanation models. In this paper, we propose a modified perturbed sampling method, MPS-LIME, which takes the correlation between features fully into account. We convert the superpixel image into an undirected graph, and the perturbed sampling operation is then formalized as a clique-set construction problem. We perform various experiments on explaining the random-forest classifier and Google's pre-trained Inception neural network. The experimental results show that the MPS-LIME explanations of multiple black-box models achieve much better performance in terms of understandability, fidelity, and efficiency.
There are several avenues of future work that we would like to explore. This paper only describes the modified perturbed sampling method for image classification. We will apply a similar idea to text processing and structured data analytics. Besides, we will improve other post hoc explanation techniques that rely on input perturbations, such as SHAP, and propose a general optimization scheme.
- I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep Learning. MIT Press. http://www.deeplearningbook.org
- I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, and R. Garnett (Eds.) (2017) Advances in Neural Information Processing Systems 30, December 4-9, 2017, Long Beach, CA, USA.
- T. Hastie, R. Tibshirani, and J. Friedman (2009) The Elements of Statistical Learning. Springer. www.web.stanford.edu/~hastie/ElemStatLearn
- Z. Hu, X. Ma, Z. Liu, E. Hovy, and E. Xing (2016) Harnessing deep neural networks with logic rules. In ACL.
- Y. Lou, R. Caruana, and J. Gehrke (2012) Intelligible models for classification and regression. In KDD, pp. 150-158.
- C. Molnar (2019) Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Lulu, 1st edition.
- S. Ren, K. He, R. Girshick, and J. Sun (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), pp. 1137-1149.
- M. T. Ribeiro, S. Singh, and C. Guestrin (2016) Model-agnostic interpretability of machine learning. CoRR abs/1606.05386.
- M. T. Ribeiro, S. Singh, and C. Guestrin (2016) "Why should I trust you?": explaining the predictions of any classifier. In KDD, pp. 1135-1144.
- S. Rüping (2006) Learning Interpretable Models. PhD thesis, Technical University of Dortmund.
- C. Szegedy et al. (2015) Going deeper with convolutions. In CVPR, pp. 1-9.
- Q. Zhang, Y. N. Wu, and S.-C. Zhu (2018) Interpretable convolutional neural networks. In CVPR, pp. 8827-8836.