Compared to traditional machine learning methods, deep learning has achieved superior performance on many challenging tasks. There has been increasing interest in leveraging deep learning methods to aid decision makers in critical domains such as healthcare and criminal justice. However, because of their complicated nested structure, deep learning models remain mostly black boxes, offering little insight into their reasoning process and prediction results. This makes it challenging for decision makers to understand and trust their functionality. Therefore, the explainability and transparency of deep learning models are essential to ensure their broad application in various vertical domains.
Recently, techniques for the explainability and transparency of deep learning models have received much attention in the research community [6, 11, 15]. Among them, post-hoc techniques that explain black-box models in a human-understandable manner are particularly popular [10, 3, 9]. These techniques generate perturbed samples of a given instance in the feature space and observe the effect of these perturbed samples on the output of the black-box classifier. Due to their generality, they have been used to explain neural networks and complex ensemble models in domains ranging from medicine to law and finance. The most representative system in this category is LIME. Since LIME assumes the classification boundary in the local area near the input instance is linear, it uses a linear regression model, which is self-explanatory, to locally represent the decision and pinpoint important features based on the regression coefficients. Related works [5, 13, 2] have proposed using other models, such as decision trees, to approximate the target decision boundaries.
There are two drawbacks in existing local explanation methods such as LIME. First, perturbed samples are generated from a uniform distribution, ignoring the intrinsic correlation between features. This may discard much useful information for learning the local explanation models; a proper sampling operation is especially essential in natural language processing and image recognition. Second, most existing methods assume the decision boundary is locally linear, which may produce serious errors, since in most complex networks the local decision boundary is non-linear.
In this paper, we design and develop a novel local explanation method with high interpretability and high fidelity to address the above challenges. First, we design a unique local sampling process which incorporates a feature clustering method to handle feature dependency. Then, we adopt Support Vector Regression (SVR) with a kernel function to approximate the locally nonlinear boundary. By simultaneously preserving feature dependency and local non-linearity, our method produces explanations with high interpretability and high fidelity. For convenience, we refer to our method as LEDSNA (Local Explanation using feature Dependency Sampling and Nonlinear Approximation).
In this section, we first introduce the two core characteristics of a local explanation method: interpretability and fidelity. Then we introduce feature sampling with intrinsic dependency and the nonlinear boundary of the local decision. Finally, we present the framework of the LEDSNA algorithm.
An explainable model with good interpretability should be faithful to the original model, understandable to the observer, and graspable in a short time so that the end-user can make wise decisions. A local explanation method learns a model $g$ from a set of data samples drawn around the instance $x$ being explained. The dissimilarity between the true label and the predicted label is defined as the loss function $\mathcal{L}(f, g, \pi_x)$, which is a measure of how unfaithful $g$ is in approximating the black-box model $f$ in the locality defined by $\pi_x$. In order to ensure both local fidelity and understandability, we add a regularization term $\Omega(g)$ to the loss function:
$$\xi(x) = \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g)$$
The regularization term $\Omega(g)$ is a measure of the complexity of the explainable model $g$. The smaller the regularization term, the sparser the model $g$, which leads to better understandability. This is the general framework of LIME.
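To make the framework concrete, the locality-weighted squared loss commonly used in this setting can be sketched in a few lines of Python; the exponential kernel and its width are illustrative assumptions, not values fixed by the text:

```python
import math

def exponential_kernel(distance, width=0.75):
    """Proximity weight pi_x(z): closer perturbed samples count more."""
    return math.exp(-(distance ** 2) / (width ** 2))

def locality_weighted_loss(f_vals, g_vals, distances, width=0.75):
    """Sum of pi_x(z) * (f(z) - g(z))^2 over the perturbed samples."""
    return sum(exponential_kernel(d, width) * (f - g) ** 2
               for f, g, d in zip(f_vals, g_vals, distances))
```

A perfect local fit drives the loss to zero, and distant perturbed samples contribute almost nothing because of the kernel weight.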
II-A Feature Sampling with Intrinsic Dependency
In existing local explanation methods, the sampling procedure treats each feature independently, ignoring the intrinsic correlation between features. A proper sampling operation is essential, as independent sampling may discard much useful information for learning the local explanation models. In some cases, when most uniformly generated samples are unrealistic with respect to the actual distribution, these false contributors lead to a poor fit of the local explanation model. In this section, we design a unique local sampling process which incorporates a feature clustering method to activate a subset of features for better local exploration.
II-A1 Feature Dependency Sampling for Images
A proper sampling operation is especially essential in natural image recognition because the visual features of natural objects exhibit strong correlation in the spatial neighborhood. For image classification, we adopt a superpixel-based interpretable representation. Each superpixel segment, a group of connected pixels with similar colors or gray levels, is the primary processing unit. We let $x \in \mathbb{R}^d$ be the original representation of an image and the binary vector $x' \in \{0,1\}^{d'}$ be its interpretable representation, indicating the presence or absence of each superpixel segment; here $d$ is the number of pixels and $d'$ is the number of superpixels. For images, especially natural images, superpixel segments often correspond to coherent regions of visual objects, showing strong correlation in a spatial neighborhood. In order to learn the local behavior of the image classifier $f$, we generate a group of perturbed samples of a given instance $x$ by activating a subset of the superpixels in $x'$. First, we convert the superpixel segments into an undirected graph: the superpixel segments are represented as vertices, and edges connect only adjacent segments. Consider a graph $G = (V, E)$, where $V$ and $E$ are the sets of vertices and undirected edges, with cardinalities $|V|$ and $|E|$. A subset of $V$ can be represented by a binary vector $z \in \{0,1\}^{|V|}$, where $z_i = 1$ indicates that vertex $i$ is in the subset. The perturbed sampling operation is formalized as finding the cliques $C \subseteq V$, in which every two vertices are adjacent. We use the Depth-First Search (DFS) method to obtain the cliques. Some samples are shown in Fig. 2. Since there is a strong correlation between adjacent superpixel segments, the clique-set construction takes into full account the various types of neighborhood correlation.
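The sampling step above can be sketched as follows. This is a hedged illustration: the text formalizes perturbed samples as cliques found by DFS, while the sketch below enumerates connected subsets of the superpixel adjacency graph, a closely related relaxation that also preserves spatial contiguity; the 2x2-grid adjacency is a toy assumption:

```python
def connected_subsets(adj, max_size):
    """Enumerate connected vertex subsets (up to max_size) by DFS growth."""
    found = set()

    def grow(subset, frontier):
        found.add(frozenset(subset))
        if len(subset) >= max_size:
            return
        for v in frontier:
            new_subset = subset | {v}
            grow(new_subset, (frontier | adj[v]) - new_subset)

    for v in adj:
        grow({v}, set(adj[v]))
    return found

def to_mask(subset, n_superpixels):
    """Binary interpretable representation z: 1 = superpixel is active."""
    return [1 if i in subset else 0 for i in range(n_superpixels)]

# Toy adjacency of a 2x2 superpixel grid: edges 0-1, 0-2, 1-3, 2-3.
adj = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
samples = connected_subsets(adj, max_size=2)
```

Every sample is spatially contiguous; non-adjacent superpixels such as 0 and 3 are never activated as a pair, which is exactly the neighborhood correlation the sampling is meant to preserve.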
II-A2 Feature Dependency Sampling for Text
A proper sampling operation is also essential in natural language processing. For text classification, we let the interpretable representation be a bag of words. Similar to images, $x$ denotes the original representation of a text and the binary vector $x'$ denotes its interpretable representation. In order to learn the local behavior of the text classifier, we generate a group of perturbed samples of a given instance by activating a subset of features. Fig. 5 shows two natural-language sentences, one in Chinese and one in English; there is strong semantic dependency between words, especially in Chinese. If the activated features are obtained by a sampling process in which features are independent of each other, we may lose much useful information for learning the local explanation models. In the sampling process, semantically dependent words play the role of adjacent superpixels in an image: they should be selected or unselected at the same time. There are many methods to analyze the semantic dependency of natural language. Here, we incorporate the Stanford CoreNLP toolkit into the sampling process to obtain the perturbed samples.
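As a minimal sketch of this idea (the actual system uses Stanford CoreNLP; the dependency edges below are hand-specified toy assumptions), dependent words can be merged into groups with union-find so that each group is kept or dropped as a whole during perturbation:

```python
import random

def group_dependent_tokens(n_tokens, dep_edges):
    """Union-find: merge token indices linked by dependency edges."""
    parent = list(range(n_tokens))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in dep_edges:
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n_tokens):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def perturb(tokens, groups, rng):
    """Drop whole dependency groups at random to build one perturbed sample."""
    kept = sorted(i for g in groups if rng.random() < 0.5 for i in g)
    return [tokens[i] for i in kept]

tokens = ["not", "good", "at", "all"]                  # toy sentence
groups = group_dependent_tokens(4, [(0, 1), (2, 3)])   # "not"-"good", "at"-"all"
```

Because perturbation operates on groups rather than individual words, "not" can never be dropped while its dependent "good" survives, avoiding unrealistic samples that would mislead the local model.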
II-B Nonlinear Boundary of Local Decision
Most existing local explanation methods assume the decision boundary is locally linear. These methods may produce serious errors, since in most complex networks the local decision boundary is non-linear. Experiments show that a simple linear approximation significantly degrades explanation fidelity. In this section, we adopt Support Vector Regression (SVR) with a kernel function to approximate the nonlinear boundary. In the approximation process, when data are not linearly distributed in the current feature space, we use a kernel function to project the data points into a higher-dimensional feature space and find the optimal hyperplane.
The perturbed samples of a given instance may be impossible to fit with a linear model. Our way to tackle this problem is to apply a kernel-induced mapping that brings the data to a higher-dimensional feature space. The data are transformed as follows:
$$x \mapsto \phi(x), \qquad K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$$
After projecting the data points into the higher-dimensional feature space, we search for a hyperplane using the $\varepsilon$-insensitive error measure. Specifically, we introduce slack variables for data points that violate the $\varepsilon$-insensitive error:
$$y_i - \langle w, \phi(x_i) \rangle - b \le \varepsilon + \xi_i, \qquad \langle w, \phi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^{*}, \qquad \xi_i, \xi_i^{*} \ge 0$$
For each data point $x_i$, two slack variables $\xi_i, \xi_i^{*}$ are required to measure whether the point lies above or below the $\varepsilon$-tube. The model is learned by solving the optimization problem:
$$\min_{w, b, \xi, \xi^{*}} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^{*}\right) \quad \text{subject to the constraints above.}$$
This is the well-known support vector regression method, which can be solved by constructing the Lagrangian function.
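A hedged sketch of the kernel idea in plain Python follows. For brevity it fits kernel ridge regression, a closed-form stand-in that shares the kernel trick with SVR but uses a squared loss instead of the $\varepsilon$-insensitive loss and Lagrangian dual; the RBF kernel, the toy XOR-style data, and all parameter values are assumptions for illustration:

```python
import math

def rbf(u, v, gamma=1.0):
    """RBF kernel: implicit inner product in a higher-dimensional space."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def solve(A, b):
    """Gaussian elimination with partial pivoting for A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_kernel_regressor(X, y, gamma=1.0, lam=0.01):
    """Solve (K + lam*I) alpha = y; predict with the kernel expansion."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y)
    return lambda x: sum(a * rbf(x, xi, gamma) for a, xi in zip(alpha, X))

# XOR-style labels: not fittable by a linear model in the original space.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0.0, 1.0, 1.0, 0.0]
model = fit_kernel_regressor(X, y)
```

A linear regression on these four points can do no better than predicting 0.5 everywhere; the kernelized fit recovers the nonlinear labeling almost exactly, which is the behavior LEDSNA relies on for nonlinear local boundaries.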
Algorithm 1 shows a simplified workflow of LEDSNA. First, LEDSNA incorporates the feature clustering method into the sampling process to activate a subset of features. Then, LEDSNA uses a kernel function to project the data points into a higher-dimensional feature space. Finally, LEDSNA uses support vector regression to search for a hyperplane and obtain the coefficients of the important features.
In this section, we first introduce the evaluation criteria for explanation methods. Then, we perform experiments on natural language processing in Chinese. Finally, we perform an experiment to explain Google's pre-trained Inception neural network on the ImageNet database. The experimental results show the flexibility of LEDSNA.
III-A Evaluation Criteria
A good explainable model requires certain characteristics. One essential criterion is interpretability: the explanation must appear in a form understandable to the observer, e.g., a visual explanation listing the most significant features contributing to the prediction.
Another essential criterion is local fidelity: the explanation must be faithful to the model in the vicinity of the instance being predicted. The local approximation error ($Err$) and R-squared ($R^2$) are two important measurements of the accuracy of our local approximation with respect to the original decision boundary. The local approximation error reflects the prediction accuracy:
$$Err = |f(x) - g(x)|$$
where $f(x)$ is the single prediction obtained from the target deep learning classifier and $g(x)$ is the value predicted by the explanation model.
$R^2$ is the "percent of variance explained" by the explanation model. That is to say, $R^2$ is the fraction by which the variance of the errors is less than the variance of the dependent variable. $R^2$ is calculated from the total sum of squares ($SST$) and the error sum of squares ($SSE$):
$$R^2 = 1 - \frac{SSE}{SST}, \qquad SST = \sum_i (y_i - \bar{y})^2, \qquad SSE = \sum_i (y_i - \hat{y}_i)^2$$
where $y_i$ is the label of perturbed sample $z_i$, obtained from the target deep learning classifier, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the $y_i$. Moreover, $R^2$ can also be expressed using the familiar mean squared error ($MSE$) and variance ($Var$):
$$R^2 = 1 - \frac{MSE}{Var}$$
$R^2$ is a relative measure conveniently scaled between 0 and 1. The best possible value is $R^2 = 1$: the closer the score is to 1, the better the fidelity of the explainer.
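The two fidelity measures translate directly into code; the absolute-difference form of $Err$ is an assumption consistent with its description as prediction accuracy, and the labels are assumed to come from the black-box classifier:

```python
def approximation_error(f_x, g_x):
    """Err: gap between the black-box prediction and the explainer's."""
    return abs(f_x - g_x)

def r_squared(y_true, y_pred):
    """R^2 = 1 - SSE/SST, equivalently 1 - MSE/Var."""
    mean = sum(y_true) / len(y_true)
    sst = sum((y - mean) ** 2 for y in y_true)        # total sum of squares
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # error sum of squares
    return 1.0 - sse / sst
```

A perfect explainer scores $R^2 = 1$, while one that only ever predicts the mean of the labels scores $R^2 = 0$.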
III-B Experiment on Image Classifiers
In this section, LEDSNA and LIME explain image classification predictions made by Google's pre-trained Inception neural network. Fig. 5 shows the two original images to be processed. Fig. 6 and Fig. 7 list visual explanations by LEDSNA and LIME: the first row shows the superpixel explanations by LIME (K = 1, 2, 3, 4), and the second row shows the superpixel explanations by LEDSNA (K = 1, 2, 3, 4). The explanations highlight the top-K superpixel segments with the largest positive weights towards the predictions. We can see that LEDSNA effectively captures the correlation between adjacent superpixel segments, which provides users with a better understanding.
In addition, Table I lists the local approximation error and $R^2$ for several instances under the two algorithms. We can see that LEDSNA provides better predictive accuracy than LIME, and the $R^2$ of LEDSNA is much larger than that of LIME. Comparing the two criteria, we conclude that LEDSNA has better fidelity than LIME. In terms of both interpretability and fidelity, LEDSNA outperforms LIME in explaining classification.
III-C Experiment on Sentiment Analysis of Text
III-C1 Experiment on a Chinese Natural Language Database
Simplified Chinese Text Processing (SnowNLP) is a sentiment analysis tool designed especially for Chinese natural language. In this section we use LEDSNA and LIME to explain the predictions made by SnowNLP on the Public Comment Dataset. As there is strong semantic dependency between words in Chinese, we incorporate the Stanford Word Segmenter into the sampling process to obtain the perturbed samples. In the nonlinear approximation, we use a Gaussian kernel function to compute the similarity between data points in a much higher-dimensional space.
Fig. 8 and Fig. 9 show visual explanations by LEDSNA and LIME; we can see that the explanations of LEDSNA offer more useful information than those of LIME. Table II lists the local approximation error and $R^2$ for six instances. Compared with LIME, LEDSNA achieves better performance across the board, with an average local approximation error an order of magnitude smaller than LIME's. A similar observation holds for $R^2$.
Moreover, we randomly selected 1000 data samples to constitute a testing database. For each testing sample, we use LEDSNA and LIME to explain SnowNLP and compute $Err$ and $R^2$. The results show that for most test samples the $Err$ of LEDSNA is smaller than that of LIME, and similarly the $R^2$ of LEDSNA is larger. In conclusion, LEDSNA exhibits stronger interpretability and fidelity than LIME.
There are two drawbacks in existing local explanation methods. First, perturbed samples are generated from a uniform distribution, ignoring the complicated correlation between features, which may discard much useful information for learning the local explanation models. Second, most existing methods assume the decision boundary is locally linear, which may produce serious errors, since in most complex networks the local decision boundary is non-linear.
In this paper, we design and develop a novel, high-fidelity local explanation method to address the above challenges. First, we design a unique local sampling process which incorporates a feature clustering method to handle feature dependency. Then, we adopt SVR to approximate the locally nonlinear boundary. By simultaneously preserving feature dependency and local non-linearity, our method produces explanations with high fidelity and high interpretability.
- I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep Learning. MIT Press. http://www.deeplearningbook.org
- W. Guo, D. Mu, J. Xu, P. Su, G. Wang, and X. Xing (2018) LEMNA: explaining deep learning based security applications. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), pp. 364–379.
- I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, and R. Garnett (Eds.) (2017) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4–9 December 2017, Long Beach, CA, USA.
- T. Hastie, R. Tibshirani, and J. Friedman (2009) The Elements of Statistical Learning. Springer. www.web.stanford.edu/~hastie/ElemStatLearn
- H. Lakkaraju, S. H. Bach, and J. Leskovec (2016) Interpretable decision sets: a joint framework for description and prediction. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 1675–1684.
- Y. Lou, R. Caruana, and J. Gehrke (2012) Intelligible models for classification and regression. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 150–158.
- C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. Bethard, and D. McClosky (2014) The Stanford CoreNLP natural language processing toolkit. In Proceedings of ACL: System Demonstrations, pp. 55–60.
- S. Ren, K. He, R. Girshick, and J. Sun (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), pp. 1137–1149.
- M. T. Ribeiro, S. Singh, and C. Guestrin (2016) Model-agnostic interpretability of machine learning. CoRR abs/1606.05386.
- M. T. Ribeiro, S. Singh, and C. Guestrin (2016) "Why should I trust you?": explaining the predictions of any classifier. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 1135–1144.
- S. Rüping (2006) Learning Interpretable Models. PhD thesis, Technical University of Dortmund.
- R. Elshawi, M. H. Al-Mallah, and S. Sakr (2019) On the interpretability of machine learning-based model for predicting hypertension. BMC Medical Informatics and Decision Making 19(1), pp. 146:1–146:32.
- S. Shi, X. Zhang, and W. Fan (2019) Explaining the predictions of any image classifier via decision trees. CoRR abs/1911.01058.
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich (2015) Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9.
- C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2014) Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR).
- S. Tan, R. Caruana, G. Hooker, and Y. Lou (2018) Distill-and-compare: auditing black-box models using transparent model distillation. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES), pp. 303–310.