1 Introduction
In recent years, many attempts have been made to explain the decisions of deep learning models, resulting in various explanation methods called explainers (Lipton2016IML; Murdoch22071). A common technique used by many explainers (trumbelj2013ExplainingPM; Marco2016; Scott2017) is to first generate some perturbations in the input space, then forward them through the model, and finally provide an explanation based on the captured outputs. For that reason, these methods are also known as perturbation-based explainers.
Even though the perturbation-generating step has a strong influence on the performance of explainers (Marco2016; Scott2017), very few works have closely examined this step. Current perturbation schemes often ignore the data topology and, as a result, distort it significantly. These distortions can considerably degrade explainers’ performance since models are not trained to operate on the deformed topology. Additionally, the difference between the perturbations and the original data creates opportunities for malicious actors. For example, (FoolingLIMESHAP) demonstrates that a discriminator trained to recognize the explainer’s perturbations can be exploited to fool the explainer.
Motivated by that lack of study, our work aims to redesign the perturbation step in an explanation process so that the topological structure of the original data is better preserved. Our key result is that, assuming the input data is embedded in an affine subspace whose dimension is significantly smaller than that of the ambient data space, eliminating the perturbations’ components along that affine subspace better preserves the topological integrity of the original manifold. An illustration of that result is provided in Fig. 1, which shows that perturbation along the orthogonal directions (i.e., with no component along the subspace’s directions) results in smaller distortion of the topological structure of the original data. This is reflected in the smaller Bottleneck distances in dimensions 0 and 1, denoted by $H_0$ and $H_1$.
Based on that result, we further propose a novel manifold-based perturbation method, called EMaP, that aims to preserve the topological structure of the original data. The high-level operations of EMaP are shown in Fig. 2. Given some sampled data, EMaP first learns a function mapping the samples to their low-dimensional representations in the data subspace. Then, that function is used to approximate a local affine subspace, shortened to local-subspace, containing the data in the neighborhood of the input to be explained. Finally, the EMaP perturbations are generated by adding noise vectors that are orthogonal to that local-subspace to the data.
Contributions. (a) We theoretically show that the worst-case discrete Gromov-Hausdorff distance between the data and the perturbations along the manifold’s directions is larger than that along the orthogonal directions. (b) The worst-case analysis suggests that eliminating the perturbations’ components along the manifold’s directions can also better maintain the topological integrity of the original manifold in general, i.e., in the average case. We provide synthetic and real-world experiments based on persistent homology and the Bottleneck distance to support that claim. (c) We propose EMaP, an algorithm generating perturbations along the manifold’s orthogonal directions for explainers. EMaP first approximates the input’s manifold locally at some given data points, called pivots, and at the explained data point. The perturbations are then generated along the orthogonal directions of these local-subspaces. EMaP also computes the low-dimensional distances from the perturbations to the explained data point so that the explainers can better examine the model. (d) Finally, we provide experiments on four text datasets, two tabular datasets, and two image datasets, showing that EMaP can improve the explainer’s performance and protect explainers from adversarial discriminators.
Organization. The remainder of the paper is structured as follows. Sections 2 and 3 briefly discuss related work and preliminaries. Section 4 presents our analysis of the discrete Gromov-Hausdorff distances of different perturbation directions, which suggests orthogonal directions are preferable. We strengthen that result with a persistent homology analysis in Section 5. Sections 6 and 7 describe our proposed EMaP algorithm and its experimental results. Section 8 concludes the paper.
2 Related work
This work intersects several emerging research fields, including explainers and their attack/defense techniques. Our approach also uses recent results in topological data analysis. We provide an overview of this related work below.
Perturbation-based explanation methods. Perturbation-based explainers are becoming more popular among explanation methods for black-box models since they require hardly any knowledge of the explained model. Notable ones are LIME (Marco2016), SHAP (Scott2017), and some others (trumbelj2013ExplainingPM; Zeiler2014; Mukund2017; chang2018explaining; schwab2019cxplain; Lundberg2020). While they share the same goal of explaining the model’s predictions, they differ not only in their objectives but also in their perturbation schemes: some zero out features (Zeiler2014; schwab2019cxplain) or replace features with neutral values (Marco2016; Mukund2017), while others marginalize over some distributions on the dataset (Marco2016; Scott2017; Lundberg2020). There also exist methods relying on separate models to generate perturbations (trumbelj2013ExplainingPM; chang2018explaining). The work (Covert2021) provides a comprehensive survey of those perturbation-based explanation methods and how they perturb the data.
Adversarial attack on explainers. We focus on the attack framework of (FoolingLIMESHAP), in which the adversary intentionally hides a biased model from the explainer by training a discriminator that recognizes the explainer’s queries. The framework is discussed in detail in Section 3. Other emerging attacks on explainers focus on modifying the model’s weights and tampering with the input data (Ghorbani_Abid_Zou_2019; Dombrowski19; Heo2019FoolingNN; Dimanov2020YouST).
Defense techniques for perturbation-based explainers. Since most attacks on perturbation-based explainers were developed only recently, defense techniques against them are quite limited. Existing defenses generate perturbations either by carefully sampling the training data (Joymallya) or by learning generative models (Saito2020ImprovingLR; Domen). The advantage of EMaP is that it does not require any generative model, which not only reduces the attack surface but also allows a theoretical study of the perturbations.
Topological Data Analysis. Topological data analysis (TDA) is an emerging field in mathematics that applies the techniques of topology (traditionally a very theoretical subject) to real-world problems. Notable applications are in data science, robotics, and neuroscience. TDA uses deep and powerful mathematical tools from algebraic topology to explore topological structures in data and to provide insights that standard metric-based methods fail to discern. The most common tool in the TDA arsenal is persistent homology, developed in the early 2000s by Gunnar Carlsson and his collaborators. We refer readers to (ghrist2014elementary; edelsbrunner2010computational) for an overview of both persistent homology and TDA as a whole.
3 Preliminaries
We use the standard setting of learning tasks in which the set of inputs $X$ is sampled from a distribution $\mathcal{P}$ on $\mathbb{R}^N$. $X$ is also assumed to lie on a manifold embedded in an affine subspace $\mathbb{R}^V$, where $V$ is much smaller than $N$. We also consider a black-box classifier $f$ mapping each input $x \in X$ to a prediction $y$, a local explainer $g$, an (adversarial) discriminator $\mathcal{D}$, and a masking model $f'$.
Explainers. An explanation of the prediction $y = f(x)$ can be obtained by running an explainer $g$ on $x$ and $f$. We denote such an explanation by $g(x)$. In additive feature attribution methods, the range of $g$ is a set of features’ importance scores. We focus our analysis on the class of perturbation-based explanations, i.e., the importance scores are computed from the model’s predictions on some perturbations of the input data. The perturbations are commonly generated by perturbing some samples in $X$. We denote the perturbations by $X_r$, where $r \geq 0$ specifies the amount of perturbation. A more rigorous definition of this notation is provided in Section 4.
Our experiments are mainly on the LIME explainer (Marco2016) because of its clean formulation, popularity, and flexibility. The output of LIME is typically a linear model whose coefficients are the importance scores of the features:
$$\xi(x) = \arg\min_{g \in G} \mathcal{L}(f, g, \pi_x) + \Omega(g),$$
where $G$ is the search space, $\pi_x$ is the weight function measuring the similarity between a perturbation and the explained input $x$, $\mathcal{L}$ is the loss function measuring the difference between $f$ and the linear approximation $g$, and $\Omega$ is a function measuring the complexity of $g$.
Attack framework. We study the discriminator-based attack framework introduced by (FoolingLIMESHAP), which is illustrated in Fig. 3. In the framework, there is an adversary with an incentive to deploy a biased model $f$. This adversary can bypass detection by the explainer by forwarding the explainer’s perturbations $X_r$ to a masking model $f'$. The decision whether to forward an input to the masking model is made by a discriminator $\mathcal{D}$. Thus, the success of the attack is determined by the capability of the discriminator $\mathcal{D}$ to distinguish the perturbations $X_r$ from the original data $X$. Intuitively, if the explainer can craft an $X_r$ similar to $X$, it not only improves the explainer’s performance but also prevents the adversary from hiding its bias.
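The routing logic of this attack can be illustrated with a minimal sketch. Everything below is hypothetical and only meant to show the control flow: the function name `make_adversarial_model`, the toy biased and masking models, and the toy discriminator (which flags any non-binary input as a perturbation) are assumptions, not the setup of (FoolingLIMESHAP).

```python
import numpy as np

def make_adversarial_model(biased_model, masking_model, discriminator):
    """Wrap two models so that suspected explainer queries see the masking model."""
    def adversarial(x):
        x = np.atleast_2d(x)
        # True where the discriminator believes the row is an explainer perturbation.
        is_perturbation = np.asarray(discriminator(x), dtype=bool)
        return np.where(is_perturbation, masking_model(x), biased_model(x))
    return adversarial

# Toy stand-ins (hypothetical): feature 0 is "sensitive", feature 1 is innocuous.
biased = lambda x: (x[:, 0] > 0.5).astype(int)
masking = lambda x: (x[:, 1] > 0.5).astype(int)
# Toy discriminator: real data are binary, so any other value is flagged.
disc = lambda x: np.any((x != 0) & (x != 1), axis=1)

model = make_adversarial_model(biased, masking, disc)
real = np.array([[1.0, 0.0]])        # looks like real data -> biased model answers
perturbed = np.array([[0.9, 0.1]])   # off-distribution -> masking model answers
```

In this toy setting the explainer only ever observes the masking model, which is why perturbations that are hard to tell apart from real data weaken the attack.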
4 Analysis of Discrete Gromov-Hausdorff distances of perturbations
We consider the following perturbation problem: Given a manifold embedded in , how do we perturb it so that we preserve as much topological information as possible? More concretely, given a finite set of points sampled from such a manifold, is there a consistent method to perturb the original dataset while preserving some notion of topology?
To begin talking about differences between (metric) spaces, we need to introduce a notion of distance between them. One commonly used distance is the Gromov-Hausdorff distance. Intuitively, a small Gromov-Hausdorff distance means that the two spaces are very similar as metric spaces. Thus, we can focus our study on the Gromov-Hausdorff distances between the data and different perturbation schemes. However, as it is infeasible to compute this distance in practice, we instead study an approximation of it, the discrete Gromov-Hausdorff distance. Specifically, we show that, when the perturbation is sufficiently small, the worst-case discrete Gromov-Hausdorff distance resulting from orthogonal perturbation is smaller than that of projection perturbation, i.e., perturbation along the manifold (Theorem 4). The proof of that claim relies on Lemma 4, which states that, for a small perturbation, the discrete Gromov-Hausdorff distance between the original point cloud and the perturbed point cloud is determined by the largest change in the distance between any pair of points in the original point cloud. With the Lemma, the problem of comparing point clouds reduces to the problem of comparing changes in distances.
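We now state the formal definitions. Let $(M, d)$ be a metric space. For a subset $S \subseteq M$ and a point $x \in M$, the distance between $x$ and $S$ is given by $d(x, S) = \inf_{s \in S} d(x, s)$.
[Hausdorff distance](tuzhilin2016invented) Let $X$ and $Y$ be two non-empty subsets of a metric space $(M, d)$. The Hausdorff distance between $X$ and $Y$, denoted by $d_H(X, Y)$, is:
$$d_H(X, Y) = \max\left\{ \sup_{x \in X} d(x, Y),\ \sup_{y \in Y} d(y, X) \right\}.$$
[Gromov-Hausdorff distance](tuzhilin2016invented) Let $X, Y$ be two compact metric spaces. The Gromov-Hausdorff distance between $X$ and $Y$ is given by:
$$d_{GH}(X, Y) = \inf_{Z, f, g} d_H(f(X), g(Y)),$$
where the infimum is taken over all metric spaces $Z$ and all isometric embeddings $f: X \to Z$, $g: Y \to Z$.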
Even though the Gromov-Hausdorff distance is mathematically desirable, it is practically non-computable since the above infimum is taken over all possible metric spaces. In particular, this applies to the computation of the Gromov-Hausdorff distance between any two point clouds. In 2004, Memoli and Sapiro (memoli:compare) addressed this problem by using a discrete approximation of the Gromov-Hausdorff distance, which looks at the distortion of pairwise distances over all possible matchings between the two point clouds. Formally, given two finite sets of points $X = \{x_1, \dots, x_n\}$ and $Y = \{y_1, \dots, y_n\}$ in a metric space $(M, d)$, the discrete Gromov-Hausdorff distance between $X$ and $Y$ is given by
$$d_J(X, Y) = \min_{\pi \in P_n} \max_{1 \le i, j \le n} \frac{1}{2} \left| d(x_i, x_j) - d(y_{\pi(i)}, y_{\pi(j)}) \right| \qquad (1)$$
where $P_n$ is the set of all permutations of $\{1, \dots, n\}$.
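For very small point clouds, the discrete distance in Eq. (1) can be evaluated by brute force. The sketch below assumes the min-max form of Memoli and Sapiro with a factor of 1/2 (the normalization is an assumption here), and the function names are hypothetical. It enumerates all permutations, so it is only practical for a handful of points.

```python
import itertools
import numpy as np

def pairwise_distances(P):
    """Euclidean distance matrix of the rows of P."""
    return np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)

def distortion(DX, DY, perm):
    """Half of the largest change in pairwise distance under the matching perm."""
    perm = np.asarray(perm)
    return 0.5 * np.max(np.abs(DX - DY[np.ix_(perm, perm)]))

def discrete_gh(X, Y):
    """Brute-force minimum distortion over all permutations (O(n!) time)."""
    DX, DY = pairwise_distances(X), pairwise_distances(Y)
    return min(distortion(DX, DY, p)
               for p in itertools.permutations(range(len(X))))

X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
Y_same = X[[2, 0, 1]] + 5.0            # reordered and translated copy of X
Y_moved = np.array([[0.0, 0.0], [1.0, 0.0], [3.2, 0.0]])  # last point moved by 0.2
d_same = discrete_gh(X, Y_same)        # isometric clouds: distance is (numerically) 0
d_moved = discrete_gh(X, Y_moved)      # identity matching is optimal: 0.5 * 0.2
```

The second example also illustrates the role of the lemma that follows: for a small enough perturbation, the optimal matching is the identity, so the minimization over permutations can be dropped.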
Let be a point cloud contained in some affine subspace of . We say is generic if the pairwise distances between the points in are not all equal, i.e. there exist some points such that .
Let be a finite set of points in s.t. for every , there exists a unique such that . realizes a perturbation of with the radius of perturbation being equal to . We also denote (resp. ) as a finite set of points in such that for every , there exists a unique such that and (resp. ), where denotes the line connecting the points and . We are now ready to state the following key theorem:
Given a generic pointcloud , there exists an such that for any and for any instances of , there exists an such that:
We prove Theorem 4 by showing the following lemma:
There exists an such that for any , we have
for any .
To prove Lemma 4, we show that, for a small enough , the optimal permutation in Eq. (1) is the identity . Thus, the minimization in the computation of can be eliminated. The detail is shown in the following.
Proof of Lemma 4. Given a permutation and two point clouds of the same cardinality, denote:
Let be the set of permutations such that . Let . Since is generic, does not include all of and .
Let . We claim that choosing proves the lemma. To be more precise, the radius of perturbation is chosen such that , i.e. .
Given an , for any , we consider two cases:

If , then . Note that the identity permutation belongs to as:
(2) Since , from Triangle inequality, we have:
Therefore, for all , we have:
(3) This implies for .

If , without loss of generality, we assume the pair maximizes . For convenience, we denote and . From the fact that , we have . On the other hand, from (3), . Thus, from Triangle inequality, we obtain:
Since , we establish for .
From the above analysis, we can conclude for all and . Combining this with (2), we have that the identity permutation is the solution of (1), which proves the Lemma.
With the Lemma, Theorem 4 can be proved by choosing a specific projection perturbation whose discrete Gromov-Hausdorff distance is always larger than the upper bound on that distance for any orthogonal perturbation. The proof of the Theorem is given below:
Proof of Theorem 4. Applying Lemma 4 to the orthogonal perturbation and projection perturbation , and for any less than or equal to the minimum of the corresponding to each perturbation specified in Lemma 4, we obtain:
From the triangle inequality (similar to how we show Eq. (3) in the proof of Lemma 4), we have:
where the first inequality is strict due to the orthogonality of the perturbation.
Given such , consider the following perturbation where are collinear and , and for . The distance between such and is greater than or equal to , which proves our claim.
This theoretical result suggests the following perturbation scheme: Given a manifold embedded in an affine subspace of and a fixed amplitude of perturbation, perturbing the manifold in the orthogonal directions with respect to the affine subspace is preferable to random perturbation, as it minimizes the topological difference between the perturbed manifold and the original.
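A small numerical illustration of this comparison, under assumptions chosen for simplicity: three points on the x-axis of the plane (the axis plays the role of the affine subspace), a perturbation radius of 0.1, and one hand-picked perturbation of each kind. The helper name `max_distance_change` is hypothetical.

```python
import numpy as np

def max_distance_change(X, X_pert):
    """Largest absolute change of any pairwise distance after perturbation."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    D_pert = np.linalg.norm(X_pert[:, None] - X_pert[None, :], axis=-1)
    return np.max(np.abs(D - D_pert))

r = 0.1
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])   # generic cloud on the x-axis

# Projection perturbation: every point moves by r along the subspace.
X_proj = X + r * np.array([[-1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
# Orthogonal perturbation: every point moves by r perpendicular to the subspace.
X_orth = X + r * np.array([[0.0, 1.0], [0.0, -1.0], [0.0, 1.0]])

change_proj = max_distance_change(X, X_proj)   # collinear moves can reach 2r
change_orth = max_distance_change(X, X_orth)   # sqrt(1 + 4 r^2) - 1 here
```

For two points at distance d, an orthogonal perturbation changes their distance by at most sqrt(d^2 + 4r^2) - d, which is strictly below the 2r attainable by moving them apart along the subspace.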
5 Persistent homology analysis with the Bottleneck distance
The results from the previous section show that, in the worst case, the orthogonal perturbation is preferable to the projection perturbation. However, when we apply them to actual datasets, how do they compare on average? Since the discrete Gromov-Hausdorff distance is still computationally intractable for a Monte Carlo analysis, we choose a different approach: persistent homology.
Over the last 30 years, new developments in algebraic topology, a field that was classically very abstract and theoretical, have moved toward real-world applications. The resulting field, commonly referred to as applied topology or topological data analysis, is centered around a concept called persistent homology. Interested readers can refer to (ghrist2014elementary; edelsbrunner2010computational; ghrist2008barcode) for an overview of the subject.
For any topological space, homology is a topological invariant that counts the number of holes or voids in the space. Intuitively, given any space, the $0$-th homology counts the number of connected components, the $1$-st homology counts the number of loops, the $2$-nd homology counts the number of 2-dimensional voids, and so on. The homology groups of dimension $k$ are denoted by $H_k$.
Given a point cloud sampled from a manifold, we want to recapture the homological features of the original manifold from these discrete points. The idea is to construct a sequence of topological spaces along some timeline and track the evolution of the topological features across time. The longer the features persist (and hence the name persistent homology), the more likely they are the actual features of the original manifold. Given a point cloud and a dimension , the persistence diagram is the set of points corresponding to the birth and death time of these features in the aforementioned timeline.
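To make birth and death times concrete in dimension 0, the sketch below computes when connected components merge as the scale grows. It assumes a Vietoris-Rips style filtration in which an edge appears at a scale equal to its length (conventions differ by a factor of 2 between libraries); under that assumption, every component is born at 0 and the death times are the edge lengths of a minimum spanning tree. The function name is hypothetical, and this is not the Ripser implementation used in the experiments.

```python
import numpy as np

def h0_death_times(points):
    """Death times of 0-dimensional classes = MST edge lengths (Prim's algorithm)."""
    D = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    n = len(points)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].copy()            # cheapest link from each vertex to the tree
    deaths = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        deaths.append(candidates[j])   # two components merge at this scale
        in_tree[j] = True
        best = np.minimum(best, D[j])
    return np.sort(np.array(deaths))

# Two well-separated clusters: one class persists much longer than the others.
cloud = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 0.0], [5.1, 0.0]])
deaths = h0_death_times(cloud)
```

The single long-lived class (death time close to the gap between the clusters) is the kind of persistent feature that is attributed to the underlying shape rather than to sampling noise.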
For two point clouds, and in particular for their two persistence diagrams, there are several notions of distance between them, representing how similar they are as topological spaces. The most commonly used distance in practice is the Bottleneck distance.
[Bottleneck distance] Let $D_1$ and $D_2$ be two persistence diagrams. The Bottleneck distance $d_B(D_1, D_2)$ is given by
$$d_B(D_1, D_2) = \inf_{\gamma} \sup_{p \in D_1} \| p - \gamma(p) \|_\infty,$$
where the infimum is taken over all matchings $\gamma: D_1 \to D_2$ (which allows matchings to points with equal birth and death time).
For simplicity, we shorthand the Bottleneck distance between persistence diagrams of and in dimension to instead of . As the notation takes in 2 parameters in and , this is not to be confused with the homology group of the specified spaces. Note that two point clouds with small Bottleneck distance can be considered topologically similar.
The Bottleneck distance is closely related to the Gromov-Hausdorff distance, as the Bottleneck distance of two persistence diagrams of the same dimension is bounded above by their Gromov-Hausdorff distance (chazal2009signatures):
for every dimension .
Dataset | Parameter | No. points | Data’s noise | Perturbation | No. runs | EMaP’s dim
Line | Length: 10 | 100 | 0.1 | | 100 | 1
Circle | Radius: 1 | 400 | 0.1 | | 100 | 2
2 intersecting circles | Radius: 1 | 400 | 0.01 | | 100 | 2
2 concentric circles | Radius: 1 | 400 | 0.01 | | 100 | 2
Spiral | Radius: [0,2] | 1000 | 0.02 | | 100 | 2
Experiment | No. data points | No. feats | Feature’s values | No. runs | EMaP’s dim
COMPAS | 7214 | 100 | {0,1} | 100 | 2
German Credit | 1000 | 28 | {0,1} | 100 | 2
CC | 2215 | 100 | {0,1} | 100 | 2
MNIST | 60000 | | [0,1] | 100 | 2 and 3
Fashion-MNIST | 60000 | | [0,1] | 100 | 2 and 3
Monte Carlo simulations. The Bottleneck distance is much easier to compute than the Gromov-Hausdorff distance, and software packages are available for a range of use cases. As such, we run Monte Carlo simulations to compute the Bottleneck distances on 5 synthetic datasets, 3 real-world tabular datasets, and 2 real-world image datasets to confirm our hypothesis that the orthogonal perturbation preserves the topology better than the projection perturbation on average. The synthetic datasets are noisy point clouds of certain 2-dimensional shapes in 3-dimensional space. The tabular datasets are COMPAS (compassdataset), German Credit (germancredit), and Communities and Crime (communitycrime). The image datasets are MNIST (lecun2010) and Fashion-MNIST (Xiao2017FashionMNISTAN). Tables 1 and 2 provide more details about those datasets. We use the Ripser Python library (ctralie2018ripser) to compute the Bottleneck distances in our experiments. All reported Bottleneck distances are normalized by the noise added to the point clouds for more intuitive visualization.






Table 3 reports the means of the $H_0$ and $H_1$ Bottleneck distances of the perturbations on the synthetic datasets. The number of data points and the noise level are chosen mainly for clear visualizations (shown in Appendix A). The results show that orthogonal perturbation consistently results in lower $H_0$ distances for the line-shaped dataset and lower $H_1$ distances for the cycle-shaped datasets. Note that, in general, $H_1$ is a better topological indicator than $H_0$ for cycle-shaped datasets, since cycles or holes (detected by $H_1$) are harder to replicate than connected components (detected by $H_0$).
For the real-world datasets, we conduct the experiments with perturbations of different noise levels and report results in Fig. 5 and 5. It can be observed that both the $H_0$ and $H_1$ Bottleneck distances of the persistence diagrams of the orthogonal perturbation are significantly smaller than those of the projection perturbation in all experiments.
6 EMaP Algorithm
The ideal perturbations for perturbationbased explainers are those drawn from the data distribution since models are trained to operate on that distribution (Marco2016; Ribeiro2018AnchorsHM). However, most perturbation schemes ignore the data distribution in the process of generating the perturbations (see Sect. 2). Furthermore, predictions on the perturbations do not necessarily hold local information about the explained inputs, which is one main reason for the usage of some distance or kernel functions measuring how similar the perturbations are to the explained input. Note that those distances and kernels are normally functions of other distances such as and in the input space , which might not correctly capture the notion of similarity.
By operating in the low-dimensional manifold, EMaP can overcome those issues. First, if topological similarity implies similarity in the model’s predictions, maintaining the topological structure of the original data should improve the relevance of the model’s predictions on the perturbations. Therefore, explaining the model with orthogonal perturbations, which better preserve the topology, should be more beneficial. Furthermore, the manifold provides a natural way to improve the similarity measurement among data points. Fig. 6 shows the issue of similarity measurement based on Euclidean distance in the input space. As that distance ignores the layout of the data, points that are farther away along the manifold might receive the same similarity measure. On the other hand, the low-dimensional distances computed on the manifold take the layout into account and can overcome that issue.
Algorithm overview. Given an input to be explained, the outputs of EMaP are the perturbations along the manifold’s orthogonal directions and their low-dimensional distances to that input. The pseudocode of EMaP is shown in Alg. 1. The first step is to learn an embedding function, i.e., a mapper, transforming the data to a low-dimensional space (line 2). Then, samples from each label are selected and combined with the explained input into a set, called the pivots (lines 3 to 7). After that, EMaP generates perturbations along the orthogonal directions of the manifold from each pivot (line 10). The pivots provide the explainer with a wider range of perturbations for better performance. In the next paragraphs, we describe those key steps in more detail.
The mapper. In EMaP, the mapper is learnt from a manifold approximated by UMAP (mcinnes2018umapsoftware). Since the manifold learnt by UMAP is optimized for global information, the orthogonal directions computed on top of that manifold at local data points are prone to high error. This can degrade the correctness of orthogonal perturbations significantly. To overcome this issue, EMaP learns a localsubspace for each pivot and generates the orthogonal perturbations on top of that subspace. Intuitively, the localsubspace is a local affine approximation of the manifold. Therefore, the resulted orthogonal directions are more finely tuned for the local points. We denote as the matrix characterizing the localsubspace at . Since is a linear approximation of data points near , by denoting as the function embedding input data to the manifold. By denoting , we have and , where are points near and are their embedding in . In our current implementations, the set of pivots contains the explained data point and data points sampled from each class label (see Algorithm 1).
EMaP orthogonal perturbations. The key step of EMaP is the generation of orthogonal perturbations $\tilde{x}$ from a pivot $x$ (line 10, Alg. 1), which can be described by the following equation:
$$\tilde{x} = x + e - \mathrm{proj}_{W}(e) \qquad (4)$$
where the noise $e$ is sampled from a multivariate normal distribution and $\mathrm{proj}_{W}(e)$ is the projection of the noise onto the local-subspace characterized by $W$. Upon obtaining the orthogonal perturbations, we can compute their low-dimensional embeddings using the mapper’s transform function. The pseudocode for this computation is in Algorithm 2.
Local-subspace approximation. The correctness of the orthogonal perturbations, i.e., whether the perturbations actually lie in the orthogonal subspace, depends heavily on the correctness of the computation of $W$. We now discuss how EMaP learns the local-subspace matrix $W$.
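The orthogonal perturbation step of Eq. (4) can be sketched in a few lines. This is an illustration under assumptions, not the authors' Algorithm 2: the local-subspace is represented by a matrix `W` whose columns span its directions, the noise is isotropic Gaussian with a hypothetical `scale` parameter, and the function name is made up for this example.

```python
import numpy as np

def orthogonal_perturbations(x, W, scale, n_samples, rng):
    """Add noise to x after removing the noise component lying in span(W).

    x: (N,) point to perturb; W: (N, V) columns spanning the local-subspace.
    """
    noise = rng.normal(scale=scale, size=(n_samples, x.shape[0]))
    # Orthogonal projector onto span(W); pinv also handles non-orthonormal columns.
    projector = W @ np.linalg.pinv(W)
    # projector is symmetric, so right-multiplying projects each noise row.
    return x + noise - noise @ projector

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 2))     # hypothetical 2-D local-subspace in R^5
x = rng.normal(size=5)
X_pert = orthogonal_perturbations(x, W, scale=0.1, n_samples=100, rng=rng)
# Every displacement is orthogonal to every subspace direction.
max_inner = np.max(np.abs((X_pert - x) @ W))
```

Using the pseudo-inverse keeps the sketch valid even when the columns of `W` are not orthonormal, which is likely the case when `W` comes from a least-squares fit.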
Ideally, given a set of data near in the manifold, we can solve the following optimization for :
(5) 
where is the transform function embedding input data to the manifold learnt in the previous step. An intuition is, for all in the manifold and near , we are searching for a matrix such that the inverse mapping approximately equals its original value . Note that if the embedding is exactly on and the manifold is affine, the optimization (5) can achieve its optimal value for some . Since it is not trivial to obtain the set belonging to the manifold, EMaP perturbs around with some random noise, and solves the following the optimization instead:
(6) 
where is a ball centering at with noise radius . EMaP solves (6) for an approximation of the localsubspace. Further details of this step are provided in Algorithm 3.
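One way to read the approximation step is as a least-squares fit: sample points in a ball around the pivot, embed them, and fit an affine map from the embeddings back to the input space. The sketch below follows that reading. The least-squares objective, the affine offset, the function names, and the linear stand-in for the mapper (`embed`, used instead of UMAP so that the example is self-contained) are all assumptions rather than the authors' Algorithm 3.

```python
import numpy as np

def sample_ball(center, radius, n_samples, rng):
    """Uniform samples from the Euclidean ball of given radius around center."""
    dim = center.shape[0]
    directions = rng.normal(size=(n_samples, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = radius * rng.uniform(size=(n_samples, 1)) ** (1.0 / dim)
    return center + radii * directions

def fit_local_subspace(x, embed, radius, n_samples, rng):
    """Fit z ~ W @ embed(z) + offset on samples z drawn around x."""
    Z = sample_ball(x, radius, n_samples, rng)       # (n, N) noisy neighbors
    U = embed(Z)                                     # (n, V) low-dim embeddings
    U1 = np.hstack([U, np.ones((n_samples, 1))])     # append affine term
    coef, *_ = np.linalg.lstsq(U1, Z, rcond=None)    # (V + 1, N)
    return coef[:-1].T, coef[-1]                     # W: (N, V), offset: (N,)

rng = np.random.default_rng(0)
B, _ = np.linalg.qr(rng.normal(size=(5, 2)))  # orthonormal basis of the true subspace
embed = lambda Z: Z @ B                       # hypothetical linear mapper
x0 = B @ np.array([0.3, -0.2])                # pivot lying on the subspace
W, offset = fit_local_subspace(x0, embed, radius=0.1, n_samples=2000, rng=rng)
# Relative size of the part of W that falls outside the true subspace.
residual = np.linalg.norm(W - B @ (B.T @ W)) / np.linalg.norm(W)
```

With an exactly affine manifold and an exact embedding, the recovered directions are close to the true subspace even though the samples themselves lie off the manifold, which matches the intuition behind replacing (5) with (6).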
We now discuss the gap between the ideal solution and the approximation used by EMaP, i.e. the solution of (5) and (6). This can be characterized by bounding the error between and , i.e. the reconstructed signals in by using and , respectively. Lemma 6 provides a bound on that reconstruction error where the set in (5) is the projection of a ball on the data manifold. The bound holds under a mild assumption that the optimal objective of (5) is not larger than that of (6). We find this assumption reasonable and intuitive: as the set is in a subspace of dimension and the set of is a ball in , finding a subspace of dimension approximating the subspace containing should give a much lower error.
Assume that all data points belong to the same affine space . Let Proj be the projection onto , then under the above assumption on the optimization (5) and (6), the reconstruction error on perturbed data points is upper bounded by:
where is the orthogonal components of and .
For simplicity, we rewrite:
The assumption regarding the objectives (5) and (6) mentioned in the Lemma can be rewritten as:
We find this assumption reasonable since its left hand side is equal to in the ideal scenario, i.e. for all in our dataset. With that, we have:
where the last two inequalities are from the Triangle Inequality. The last equality is due to the fact that is the orthogonal components of and that has no orthogonal components.
Note that is small if the manifold is affine in the neighborhood of and is good, i.e.
is a good estimator for
under the above assumption.
7 Experiments
We evaluate EMaP on two main objectives: the explainer’s performance and the perturbations’ robustness. Our experiments are conducted on 3 tabular datasets, 2 image datasets, and 4 text datasets of reviews in multiple domains. They are COMPAS (compassdataset), Communities and Crime (communitycrime), German Credit (germancredit), MNIST (lecun2010), Fashion-MNIST (Xiao2017FashionMNISTAN), and 4 review datasets in the Multi-Domain Sentiment dataset (multibook).
Dataset, models and explainer’s hyperparameters. All reported results for realworld datasets include at least 2000 data points and 100 runs, except for those of German Credit where the data only consists of 1000 samples.
The experimental model for the text datasets is the logistic regression implemented by the LIME paper (Marco2016) with the ground-truth sets of explanatory features. The models tested for the two image datasets are 2-layer convolutional networks implemented in PyTorch (pytorch) with test set accuracy of 98% for MNIST and 93% for Fashion-MNIST.
The model’s inputs in all experiments are normalized between 0 and 1. The noise vector used to generate perturbations has radius for text data and for image data, which are in the range shown in the previous experiments in Fig. 5 and 5. The noise radius used to approximate the local-subspaces (Algorithm 3) is chosen equal to the noise radius for perturbation. The selection of these radii and the dimension of the low-dimensional manifold depends on the dataset. For UMAP’s hyperparameters, we use their default settings with n_components and min_dist . Our source code is attached in the supplementary material of this submission. Finally, for fair comparison, the number of perturbations used to generate the explanation of any reported method is 1000.
Technical implementation of EMaP and baselines. Generally, EMaP perturbations can be plugged into any model-agnostic perturbation-based method; however, the modification may require significant changes to the existing implementations of those methods. In our experiments, we apply EMaP to LIME (Marco2016) as a proof of concept, i.e., we show that EMaP can improve LIME in terms of performance. We use the notation EMaP to indicate LIME with EMaP’s perturbations in the following experimental results. We choose LIME to demonstrate the advantages of EMaP since it requires few changes to integrate EMaP’s perturbations. This allows a fair assessment of the gain from applying EMaP to explanation methods.
We now explain in more detail how we combine LIME with EMaP. As described in the original paper, LIME’s explanation is the solution of the following optimization:
$$\xi(x) = \arg\min_{g \in G} \mathcal{L}(f, g, \pi_x) + \Omega(g),$$
where $\mathcal{L}(f, g, \pi_x)$ is a loss between the explained function $f$ and the explanation function $g$, $G$ is the class of linear models, $\pi_x$ is an exponential kernel defined on some distance function, and $\Omega(g)$ is the complexity term from Section 3 (see (Marco2016) for more details). To use EMaP, we set the loss function as:
where with the distance computed as in Algorithm 2 and is the linear function of the changes in each input’s features.
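A sketch of how such a weighted linear fit might look, under assumptions that go beyond the text: a squared-error loss weighted by the exponential kernel exp(-D^2 / sigma^2) of the LIME paper, a small ridge penalty standing in for the complexity term, and a linear model on the feature changes. The function name, the `kernel_width` and `ridge` parameters, and the toy distances are hypothetical; in EMaP the distances would be the low-dimensional ones from Algorithm 2.

```python
import numpy as np

def fit_weighted_linear(deltas, preds, distances, kernel_width, ridge=1e-3):
    """Kernel-weighted ridge regression of predictions on feature changes.

    deltas: (n, d) changes of each perturbation relative to the explained input.
    preds: (n,) model outputs on the perturbations.
    distances: (n,) distances of the perturbations to the explained input.
    """
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    A = np.hstack([deltas, np.ones((len(deltas), 1))])   # add intercept column
    Aw = A * weights[:, None]
    # Normal equations of the weighted, ridge-regularized least squares.
    beta = np.linalg.solve(A.T @ Aw + ridge * np.eye(A.shape[1]), Aw.T @ preds)
    return beta[:-1], beta[-1]      # importance scores, intercept

rng = np.random.default_rng(0)
deltas = rng.normal(scale=0.1, size=(200, 3))
preds = deltas @ np.array([2.0, -1.0, 0.0]) + 0.5    # toy linear "model"
dists = np.linalg.norm(deltas[:, :2], axis=1)        # stand-in for low-dim distances
coefs, intercept = fit_weighted_linear(deltas, preds, dists,
                                       kernel_width=0.5, ridge=1e-10)
```

On this toy linear model the fitted coefficients recover the true weights, and the third feature, which the model ignores, receives an importance close to zero.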
Throughout our experiments, we compare EMaP-LIME (or EMaP for short) to LIME with different perturbation schemes. Our goal is to demonstrate the advantage of using EMaP to generate explanations. We also include experimental results of some other black-box and white-box explanation methods for comparison. Specifically, we include results of the following methods:

LIME zero: LIME with perturbations whose perturbed features are set to zero. This method is used by LIME in explaining text data.
LIME+: LIME with perturbations whose perturbed features have Gaussian noise added. This method is used by LIME in explaining image data.
LIME*: LIME with perturbations whose perturbed features are multiplied by uniform noise between 0 and 1. This can be considered a smoother version of LIME zero.
KernelSHAP: a black-box method based on the Shapley value (Scott2017), whose perturbed features are set to the average of some background data.
GradientSHAP: a white-box method based on the Shapley value (Scott2017), which relies on the gradient of the model.
DeepLIFT: a white-box method based on backpropagation through the model (Avanti2017).
Figure 7: The precision and recall of explanations returned by Greedy, LIME, Parzen, and LIME with EMaP (the higher the better). The dots are in increasing order of the number of features in the explanations (left to right).
Explainer’s performance. We first report our experimental results for the sentiment classification task on the Multi-Domain Sentiment dataset (multibook). We follow the experimental setup in (Marco2016), in which the ground-truth explanatory features of the logistic regression model are known. The impact of EMaP’s perturbations on the explainer is evaluated by the precision and the recall of the features in the explanations. Intuitively, including more features in the explanation will increase the recall and decrease the precision.
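As a small illustration of this evaluation, precision and recall of an explanation's feature set against the ground-truth features can be computed as below. The function name `precision_recall` and the example words are hypothetical.

```python
def precision_recall(explained_features, true_features):
    """Precision and recall of an explanation's features against ground truth."""
    explained = set(explained_features)
    truth = set(true_features)
    hits = len(explained & truth)
    precision = hits / len(explained) if explained else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical example: 3 features in the explanation, 4 ground-truth features.
p, r = precision_recall(["good", "bad", "the"], ["good", "bad", "great", "awful"])
print(p, r)
```

Adding a feature to the explanation can only keep or raise the recall, while a feature outside the ground truth lowers the precision, which is the trade-off noted above.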
Fig. 7 shows the scatter plot of precision vs. recall of LIME and LIME with EMaP on 4 review datasets: books, DVDs, kitchen and electronics. We also provide the results of the Greedy and Parzen explanation methods (parzen). In Greedy, the features contributing the most to the predicted class are removed until the prediction changes. Parzen, on the other hand, approximates the model globally with Parzen windows, and the explanation is the gradient of the prediction. The results show that EMaP consistently improves the faithfulness of LIME.
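The Greedy baseline described above can be sketched for a binary linear classifier as follows. This is a hedged illustration only: the function name `greedy_explanation`, the use of `weights * x` as the per-feature contribution, and removal by zeroing a feature are assumptions, not details given in the text.

```python
import numpy as np

def greedy_explanation(weights, bias, x, max_features=None):
    """Remove the most supportive features until the predicted label flips.

    Assumes a binary linear model with score = weights . x + bias and
    label = (score > 0). Returns the indices of the removed features.
    """
    x_cur = np.asarray(x, dtype=float).copy()
    label = (weights @ x_cur + bias) > 0
    sign = 1.0 if label else -1.0
    limit = max_features or x_cur.size
    removed = []
    while len(removed) < limit:
        contrib = sign * weights * x_cur       # support for the predicted class
        i = int(np.argmax(contrib))
        if contrib[i] <= 0:                    # no supportive feature left
            break
        x_cur[i] = 0.0                         # "remove" the feature
        removed.append(i)
        if ((weights @ x_cur + bias) > 0) != label:
            break                              # prediction changed
    return removed

print(greedy_explanation(np.array([2.0, 1.0, -1.0, 0.5]), -0.2, np.ones(4)))
```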
Log-odds scores of different perturbation-based methods on MNIST and Fashion-MNIST (the higher the better).
Since there are no ground-truth explanations for the MNIST and Fashion-MNIST image datasets, we evaluate explanations using the log-odds scores (Avanti2017) and the infidelity scores (Yeh2019OnT). Given an input image and the importance weights of its features, the log-odds score measures the difference in the model’s log-odds between the image and the modified image whose pixels are erased based on their importance weights. In our experiments, the erased pixels are those with the top 20% weights. Intuitively, the higher the log-odds score, the better the explanation. On the other hand, the infidelity score measures the expected error between the explanation multiplied by a meaningful perturbation and the difference between the predictions at the input and at the perturbed input. The metric can be considered a generalized notion of Sensitivity (ancona2018towards). Intuitively, explanations with lower infidelity are more desirable.
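Both metrics can be sketched in a few lines. The code below is an illustration under stated assumptions, not the evaluation code used in the experiments: erased pixels are set to a constant `erase_value`, the log-odds are taken for a single `target` class, and the infidelity expectation is approximated by averaging over a supplied set of perturbations. All function and parameter names are hypothetical.

```python
import numpy as np

def log_odds(p, eps=1e-12):
    """Log-odds of a probability, clipped for numerical safety."""
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

def log_odds_score(predict_proba, x, importance, target,
                   top_fraction=0.2, erase_value=0.0):
    """Drop in the target class log-odds after erasing the top-weighted pixels."""
    k = max(1, int(round(top_fraction * x.size)))
    top = np.argsort(importance.ravel())[-k:]   # indices of the largest weights
    x_erased = np.asarray(x, dtype=float).copy().ravel()
    x_erased[top] = erase_value
    x_erased = x_erased.reshape(x.shape)
    before = log_odds(predict_proba(x)[target])
    after = log_odds(predict_proba(x_erased)[target])
    return float(before - after)

def infidelity(predict, x, attribution, perturbations):
    """Mean squared gap between I . attribution and f(x) - f(x - I)."""
    errors = []
    for I in perturbations:
        lhs = np.sum(I * attribution)
        rhs = predict(x) - predict(x - I)
        errors.append((lhs - rhs) ** 2)
    return float(np.mean(errors))
```

For a linear model whose attribution equals its weight vector, the infidelity is zero for any perturbation, which gives a quick sanity check of the second function.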
Fig. 8 shows the log-odds scores of EMaP with low dimensions $d=2$ and $d=3$, along with other explanation methods and other perturbation schemes. On MNIST, EMaP does not degrade the explainer’s performance compared to LIME in terms of log-odds (note that the default setting for LIME on images is LIME+). On Fashion-MNIST, EMaP improves the log-odds significantly. Fig. 9 shows the infidelity scores. EMaP has the lowest infidelity score among all black-box methods. Even though the white-box methods, GradientSHAP and DeepLIFT, have more information on the explained models than EMaP, they outperform EMaP only on Fashion-MNIST. Some visualizations of EMaP and other explanation methods are provided in Appendix A.
Perturbation’s robustness. We evaluate the robustness of a perturbation scheme based on the discriminator’s performance in differentiating perturbations from the original data. Following the setup in (FoolingLIMESHAP), the discriminator is trained with full knowledge of the explainer’s parameters. This discriminator has been shown to be able to recognize the perturbations of the LIME and SHAP explainers.
Our experimental results show that EMaP’s perturbations are more robust to the discriminator. Specifically, Figs. 10, 11, 12 and 13 show the True-Positive (TP) and True-Negative (TN) rates of discriminators on perturbations of 2 image datasets and 2 tabular datasets. Note that the TP and TN rates are ideally around 0.5, since such rates would indicate that the discriminators cannot distinguish the original data from the perturbations. For image datasets, the discriminators can easily recognize perturbations generated by LIME and SHAP. On the other hand, EMaP perturbations significantly lower the success rates of the discriminators in recognizing the perturbations. While the explainers’ perturbation schemes appear slightly more robust on the tabular datasets than on the image datasets, EMaP still improves the perturbations’ robustness remarkably.
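The TP and TN rates above can be computed from a discriminator's predictions as in the following sketch. The labeling convention is an assumption made here for illustration: label 1 marks a perturbation and label 0 marks an original data point. The function name `tp_tn_rates` is hypothetical.

```python
import numpy as np

def tp_tn_rates(y_true, y_pred):
    """True-positive and true-negative rates of a binary discriminator.

    Assumed convention: 1 = perturbation, 0 = original data.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = float(np.mean(y_pred[y_true])) if y_true.any() else float("nan")
    tn = float(np.mean(~y_pred[~y_true])) if (~y_true).any() else float("nan")
    return tp, tn

# Hypothetical predictions on 4 perturbations followed by 4 original points.
print(tp_tn_rates([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 1, 1]))
```

Rates close to 1 mean the discriminator separates perturbations from data easily, while rates near chance level mean the perturbations are hard to tell apart from the original data.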
Computational resources and run time. Our experiments are conducted on a single GPU-assisted compute node running a 64-bit Linux operating system. The allocated resources include 32 CPU cores (AMD EPYC 7742 model) with 2 threads per core, and 100GB of RAM. The node is also equipped with 8 GPUs (NVIDIA DGX A100 SuperPod model), with 80GB of memory per GPU.
The run time of EMaP is mostly dictated by the learning of the embedding function (line 2 of Algorithm 1). That initialization step takes less than 2 minutes for a tabular dataset of about 2000 data points, and between 240 and 260 seconds for all 60000 images of MNIST or Fashion-MNIST. Given the local subspaces, the generation of 1000 orthogonal perturbations takes about 0.3 seconds. Note that the manifold and the local subspaces can be computed before deployment since they do not depend on the explained inputs. For a rough comparison, the processing of 1000 perturbations by LIME on both image datasets with the 2-layer network also takes about 0.3 seconds. Thus, the overhead of EMaP at deployment is reasonable. Table 4 reports the actual run time of EMaP and LIME on the image datasets.
Table 4: Run time (in seconds) of EMaP and LIME on the image datasets.
Dataset         EMaP Initialization   LIME    EMaP (d=2)   EMaP (d=3)
MNIST           240-260               0.763   1.311        1.493
Fashion-MNIST   240-260               0.726   1.502        1.467
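To make the cost of generating orthogonal perturbations concrete, the following sketch draws Gaussian noise and removes its components along a given local subspace. It is a simplified illustration and not Algorithm 1 or Algorithm 2: the local subspace basis is assumed to be already available (for instance from the precomputed manifold), and the function name `orthogonal_perturbations` and the parameter `noise_std` are hypothetical.

```python
import numpy as np

def orthogonal_perturbations(x, basis, n_samples, noise_std, rng):
    """Perturb x only along directions orthogonal to a local subspace.

    x     : (D,) input point
    basis : (D, d) columns spanning the local subspace directions (assumed given)
    """
    q, _ = np.linalg.qr(basis)                       # orthonormal basis, shape (D, d)
    noise = rng.normal(0.0, noise_std, size=(n_samples, x.size))
    noise_orth = noise - (noise @ q) @ q.T           # drop components inside the subspace
    return x + noise_orth

rng = np.random.default_rng(0)
x = rng.normal(size=5)
basis = rng.normal(size=(5, 2))
perturbed = orthogonal_perturbations(x, basis, 1000, 0.1, rng)
print(perturbed.shape)
```

Once the basis is known, the generation step reduces to one QR factorization and two matrix products, which is consistent with the small per-input overhead reported above.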
8 Conclusion, limitations and future research
Based on our theoretical and experimental results, we exploit the data manifold to preserve the topological information of the data under perturbation. We implement EMaP to realize this idea and demonstrate its benefits in the explanation task. The main limitation of EMaP is its requirement of low-dimensional representations of the data and of the local affine subspaces. For more complex data, computing them correctly can be very challenging. There are several interesting open questions about EMaP that we leave for future work. For instance, it is important to study the impact of the underlying manifold-learning algorithm, i.e. UMAP, on the perturbations and the explanations. It is also interesting to examine the behavior of EMaP with a wider range of explainers and applications.
References
Appendix A Visualizations of synthetic data and explanations generated with EMaP
This Appendix provides some visualizations of our synthetic data (from the experiments of Table 3) and of explanations generated with or without EMaP. The explanations of EMaP shown in this Appendix are those used in the experiments in Section 7 of the main manuscript.
In Fig. 14, we visualize the synthetic data of different shapes and their perturbations in three dimensions. We also report the Bottleneck distances in dimensions 0 and 1 between the perturbations and the original data (left column).
Figs. 15 and 16 compare the actual explanations returned by LIME and LIME with EMaP on the books reviews of the Multi-Domain Sentiment dataset. We can see that the weights of the features included in the explanations are quite similar between the two methods.
Fig. 17 shows the explanations of LIME with EMaP for all classes in the MNIST dataset. The red (blue) areas mean that, if the features in those areas remain unchanged (change), the activation of that class will be stronger.