
EMaP: Explainable AI with Manifold-based Perturbations

In the last few years, many explanation methods based on the perturbations of input data have been introduced to improve our understanding of decisions made by black-box models. The goal of this work is to introduce a novel perturbation scheme so that more faithful and robust explanations can be obtained. Our study focuses on the impact of perturbing directions on the data topology. We show that perturbing along the orthogonal directions of the input manifold better preserves the data topology, both in the worst-case analysis of the discrete Gromov-Hausdorff distance and in the average-case analysis via persistent homology. From those results, we introduce EMaP algorithm, realizing the orthogonal perturbation scheme. Our experiments show that EMaP not only improves the explainers' performance but also helps them overcome a recently-developed attack against perturbation-based methods.



1 Introduction

In recent years, many attempts have been made to explain the decisions of deep learning models, resulting in various explanation methods called explainers (Lipton2016IML; Murdoch22071). A common technique used by many explainers (trumbelj2013ExplainingPM; Marco2016; Scott2017) is first to generate some perturbations in the input’s space, then forward them through the model and later provide an explanation based on the captured outputs. For that reason, these methods are also known as perturbation-based explainers.

Even though the perturbation-generating step has a strong influence on the performance of explainers (Marco2016; Scott2017), very few works closely examined this step. Current perturbation schemes often ignore the data topology and distort it significantly as a result. These distortions can considerably degrade explainers’ performance since models are not trained to operate on the deformed topology. Additionally, the difference between the perturbations and the original data creates opportunities for malicious intents. For example, the work (FoolingLIMESHAP) demonstrates that a discriminator trained to recognize the explainer’s perturbations can be exploited to fool the explainer.

Figure 1: Visualization of perturbations with the same magnitude generated from a point cloud of a 2-dimensional spiral. Perturbations along the orthogonal directions of the data subspace (far-right) result in lower topological distortion, i.e. smaller Bottleneck distances $H_0$ and $H_1$.

Motivated by that lack of study, our work aims to re-design the perturbation step in an explanation process so that the topological structure of the original data is better preserved. Our key result is that, assuming the input data is embedded in an affine subspace whose dimension is significantly smaller than the data dimension, eliminating the perturbations’ components along that affine subspace better preserves the topological integrity of the original manifold. An illustration of that result is provided in Fig. 1, which shows that perturbation along the orthogonal directions (i.e. with no component along the subspace’s directions) results in smaller distortion in the topological structure of the original data. This is reflected in the smaller Bottleneck distances in dimension 0 and 1, denoted by $H_0$ and $H_1$.

Based on that result, we further propose a novel manifold-based perturbation method aiming to preserve the topological structure of the original data, called EMaP. The high-level operations of EMaP are shown in Fig. 2. Given some sampled data, EMaP first learns a function mapping the samples to their low-dimensional representations in the data subspace. Then, that function is used to approximate a local affine subspace, shortened to local-subspace, containing the data in the neighborhood of the input to be explained. Finally, the EMaP perturbations are generated by adding the noise vectors that are orthogonal to that local-subspace to the data.

Figure 2: EMaP’s perturbation: Assume the data is embedded in a low-dimensional affine subspace (middle figure), EMaP approximates that subspace locally at some given data points and performs perturbation along orthogonal directions of that subspace (right figure).

Contributions. (a) We theoretically show that the worst-case discrete Gromov-Hausdorff distance between the data and the perturbations along the manifold’s directions is larger than that along the orthogonal directions. (b) The worst-case analysis suggests that eliminating perturbation’s components along the manifold’s directions can generally better maintain the topological integrity of the original manifold, i.e. the average-case. We then provide synthetic and real-world experiments based on persistent homology and Bottleneck distance to support that claim. (c) We propose EMaP, an algorithm generating perturbations along the manifold’s orthogonal directions for explainers. EMaP first approximates the input’s manifold locally at some given data points, called pivots, and the explained data point. The perturbations are then generated along the orthogonal directions of these local-subspaces. EMaP also computes the low-dimensional distances from the perturbations to the explained data point so that the explainers can better examine the model. (d) Finally, we provide experiments on four text datasets, two tabular datasets, and two image datasets, showing that EMaP can improve the explainer’s performance and protect explainers from adversarial discriminators.


The remainder of the paper is structured as follows. Sections 2 and 3 briefly discuss related work and preliminaries. Section 4 presents our analysis of the discrete Gromov-Hausdorff distances of different perturbation directions, which suggests orthogonal directions are preferable. We strengthen that result with a persistent homology analysis in Section 5. Sections 6 and 7 describe our proposed EMaP algorithm and its experimental results. Section 8 concludes the paper.

2 Related work

This work intersects several emerging research fields, including explainers and their attack/defense techniques. Our approach also uses recent results in topological data analysis. We provide an overview of that related work below.

Perturbation-based explanation methods. Perturbation-based explainers are becoming more popular among explanation methods for black-box models since they require hardly any knowledge of the explained model. Notable ones are LIME (Marco2016), SHAP (Scott2017), and some others (trumbelj2013ExplainingPM; Zeiler2014; Mukund2017; chang2018explaining; schwab2019cxplain; Lundberg2020). While they share the same goal of explaining the model’s predictions, they differ not only in their objectives but also in their perturbation schemes: some zero out features (Zeiler2014; schwab2019cxplain) or replace features with neutral values (Marco2016; Mukund2017), others marginalize over some distributions on the dataset (Marco2016; Scott2017; Lundberg2020). There also exist methods relying on separate models to generate perturbations (trumbelj2013ExplainingPM; chang2018explaining). The work (Covert2021) provides a comprehensive survey on those perturbation-based explanation methods and how they perturb the data.

Adversarial attack on explainers. We focus on the attack framework (FoolingLIMESHAP), in which the adversary intentionally hides a biased model from the explainer by training a discriminator to recognize the explainer’s queries. The framework will be discussed in detail in Section 3. There are other emerging attacks on explainers focusing on modifying the model’s weights and tampering with the input data (Ghorbani_Abid_Zou_2019; Dombrowski19; Heo2019FoolingNN; Dimanov2020YouST).

Defense techniques for perturbation-based explainers. Since most attacks on perturbation-based explainers were only developed recently, defense techniques against them are quite limited. Existing defenses generate perturbations either from carefully sampling the training data (Joymallya) or from learning some generative models (Saito2020ImprovingLR; Domen). The advantage of EMaP is that it does not require any generative model, which not only reduces the attack surface but also allows theoretical study on the perturbations.

Topological Data Analysis. Topological data analysis (TDA) is an emerging field in mathematics, applying the techniques of topology (which was traditionally very theoretical) to real-world problems. Notable applications are data science, robotics, and neuroscience. TDA uses deep and powerful mathematical tools in algebraic topology to explore topological structures in data and to provide insights that normal metric-based methods fail to discern. The most common tool in the TDA arsenal is persistent homology, developed in the early 2000s by Gunnar Carlsson and his collaborators. We refer readers to (ghrist2014elementary; edelsbrunner2010computational) for an overview of both persistent homology and TDA as a whole.

3 Preliminaries

We use the standard setting of the learning tasks where the set of inputs $X$ is sampled from a distribution on $\mathbb{R}^N$. $X$ is also assumed to be in a manifold embedded in an affine subspace $\mathbb{R}^V$, where $V$ is much smaller than $N$. We also consider a black-box classifier $f$ mapping each input $x \in X$ to a prediction $y$, a local explainer $g$, an (adversarial) discriminator $\mathcal{D}$, and a masking-model $f'$.

Explainers. An explanation of the prediction $y = f(x)$ can be obtained by running an explainer $g$ on $x$ and $f$. We denote such an explanation by $g(x)$. In additive feature attribution methods, the range of $g$ is a set of features’ importance scores. We focus our analysis on the class of perturbation-based explanations, i.e. the importance scores are computed based on the model’s predictions on some perturbations of the input data. The perturbations are commonly generated by perturbing some samples in $X$. We denote the perturbations by $X_r$, where $r$ specifies the amount of perturbation. A more rigorous definition for this notation will be provided in Section 4.


Figure 3: The discriminator-based attack framework: By recognizing and forwarding the perturbations $X_r$ generated by an explainer to the masking model $f'$, the biased-model $f$ can be deployed without detection.

Our experiments are mainly on the LIME explainer (Marco2016) because of its clean formulation, popularity, and flexibility. The output of LIME is typically a linear model whose coefficients are the importance scores of the features:

$$\arg\min_{g \in G} \; L(f, g, \pi_x) + \Omega(g),$$

where $G$ is the searching space, $\pi_x$ is the weight function measuring the similarity between a perturbation and the explained input $x$, $L$ is the loss function measuring the difference between $f$ and the linear approximation $g$, and $\Omega$ is a function measuring the complexity of $g$.
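As a rough illustration of this objective, the sketch below fits a weighted linear surrogate to a black-box function on Gaussian perturbations of the explained input. It is not the LIME implementation: the name `fit_linear_surrogate`, the exponential kernel, its width, and the small ridge penalty standing in for the complexity term are all assumptions made for this example.

```python
import numpy as np

def fit_linear_surrogate(f, x, n_samples=500, radius=0.1, kernel_width=0.5,
                         ridge=1e-8, seed=0):
    """Fit g(z) = w.z + b to f on perturbations of x, weighted by similarity to x."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=radius, size=(n_samples, x.shape[0]))  # perturbations
    y = np.array([f(z) for z in Z])                                  # model outputs
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)               # assumed kernel
    A = np.hstack([Z, np.ones((n_samples, 1))])                      # intercept column
    # Weighted ridge regression:
    # minimise sum_i w_i (f(z_i) - g(z_i))^2 + ridge * ||coef||^2
    lhs = A.T @ (weights[:, None] * A) + ridge * np.eye(A.shape[1])
    rhs = A.T @ (weights * y)
    coef = np.linalg.solve(lhs, rhs)
    return coef[:-1], coef[-1]   # feature importance scores, intercept

# Toy black box (hypothetical): linear in the first two features only.
black_box = lambda z: 2.0 * z[0] - 1.0 * z[1] + 0.5
scores, intercept = fit_linear_surrogate(black_box, np.array([0.3, 0.7, 0.1]))
```

Because the toy black box is itself linear, the recovered scores should match its coefficients closely; for a real model they only describe its behaviour near the explained input.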

Attack framework. We study the discriminator-based attack framework introduced by (FoolingLIMESHAP), which is illustrated in Fig. 3. In the framework, there is an adversary with an incentive to deploy a biased-model $f$. This adversary can bypass detection of the explainer by forwarding the explainer’s perturbations in $X_r$ to a masking model $f'$. The decision whether to forward the inputs to the masking model is made by a discriminator $\mathcal{D}$. Thus, the success of the attack is determined by the capability of the discriminator $\mathcal{D}$ to distinguish $X_r$ from $X$. Intuitively, if the explainer can craft an $X_r$ similar to $X$, it not only improves the explainer’s performance but also prevents the adversary from hiding its bias.
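The routing logic of this attack can be sketched in a few lines. The three callables below are toy stand-ins invented for this example (they are not the models used in the paper or in the original attack): a biased model, a masking model, and a discriminator that flags any input containing non-binary values as a perturbation.

```python
def make_adversarial_model(biased_model, masking_model, discriminator):
    """Route inputs flagged as explainer perturbations to the masking model."""
    def deployed(x):
        if discriminator(x):          # True: x looks like an explainer's perturbation
            return masking_model(x)   # innocuous behaviour shown to the explainer
        return biased_model(x)        # biased behaviour on in-distribution inputs
    return deployed

# Toy setting: real inputs are binary vectors; perturbations contain other values.
biased = lambda x: int(x[0] == 1)        # decides on a sensitive feature x[0]
masking = lambda x: int(x[1] >= 0.5)     # decides on an unrelated feature x[1]
looks_perturbed = lambda x: any(v not in (0, 1) for v in x)

model = make_adversarial_model(biased, masking, looks_perturbed)
```

The attack only works when `looks_perturbed` can separate the two kinds of input; perturbations that stay close to the data distribution remove that signal.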

4 Analysis of Discrete Gromov-Hausdorff distances of perturbations

We consider the following perturbation problem: Given a manifold embedded in $\mathbb{R}^N$, how do we perturb it so that we preserve as much topological information as possible? More concretely, given a finite set of points sampled from such a manifold, is there a consistent method to perturb the original dataset while preserving some notion of topology?

To begin talking about differences between (metric) spaces, we need to introduce a notion of distance between them. One such commonly used distance is the Gromov-Hausdorff distance. Intuitively, a small Gromov-Hausdorff distance means that the two spaces are very similar as metric spaces. Thus, we can focus our study on the Gromov-Hausdorff distances between the data and different perturbation schemes. However, as it is infeasible to compute the distance in practice, we instead study an approximation of it, which is the discrete Gromov-Hausdorff distance. Specifically, we show that, when the perturbation is sufficiently small, the worst-case discrete Gromov-Hausdorff distance resulting from orthogonal perturbation is smaller than that of projection perturbation, i.e. perturbation along the manifold (Theorem 4). The proof of that claim relies on Lemma 4, which states that, with a small perturbation, the discrete Gromov-Hausdorff distance between the original point cloud and the perturbed point cloud is determined by the largest change in the distance of any pair of points in the original point cloud. With the Lemma, the problem of comparing point clouds is further reduced to the problem of comparing the change in distances.

We now state the formal definitions. Let $(M, d)$ be a metric space. For a subset $S \subset M$ and a point $y \in M$, the distance between $S$ and $y$ is given by $d(S, y) := \inf_{x \in S} d(x, y)$.

[Hausdorff distance](tuzhilin2016invented) Let $S$ and $S'$ be two non-empty subsets of a metric space $(M, d)$. The Hausdorff distance between $S$ and $S'$, denoted by $d_H(S, S')$, is:

$$d_H(S, S') := \max\Big( \sup_{x \in S} d(S', x),\ \sup_{y \in S'} d(S, y) \Big).$$

[Gromov-Hausdorff distance](tuzhilin2016invented) Let $X, Y$ be two compact metric spaces. The Gromov-Hausdorff distance between $X$ and $Y$ is given by:

$$d_{GH}(X, Y) := \inf_{f, g} d_H\big(f(X), g(Y)\big),$$

where the infimum is taken over all metric spaces $M$ and all isometric embeddings $f: X \to M$, $g: Y \to M$.

Even though the Gromov-Hausdorff distance is mathematically desirable, it is practically non-computable since the above infimum is taken over all possible metric spaces. In particular, this includes the computation of the Gromov-Hausdorff distance between any two point clouds. In 2004, Memoli and Sapiro (memoli:compare) addressed this problem by using a discrete approximation of Gromov-Hausdorff, which looks at the distortion of pairwise distances over all possible matchings between the two point clouds. Formally, given two finite sets of points $X = \{x_1, \dots, x_n\}$ and $Y = \{y_1, \dots, y_n\}$ in a metric space $(M, d)$, the discrete Gromov-Hausdorff distance between $X$ and $Y$ is given by

$$d_J(X, Y) := \min_{\pi \in S_n} \max_{1 \le i, j \le n} \frac{1}{2} \big| d(x_i, x_j) - d(y_{\pi(i)}, y_{\pi(j)}) \big|, \tag{1}$$

where $S_n$ is the set of all $n$-permutations.
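To make the discrete Gromov-Hausdorff distance concrete, here is a brute-force sketch that enumerates all matchings between two equal-size point clouds. It assumes the Memoli-Sapiro form with the factor 1/2 and Euclidean distances; the function name `discrete_gh` is ours, and the factorial cost restricts it to a handful of points.

```python
from itertools import permutations
from math import dist

def discrete_gh(X, Y):
    """Brute-force discrete Gromov-Hausdorff distance between equal-size point clouds."""
    n = len(X)
    assert n == len(Y), "point clouds must have the same cardinality"
    best = float("inf")
    for perm in permutations(range(n)):
        # Largest distortion of a pairwise distance under this matching.
        worst = max(
            (abs(dist(X[i], X[j]) - dist(Y[perm[i]], Y[perm[j]]))
             for i in range(n) for j in range(i + 1, n)),
            default=0.0,
        )
        best = min(best, 0.5 * worst)
    return best

X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
Y_shuffled = [X[2], X[0], X[1]]                      # same cloud, relabelled
Y_stretched = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]   # third point moved by 1
```

Relabelling the points leaves the distance at zero, while moving one point changes it, which is why the minimum over matchings is needed in general.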

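Let $X = \{x_1, \dots, x_n\}$ be a point cloud contained in some affine subspace $\mathbb{R}^V$ of $\mathbb{R}^N$. We say $X$ is generic if the pairwise distances between the points in $X$ are not all equal, i.e. there exist some points $x_{i_1}, x_{i_2}, x_{i_3}, x_{i_4} \in X$ such that $d(x_{i_1}, x_{i_2}) \neq d(x_{i_3}, x_{i_4})$.

Let $X_r = \{\tilde{x}_1, \dots, \tilde{x}_n\}$ be a finite set of points in $\mathbb{R}^N$ s.t. for every $\tilde{x}_i \in X_r$, there exists a unique $x_i \in X$ such that $d(x_i, \tilde{x}_i) = r$. $X_r$ realizes a perturbation of $X$ with the radius of perturbation being equal to $r$. We also denote $X_r^{\mathrm{Orth}}$ (resp. $X_r^{\mathrm{Proj}}$) as a finite set of points in $\mathbb{R}^N$ such that for every $\tilde{x}_i$ in that set, there exists a unique $x_i \in X$ such that $d(x_i, \tilde{x}_i) = r$ and $x_i\tilde{x}_i \perp \mathbb{R}^V$ (resp. $x_i\tilde{x}_i \subset \mathbb{R}^V$), where $x_i\tilde{x}_i$ denotes the line connecting the points $x_i$ and $\tilde{x}_i$. We are now ready to state the following key theorem:

Given a generic point-cloud $X$, there exists an $r_0 > 0$ such that for any $r < r_0$ and for any instance of $X_r^{\mathrm{Orth}}$, there exists an $X_r^{\mathrm{Proj}}$ such that:

$$d_J\big(X, X_r^{\mathrm{Orth}}\big) \le d_J\big(X, X_r^{\mathrm{Proj}}\big).$$

We prove Theorem 4 by showing the following lemma:

There exists an $r_0 > 0$ such that for any $r < r_0$, we have

$$d_J(X, X_r) = \frac{1}{2} \max_{1 \le i, j \le n} \big| d(x_i, x_j) - d(\tilde{x}_i, \tilde{x}_j) \big|$$

for any $X_r$.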

To prove Lemma 4, we show that, for a small enough $r$, the optimal permutation in Eq. (1) is the identity. Thus, the minimization in the computation of $d_J$ can be eliminated. The details are shown in the following.

Proof of Lemma 4. Given a permutation and two point clouds of the same cardinality, denote:

Let be the set of permutations such that . Let . Since is generic, does not include all of and .

Let . We claim that choosing proves the lemma. To be more precise, the radius of perturbation is chosen such that , i.e. .

Given an , for any , we consider two cases:

  1. If , then . Note that the identity permutation belongs to as:


    Since , from Triangle inequality, we have:

    Therefore, for all , we have:


    This implies for .

  2. If , without loss of generality, we assume the pair maximizes . For convenience, we denote and . From the fact that , we have . On the other hand, from (3), . Thus, from Triangle inequality, we obtain:

    Since , we establish for .

From the above analysis, we can conclude for all and . Combining this with (2), we have that the identity permutation is the solution of (1), which proves the Lemma.

With the Lemma, Theorem 4 can be proved by choosing a specific projection perturbation such that its discrete Gromov-Hausdorff distance is always bigger than the upper bound for such distance of any orthogonal perturbations. The proof of the Theorem is shown below:

Proof of Theorem 4. Applying Lemma 4 to the orthogonal perturbation and projection perturbation , and for any less than or equal to the minimum of the corresponding to each perturbation specified in Lemma 4, we obtain:

From the triangle inequality (similar to how we show Eq. (3) in the proof of Lemma 4), we have:

where the first inequality is strict due to the orthogonality of the perturbation.

Given such , consider the following perturbation where are collinear and , and for . The distance between such and is greater than or equal to , which proves our claim.

This theoretical result suggests the following perturbation scheme: Given a manifold embedded in an affine subspace of $\mathbb{R}^N$ and a fixed amplitude of perturbation, perturbing the manifold in the orthogonal directions with respect to the affine subspace is preferable to random perturbation, as it minimizes the topological difference between the perturbed manifold and the original.
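The gap between the two schemes can be checked numerically on a toy point cloud. The sketch below assumes the identity matching is optimal (which the Lemma guarantees only for sufficiently small radius) and uses the factor 1/2 of the Memoli-Sapiro convention; the point cloud, the radius, and the name `half_max_distortion` are chosen for illustration only.

```python
import numpy as np

def half_max_distortion(X, X_pert):
    """0.5 * max_{i,j} |d(x_i, x_j) - d(x~_i, x~_j)| under the identity matching."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    d_pert = np.linalg.norm(X_pert[:, None] - X_pert[None, :], axis=-1)
    return 0.5 * np.abs(d - d_pert).max()

r = 0.01
rng = np.random.default_rng(0)
# Point cloud on the x-axis of R^3, i.e. a 1-dimensional affine subspace.
X = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.5, 0, 0], [4.0, 0, 0]])

# Orthogonal perturbation: every point moves by r inside the (y, z)-plane.
angles = rng.uniform(0, 2 * np.pi, size=len(X))
X_orth = X + r * np.stack([np.zeros(len(X)), np.cos(angles), np.sin(angles)], axis=1)

# Projection perturbation: every point moves by r along the x-axis; neighbouring
# points move in opposite directions, so some distances change by 2r.
X_proj = X + r * np.array([[-1.0, 0, 0], [1, 0, 0], [-1, 0, 0], [1, 0, 0]])

orth = half_max_distortion(X, X_orth)
proj = half_max_distortion(X, X_proj)
```

Here the projection perturbation reaches the worst case `r`, while the orthogonal one changes each distance only at second order in `r`.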

5 Persistent homology analysis with the Bottleneck distance

The results from the previous section show that on the worst-case basis, the orthogonal perturbation is preferable to the projection perturbation. However, when we apply them to actual datasets, how do they compare on average? Since the discrete Gromov-Hausdorff distance is still computationally intractable for a Monte-Carlo analysis, we choose a different approach: persistent homology.

For the last 30 years, there have been new developments in the field of algebraic topology, which was classically very abstract and theoretical, toward real-world applications. The newly discovered field, commonly referred to as applied topology or topological data analysis, is centered around a concept called persistent homology. Interested readers can refer to (ghrist2014elementary; edelsbrunner2010computational; ghrist2008barcode) for an overview of the subject.

For any topological space, homology is a topological invariant that counts the number of holes or voids in the space. Intuitively, given any space, the $0$-homology counts the number of connected components, the $1$-homology counts the number of loops, the $2$-homology counts the number of 2-dimensional voids, and so on. The homology groups of dimension $k$ are denoted by $H_k$.

Given a point cloud sampled from a manifold, we want to recapture the homological features of the original manifold from these discrete points. The idea is to construct a sequence of topological spaces along some timeline and track the evolution of the topological features across time. The longer the features persist (hence the name persistent homology), the more likely they are the actual features of the original manifold. Given a point cloud $X$ and a dimension $k$, the persistence diagram of $X$ in dimension $k$ is the set of points corresponding to the birth and death times of these features in the aforementioned timeline.
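For dimension 0 the persistence diagram can be computed without any topology library: in a Vietoris-Rips filtration every component is born at scale 0, and the finite death times are exactly the edge lengths of a minimum spanning tree. The sketch below uses that standard fact with Kruskal's algorithm. It assumes the filtration is indexed by pairwise Euclidean distance, and the name `h0_persistence` is ours.

```python
from math import dist

def h0_persistence(points):
    """Finite (birth, death) pairs of 0-dimensional Vietoris-Rips persistence.

    Every component is born at scale 0; a component dies when an edge of the
    minimum spanning tree merges it into another one (Kruskal's algorithm).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    diagram = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                # this edge merges two components
            parent[ri] = rj
            diagram.append((0.0, length))
    return diagram                  # n - 1 pairs; the last component never dies

# Two tight clusters far apart: one long-lived pair signals two components.
cloud = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.0, 0.2)]
diagram = h0_persistence(cloud)
```

The two short-lived pairs come from points merging within each cluster, and the long-lived one from the clusters merging, which matches the intuition that persistent features reflect the underlying shape.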

For two point clouds, and in particular for their two persistence diagrams, there are several notions of distances between them, representing how similar they are as topological spaces. The most commonly used distance in practice is the Bottleneck distance.

[Bottleneck distance] Let $D$ and $D'$ be two persistence diagrams. The Bottleneck distance $W_\infty(D, D')$ is given by

$$W_\infty(D, D') := \inf_{\phi: D \to D'} \ \sup_{p \in D} \lVert p - \phi(p) \rVert_\infty,$$

where the infimum is taken over all matchings $\phi$ (which allows matchings to points with equal birth and death time).

For simplicity, we shorthand the Bottleneck distance between the persistence diagrams of $X$ and $Y$ in dimension $k$ to $H_k(X, Y)$ instead of writing it in terms of the two diagrams. As the notation $H_k(X, Y)$ takes in 2 parameters $X$ and $Y$, it is not to be confused with the homology group of the specified spaces. Note that two point clouds with small Bottleneck distance can be considered topologically similar.

The Bottleneck distance is highly correlated to the Gromov-Hausdorff distance, as the Bottleneck distance of two persistence diagrams of the same dimension is bounded above by their Gromov-Hausdorff distance (chazal2009signatures):

$$H_k(X, Y) \le d_{GH}(X, Y)$$

for every dimension $k$.
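For very small diagrams the Bottleneck distance can be evaluated by brute force, which makes the role of the diagonal explicit. This is only a sketch for intuition, not the algorithm used by persistent homology libraries: the function name `bottleneck` is ours, and the factorial enumeration is practical for a handful of points at most.

```python
from itertools import permutations

def bottleneck(D1, D2):
    """Brute-force Bottleneck distance between two small persistence diagrams.

    Each diagram is a list of (birth, death) pairs. Unmatched points may be
    sent to the diagonal, at L-infinity cost (death - birth) / 2.
    """
    n, m = len(D1), len(D2)
    # Pad both sides to n + m slots; None stands for "a point on the diagonal".
    A = list(D1) + [None] * m
    B = list(D2) + [None] * n

    def cost(p, q):
        if p is None and q is None:
            return 0.0
        if p is None:
            return (q[1] - q[0]) / 2
        if q is None:
            return (p[1] - p[0]) / 2
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    best = float("inf")
    for perm in permutations(range(n + m)):
        worst = max((cost(A[i], B[perm[i]]) for i in range(n + m)), default=0.0)
        best = min(best, worst)
    return best
```

For instance, `bottleneck([(0.0, 1.0), (0.0, 0.1)], [(0.0, 1.2)])` matches the two long-lived points to each other and sends the short-lived one to the diagonal, so short-lived noise features contribute little to the distance.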

Dataset Parameter No. points Data’s noise Perturbation No. runs EMaP’s dim
Line Length: 10 100 0.1 100 1
Circle Radius: 1 400 0.1 100 2
2 intersecting circles Radius: 1 400 0.01 100 2
2 concentric circles Radius: 1 400 0.01 100 2
Spiral Radius: [0,2] 1000 0.02 100 2
Table 1: The parameters of the synthetic datasets. The perturbation column shows the average radius of the perturbation applied on each data point.

| Experiment | No. data points | No. feats | Feature’s values | No. runs | EMaP’s dim |
|---|---|---|---|---|---|
| COMPAS | 7214 | 100 | {0,1} | 100 | 2 |
| German Credit | 1000 | 28 | {0,1} | 100 | 2 |
| CC | 2215 | 100 | {0,1} | 100 | 2 |
| MNIST | 60000 | | [0,1] | 100 | 2 and 3 |
| Fashion-MNIST | 60000 | | [0,1] | 100 | 2 and 3 |

Table 2: The parameters of the real-world datasets.

Monte-Carlo simulations. The Bottleneck distance is much more tractable to compute than the Gromov-Hausdorff distance, and software packages are available for different use cases. As such, we run Monte-Carlo simulations to compute the Bottleneck distances on 5 synthetic datasets, 3 real-world tabular datasets and 2 real-world image datasets to confirm our hypothesis that the orthogonal perturbation preserves the topology better than the projection perturbation on average. The synthetic datasets are noisy point clouds of certain 2-dimensional shapes in 3-dimensional space. The tabular datasets are COMPAS (compassdataset), German Credit (germancredit), and Communities and Crime (communitycrime). The image datasets are MNIST (lecun2010) and Fashion-MNIST (Xiao2017FashionMNISTAN). Tables 1 and 2 provide more details about those datasets. We use the Ripser Python library (ctralie2018ripser) to compute the Bottleneck distances in our experiments. All reported Bottleneck distances are normalized by the noise added to the point clouds for more intuitive visualization.
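The synthetic setting can be mimicked with a few lines of NumPy. The sketch below generates a noisy circle in the $(x, y)$-plane of 3-dimensional space and perturbs it in the three ways compared in this section. Several details are assumptions made for the example rather than the paper's exact protocol: the data noise is added in-plane, every point is moved by exactly `r` (the paper reports an average radius), and the function names are ours.

```python
import numpy as np

def noisy_circle(n_points, radius=1.0, noise=0.01, seed=0):
    """Sample a noisy circle lying in the (x, y)-plane of R^3."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0, 2 * np.pi, size=n_points)
    pts = np.stack([radius * np.cos(t), radius * np.sin(t), np.zeros(n_points)], axis=1)
    pts[:, :2] += rng.normal(scale=noise, size=(n_points, 2))   # in-plane data noise
    return pts

def perturb(points, r, mode, seed=1):
    """Move every point by exactly r in a Gaussian, projection or orthogonal direction."""
    rng = np.random.default_rng(seed)
    d = rng.normal(size=points.shape)
    if mode == "projection":      # keep only the components inside the data plane
        d[:, 2] = 0.0
    elif mode == "orthogonal":    # keep only the component normal to the data plane
        d[:, :2] = 0.0
    elif mode != "gaussian":
        raise ValueError(f"unknown mode: {mode}")
    d /= np.linalg.norm(d, axis=1, keepdims=True)   # normalise to unit length
    return points + r * d

X = noisy_circle(400)
X_gauss, X_proj, X_orth = (perturb(X, 0.1, m) for m in ("gaussian", "projection", "orthogonal"))
```

Each perturbed cloud could then be passed, together with `X`, to a persistent homology library to obtain the persistence diagrams and their Bottleneck distances.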

Table 3: The normalized $H_0$ and $H_1$ Bottleneck distances for Gaussian (G), projection (P), and orthogonal (O) perturbations on synthetic datasets. Visualizations of the actual perturbations are provided in Appendix A.

Table 3 reports the means of the $H_0$ and $H_1$ Bottleneck distances of the perturbations on the synthetic datasets. The number of data points and the noise level are chosen mainly for clear visualizations (shown in Appendix A). The results show that orthogonal perturbation consistently results in lower $H_0$ distances for the line-shaped dataset and lower $H_1$ distances for the cycle-shaped datasets. Note that in general, $H_1$ is the better topological indicator for cycle-shaped datasets compared to $H_0$, since cycles or holes (detected by $H_1$) are harder to replicate than connected components (detected by $H_0$).

For the real-world datasets, we conduct the experiments with perturbations of different noise levels and report results in Figs. 4 and 5. It can be observed that both the $H_0$ and $H_1$ Bottleneck distances of the persistence diagrams of the orthogonal perturbation are significantly smaller than those of the projection perturbation on all experiments.

Figure 4: The normalized $H_0$ and $H_1$ Bottleneck distances for orthogonal and projection perturbations on 3 real-world datasets at different noise levels. The x-axis shows the average perturbation’s radius applied on each data point (log-scale).
Figure 5: The normalized $H_0$ and $H_1$ Bottleneck distances for orthogonal and projection perturbations on 2 image datasets at different noise levels. The x-axis shows the average perturbation’s radius applied on each data point (log-scale).

6 EMaP Algorithm

The ideal perturbations for perturbation-based explainers are those drawn from the data distribution since models are trained to operate on that distribution (Marco2016; Ribeiro2018AnchorsHM). However, most perturbation schemes ignore the data distribution in the process of generating the perturbations (see Sect. 2). Furthermore, predictions on the perturbations do not necessarily hold local information about the explained inputs, which is one main reason for the usage of some distance or kernel functions measuring how similar the perturbations are to the explained input. Note that those distances and kernels are normally functions of other distances such as and in the input space , which might not correctly capture the notion of similarity.

By operating in the low-dimensional manifold, EMaP can overcome those issues. First, if topological similarity implies similarity in the model’s predictions, maintaining the topological structure of the original data should improve the relevance of the model’s predictions on the perturbations. Therefore, explaining the model with orthogonal perturbations, which helps preserve the topology better, should be more beneficial. Furthermore, the manifold provides a natural way to improve the similarity measurement among data points. Fig. 6 shows the issue of similarity measurement based on Euclidean distance in the input space $\mathbb{R}^N$. As the distance ignores the layout of the data, points that are farther away along the manifold might result in the same similarity measure. On the other hand, the low-dimensional distances computed on the manifold take into account the layout and can overcome that issue.

Figure 6: Euclidean distances computed in the input space might not capture the actual distances between the data points (left). Distances in low-dimensional space can help with the issue (right).

Algorithm overview. Given an input to be explained, the outputs of EMaP are the perturbations along the manifold’s orthogonal directions and their low-dimensional distances to that input. The pseudo-code of EMaP is shown in Alg. 1. The first step is to learn an embedding function, i.e. a mapper, transforming the data to the low dimension (line 2). Then, samples from each label are selected and combined with the explained input into a set, called the pivots (lines 3 to 7). After that, EMaP generates perturbations along the orthogonal directions of the manifold from each pivot (line 10). The pivots provide the explainer with a wider range of perturbations for better performance. In the next paragraphs, we will describe those key steps in more detail.

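Input: Data to explain $x_0$, a subset of training data $(X, y)$, number of pivots per label $n$, number of perturbations per pivot $m$, lower dimension $V$ and noise level $r$.
Output: $X_r$ and $D_r$. $X_r$ contains orthogonal perturbations locally around $x_0$ and points in $X$. $D_r$ contains the low-dimensional distances of points in $X_r$ to $x_0$. Writing $l$ for the number of unique labels in $y$, there are $nl + 1$ pivots and hence $m(nl + 1)$ perturbations.

1:  Initialize an EMaP sampler object $S$.
2:  $S$.mapper $\leftarrow$ Mapper to the manifold of dimension $V$ of $X$
3:  $S$.pivots $\leftarrow \emptyset$
4:  Include $x_0$ to $S$.pivots
5:  for each class $c$ in $y$ do
6:     Include $n$ samples of class $c$ to $S$.pivots
7:  end for
8:  $X_r \leftarrow \emptyset$, $D_r \leftarrow \emptyset$
9:  for each data point $x$ in $S$.pivots do
10:     $\tilde{X}, \tilde{D} \leftarrow S$.gen_perturbation($x$, $m$, $r$)
11:     Include $\tilde{X}$ to $X_r$ and include $\tilde{D}$ to $D_r$
12:  end for
13:  return $X_r$ and $D_r$.
Algorithm 1 EMaP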
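The outer loop of Algorithm 1 can be sketched as follows. This is an illustrative reading of the pseudo-code, not the authors' implementation: `select_pivots` and `emap` are hypothetical names, pivots are drawn uniformly at random within each class, and the per-pivot step is passed in as a callable standing in for Algorithm 2.

```python
import numpy as np

def select_pivots(x0, X, y, n_per_label, seed=0):
    """Return the explained input followed by n_per_label samples of each class."""
    rng = np.random.default_rng(seed)
    pivots = [x0]
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        chosen = rng.choice(idx, size=min(n_per_label, len(idx)), replace=False)
        pivots.extend(X[chosen])
    return np.stack(pivots)

def emap(x0, X, y, n_per_label, gen_perturbation):
    """Collect perturbations and low-dimensional distances over all pivots."""
    all_perts, all_dists = [], []
    for pivot in select_pivots(x0, X, y, n_per_label):
        perts, dists = gen_perturbation(pivot)   # stands in for Algorithm 2
        all_perts.append(perts)
        all_dists.append(dists)
    return np.concatenate(all_perts), np.concatenate(all_dists)
```

With two classes and two pivots per class, the loop visits five pivots (the explained input plus four samples), so a generator producing three perturbations per pivot yields fifteen perturbations in total.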

The mapper. In EMaP, the mapper is learnt from a manifold approximated by UMAP (mcinnes2018umap-software). Since the manifold learnt by UMAP is optimized for global information, the orthogonal directions computed on top of that manifold at local data points are prone to high error. This can degrade the correctness of orthogonal perturbations significantly. To overcome this issue, EMaP learns a local-subspace for each pivot and generates the orthogonal perturbations on top of that subspace. Intuitively, the local-subspace is a local affine approximation of the manifold. Therefore, the resulted orthogonal directions are more finely tuned for the local points. We denote as the matrix characterizing the local-subspace at . Since is a linear approximation of data points near , by denoting as the function embedding input data to the manifold. By denoting , we have and , where are points near and are their embedding in . In our current implementations, the set of pivots contains the explained data point and data points sampled from each class label (see Algorithm 1).

EMaP orthogonal perturbations. The key step of EMaP is the generation of orthogonal perturbations $\tilde{x}$ from $x$ (line 10, Alg. 1), which can be described by the following equation:

$$\tilde{x} = x + \mathrm{noise} - \mathrm{Proj}_{G_x}(\mathrm{noise}), \tag{4}$$

where the noise is sampled from a multivariate normal distribution and $\mathrm{Proj}_{G_x}(\mathrm{noise})$ is the projection of the noise on the local-subspace characterized by $G_x$. Upon obtaining the orthogonal perturbations, we can compute their low-dimensional embedding using the mapper transform function. The pseudocode for this computation is in Algorithm 2.

Input: Input , number of perturbation and noise level .
Output: orthogonal perturbations of and their low-dimension distances to .

1:   self.get_local_subspace()
3:  for do
6:     Include into
7:  end for
10:   distances of each member in to
11:  return and
Algorithm 2 gen_perturbation
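A minimal NumPy sketch of this step is given below. It assumes the local-subspace is described by a matrix `G` whose rows span its directions and that the mapper is available as a callable `embed`; both conventions, the isotropic Gaussian noise, and the function signature are assumptions made for illustration and are not taken from the authors' code.

```python
import numpy as np

def gen_perturbation(x, G, embed, m, r, seed=0):
    """Orthogonal perturbations of x and their low-dimensional distances to x.

    G     : (V, N) array whose rows span the directions of the local-subspace at x
    embed : callable mapping a (k, N) array to its (k, V) low-dimensional embedding
    """
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(G.T)                       # (N, V) orthonormal basis of the subspace
    noise = rng.normal(scale=r, size=(m, x.shape[0]))
    in_subspace = noise @ Q @ Q.T                  # projection of the noise onto the subspace
    perturbations = x + noise - in_subspace        # keep only the orthogonal component
    low = embed(perturbations)
    distances = np.linalg.norm(low - embed(x[None, :]), axis=1)
    return perturbations, distances

# Toy case: the subspace is spanned by the first two axes of R^4.
G = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
embed = lambda Z: Z[:, :2]
x = np.array([0.2, 0.4, 0.0, 0.0])
perts, dists = gen_perturbation(x, G, embed, m=5, r=0.05)
```

In this toy case the perturbations leave the first two coordinates of `x` untouched, so their low-dimensional distances to `x` are zero; with a learnt, nonlinear mapper the distances would generally be small but nonzero.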

Local-subspace approximation. The correctness of the orthogonal perturbations, i.e. whether the perturbations are actually lying in the orthogonal subspace, is heavily dependent on the correctness of the computation of . We now discuss how EMaP learns the local-subspace matrix .

Ideally, given a set of data near in the manifold, we can solve the following optimization for :


where is the transform function embedding input data to the manifold learnt in the previous step. An intuition is, for all in the manifold and near , we are searching for a matrix such that the inverse mapping approximately equals its original value . Note that if the embedding is exactly on and the manifold is affine, the optimization (5) can achieve its optimal value for some . Since it is not trivial to obtain the set belonging to the manifold, EMaP perturbs around with some random noise, and solves the following the optimization instead:


where is a ball centering at with noise radius . EMaP solves (6) for an approximation of the local-subspace. Further details of this step are provided in Algorithm 3.

Input: Data point .
Hyper-parameters: Number of training samples and noise level for training .
Output: Matrix characterize the local-subspace at

2:  for do
5:     Include into
6:  end for
9:  return
Algorithm 3 get_local_subspace
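One way to read this step is as an ordinary least-squares problem: sample noisy copies of the pivot, embed them, and fit an affine map from the embeddings back to the input space. The sketch below follows that reading under stated assumptions: the exact objective and norm used by the authors are not recoverable from the text here, the training points are Gaussian rather than uniform in a ball, and `embed` is any callable standing in for the learnt mapper.

```python
import numpy as np

def get_local_subspace(x, embed, n_train=5000, r_train=0.05, seed=0):
    """Least-squares fit of an affine map from embeddings back to the input space.

    Returns a (V, N) matrix G such that z is approximated by embed(z) @ G + offset
    for points z sampled in a small neighbourhood of x.
    """
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=r_train, size=(n_train, x.shape[0]))   # noisy copies of x
    W = embed(Z)                                                    # their embeddings
    A = np.hstack([W, np.ones((n_train, 1))])                       # affine design matrix
    coef, *_ = np.linalg.lstsq(A, Z, rcond=None)                    # minimise ||A coef - Z||
    return coef[:-1]                                                # drop the offset row

# Toy mapper (hypothetical): keep the first two coordinates of R^5.
x = np.array([0.2, 0.4, 0.0, 0.0, 0.0])
G = get_local_subspace(x, lambda Z: Z[:, :2])
```

With this toy mapper the rows of `G` come out close to the first two coordinate axes, i.e. the fit recovers the subspace the mapper projects onto, up to sampling error in the remaining coordinates.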

We now discuss the gap between the ideal solution and the approximation used by EMaP, i.e. the solution of (5) and (6). This can be characterized by bounding the error between and , i.e. the reconstructed signals in by using and , respectively. Lemma 6 provides a bound on that reconstruction error where the set in (5) is the projection of a ball on the data manifold. The bound holds under a mild assumption that the optimal objective of (5) is not larger than that of (6). We find this assumption reasonable and intuitive: as the set is in a subspace of dimension and the set of is a ball in , finding a subspace of dimension approximating the subspace containing should give a much lower error.

Assume that all data points belong to the same affine space . Let Proj be the projection onto , then under the above assumption on the optimization (5) and (6), the reconstruction error on perturbed data points is upper bounded by:

where is the orthogonal components of and .

For simplicity, we rewrite:

The assumption regarding the objectives (5) and (6) mentioned in the Lemma can be rewritten as:

We find this assumption reasonable since its left hand side is equal to in the ideal scenario, i.e. for all in our dataset. With that, we have:

where the last two inequalities are from the Triangle Inequality. The last equality is due to the fact that is the orthogonal components of and that has no orthogonal components.

Note that is small if the manifold is affine in the neighborhood of and is good, i.e. is a good estimator for under the above assumption.

7 Experiments

We evaluate EMaP on two main objectives: the explainer’s performance and the perturbations’ robustness. Our experiments are conducted on 3 tabular datasets, 2 image datasets, and 4 text datasets of reviews in multiple domains. They are COMPAS (compassdataset), Communities and Crime (communitycrime), German Credit (germancredit), MNIST (lecun2010), Fashion-MNIST (Xiao2017FashionMNISTAN), and 4 review datasets in the Multi-Domain Sentiment dataset (multibook).

Dataset, models and explainer’s hyper-parameters. All reported results for real-world datasets include at least 2000 data points and 100 runs, except for those of German Credit where the data only consists of 1000 samples.

The experimental model for the text datasets is the logistic regression implemented by the LIME paper (Marco2016), with the ground-truth sets of explanatory features. The models tested for the two image datasets are 2-layer convolutional networks implemented in PyTorch (pytorch), with test set accuracy of 98% for MNIST and 93% for Fashion-MNIST.

In all experiments, the model’s inputs are normalized between 0 and 1. The noise vector used to generate perturbations has radius for text data and for image data, which are in the range shown in the previous experiments in Fig. 5 and 5. The noise radius used to approximate the local-subspaces (Algorithm 3) is chosen equal to the noise radius for perturbation. The selection of these radii and the low-dimensional value of the manifold depend on the dataset. For UMAP’s hyper-parameters, we use their default settings with n_components and min_dist . Our source code is attached in the supplementary material of this submission. Finally, for a fair comparison, every reported method uses 1000 perturbations to generate an explanation.

Technical implementation of EMaP and baselines. In general, EMaP perturbations can be used with any model-agnostic perturbation-based method; however, the modification may require significant changes to the existing implementations of those methods. In our experiments, we apply EMaP to LIME (Marco2016) as a proof of concept, i.e. we show that EMaP can improve LIME in terms of performance. We use the notation EMaP to indicate LIME with EMaP’s perturbations in the following experimental results. We choose LIME to demonstrate the advantages of EMaP since it requires few changes to integrate EMaP’s perturbations. This helps demonstrate fairly the gain of applying EMaP to explanation methods.

We now explain in more detail how we leverage LIME with EMaP. As described in the original paper, LIME’s explanation is the solution of the following optimization:

$$\xi(x) = \arg\min_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g),$$

where $\mathcal{L}$ is a loss between the explained function $f$ and the explanation function $g$, $G$ is the class of linear models, $\pi_x$ is an exponential kernel defined on some distance function, and $\Omega(g)$ penalizes the complexity of $g$ (see (Marco2016) for more details). To use EMaP, we set the loss function as:

where with the distance computed as in Algorithm 2 and is the linear function of the changes in each input’s features.
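To make the role of the kernel and of the linear explanation concrete, the following is a minimal sketch of a LIME-style weighted linear fit. It is an illustration under stated assumptions rather than the exact loss used here: the distances are taken as given (for EMaP they would come from Algorithm 2), the kernel is assumed to be exp(-d^2 / width^2), the fit is plain weighted least squares with an intercept, and LIME's feature selection and regularization are omitted. The function name and the default `kernel_width` are hypothetical.

```python
import numpy as np


def fit_weighted_linear_surrogate(f, x, perturbations, distances, kernel_width=0.75):
    """Fit g(delta) = coefs . delta + intercept to f(x + delta) with kernel weights.

    f             : callable mapping an (n, D) array to n predictions
    perturbations : (n, D) array of perturbed inputs around x
    distances     : (n,) array of distances between x and each perturbation
    """
    deltas = perturbations - x                       # changes in each input feature
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    design = np.hstack([deltas, np.ones((deltas.shape[0], 1))])
    targets = f(perturbations)
    # Weighted least squares: scale rows by sqrt(weight), then solve as usual.
    root_w = np.sqrt(weights)
    solution, *_ = np.linalg.lstsq(design * root_w[:, None], targets * root_w, rcond=None)
    return solution[:-1], solution[-1]
```

For a model that is itself linear, the returned coefficients coincide with the model's weights, and the intercept equals the prediction at the explained input.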

Throughout our experiments, we compare EMaP-LIME (or EMaP for short) to LIME with different perturbation schemes. Our goal is to demonstrate the advantage of using EMaP to generate explanations. We also include experimental results of some other black-box and white-box explanation methods for comparison. Specifically, we include results of the following methods:

  • LIME zero: LIME with perturbations whose perturbed features are set to zero. This method is used by LIME in explaining text data.

  • LIME+: LIME with perturbations whose perturbed features are added with Gaussian noise. This method is used by LIME in explaining image data.

  • LIME*: LIME with perturbations whose perturbed features are multiplied with uniform noise between 0 and 1. This can be considered as a smoother version of LIME zero.

  • KernelSHAP: a black-box method based on Shapley value (Scott2017), whose perturbed features are set to the average of some background data.

  • GradientSHAP: a white-box method based on Shapley value (Scott2017), which relies on the gradient of the model.

  • DeepLIFT: a white-box method based on back-propagating the model (Avanti2017).
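The three LIME perturbation schemes listed above differ only in how the selected features are modified. A minimal sketch, assuming a boolean `mask` marks the perturbed features and that the Gaussian noise has a standard deviation `sigma` (a hypothetical parameter, not a value taken from the experiments):

```python
import numpy as np


def perturb_features(x, mask, scheme, rng, sigma=0.1):
    """Return a perturbed copy of x; only entries where mask is True are changed.

    scheme = "zero": set the selected features to zero            (LIME zero)
    scheme = "plus": add Gaussian noise to the selected features  (LIME+)
    scheme = "star": multiply them by uniform noise in [0, 1]     (LIME*)
    """
    out = np.asarray(x, dtype=float).copy()
    n_selected = int(np.sum(mask))
    if scheme == "zero":
        out[mask] = 0.0
    elif scheme == "plus":
        out[mask] += rng.normal(scale=sigma, size=n_selected)
    elif scheme == "star":
        out[mask] *= rng.uniform(0.0, 1.0, size=n_selected)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return out
```

For inputs normalized between 0 and 1, the "star" scheme keeps each perturbed feature between zero and its original value, which is why it can be read as a smoother version of the "zero" scheme.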

Figure 7: The precision and recall of explanations returned by Greedy, LIME, Parzen and LIME-EMaP (the higher the better). The dots are in the increasing order of the number of features in explanations (left to right).

Explainer’s performance. We first report our experimental result for the sentiment classification task in the Multi-Domain Sentiment dataset (multibook). We follow the experimental setup in (Marco2016), in which the ground-truth explanatory features of the logistic regression model are known. The impact of EMaP’s perturbations on the explainer is evaluated by the precision and the recall rate of the features in the explanations. Intuitively, more features in the explanation will increase the recall and decrease the precision.

Fig. 7 shows the scatter plot of precision vs. recall of LIME and LIME with EMaP on 4 review datasets of books, dvds, kitchen and electronics. We also provide the result of the Greedy and Parzen explanation methods (parzen). In Greedy, the features contributing the most to the predicted class are removed until the prediction changes. On the other hand, Parzen approximates the model globally with Parzen windows and the explanation is the gradient of the prediction. The results clearly show that EMaP consistently improves the faithfulness of LIME.
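The precision and recall used in this evaluation can be computed as in the following sketch, which assumes that both the explanation and the ground truth are given as collections of feature names; the function name is illustrative.

```python
def precision_recall(explained_features, true_features):
    """Precision and recall of an explanation against ground-truth features."""
    explained = set(explained_features)
    truth = set(true_features)
    hits = len(explained & truth)
    # Precision: fraction of explained features that are truly explanatory.
    precision = hits / len(explained) if explained else 0.0
    # Recall: fraction of truly explanatory features that were recovered.
    recall = hits / len(truth) if truth else 0.0
    return precision, recall
```

For example, an explanation containing "good", "bad" and "the" against the ground truth "good", "bad", "awful" and "great" has precision 2/3 and recall 1/2; adding more features to the explanation can only keep or raise the recall, while it tends to lower the precision.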

Figure 8: Log-odds scores of different perturbation-based methods on MNIST and Fashion-MNIST (the higher the better).
Figure 9: Infidelity scores of different perturbation-based methods on MNIST and Fashion-MNIST (the lower the better).

Since there are no ground-truth explanations for the MNIST and Fashion-MNIST image datasets, we evaluate explanations using the log-odds scores (Avanti2017) and the infidelity scores (Yeh2019OnT). Given an input image and the importance weights of its features, the log-odds score measures the difference between the image and the modified image whose pixels are erased based on their importance weights. In our experiments, the erased pixels are those with the top 20% weights. Intuitively, the higher the log-odds score, the better the explanation. On the other hand, the infidelity score measures the expected error between the explanation multiplied by a meaningful perturbation and the differences between the predictions at its input and at the perturbation. The metric can be considered as a generalized notion of Sensitivity-n (ancona2018towards). Intuitively, explanations with lower infidelity are more desirable.
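Both metrics can be sketched in a few lines. The code below is a simplified illustration, not the evaluation code of the experiments: `prob_fn` (a hypothetical callable) is assumed to return the probability of the explained class for a single flattened input, erased pixels are assumed to be set to `erase_value`, and the infidelity expectation is estimated by averaging over a given set of perturbations.

```python
import numpy as np


def log_odds_drop(prob_fn, x, attribution, fraction=0.2, erase_value=0.0, eps=1e-12):
    """Drop in log-odds of the explained class after erasing the top-weighted features."""
    k = max(1, int(round(fraction * x.size)))
    top = np.argsort(attribution)[-k:]          # indices of the k largest weights
    erased = x.copy()
    erased[top] = erase_value

    def log_odds(p):
        p = np.clip(p, eps, 1.0 - eps)          # avoid log(0)
        return np.log(p / (1.0 - p))

    return float(log_odds(prob_fn(x)) - log_odds(prob_fn(erased)))


def infidelity(f, x, attribution, perturbations):
    """Monte Carlo estimate of E_I[(I . attribution - (f(x) - f(x - I)))^2]."""
    predicted_change = perturbations @ attribution
    actual_change = f(x[None, :])[0] - f(x[None, :] - perturbations)
    return float(np.mean((predicted_change - actual_change) ** 2))
```

For a linear model explained by its own weights, the infidelity is zero for any perturbation distribution, which is a convenient sanity check for an implementation.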

Fig. 8 shows the log-odds scores of EMaP with low-dimension and , along with other explanation methods and other perturbation schemes. In MNIST, we can see that EMaP does not degrade the explainer performance compared to LIME in terms of log-odds (note that the default setting for LIME in image is LIME+). For Fashion-MNIST, EMaP improves the log-odds significantly. Fig. 9 shows the infidelity scores. EMaP has the lowest infidelity score among all black-box methods. Even though the white-box methods, GradientSHAP and DeepLIFT, have more information on the explained models than EMaP, they can only outperform EMaP in Fashion-MNIST. Some visualizations of EMaP and other explanation methods are provided in Appendix A.

Figure 10: True-Positive and True-Negative rates of the discriminators on perturbations of different methods on MNIST dataset (the lower the better).
Figure 11: True-Positive and True-Negative rates of the discriminators on perturbations of different methods on Fashion-MNIST dataset (the lower the better).
Figure 12: True-Positive and True-Negative rates of the discriminators on perturbations of different methods on Communities and Crime dataset (the lower the better).
Figure 13: True-Positive and True-Negative rates of the discriminators on perturbations of different methods on German Credit dataset (the lower the better).

Perturbation’s robustness. We evaluate the robustness of the perturbation scheme based on the discriminator’s performance in differentiating perturbations from the original data. Following the setup in (FoolingLIMESHAP), the discriminator is trained with full knowledge of the explainer’s parameters. This discriminator has been shown to be able to recognize perturbations of LIME and SHAP explainers.

Our experimental results show that EMaP’s perturbation is more robust to the discriminator. Specifically, Figs. 10, 11, 12 and 13 show the True-Positive (TP) and True-Negative (TN) rates of discriminators on perturbations of 2 image datasets and 2 tabular datasets. Note that the TP and TN rates are ideally around since they would indicate that the discriminators cannot recognize both the original data and the perturbations. For image datasets, the discriminators can easily recognize perturbations generated by LIME and SHAP. On the other hand, EMaP perturbations significantly lower the success rates of the discriminators in recognizing the perturbations. While the explainers’ perturbation schemes appear to be slightly more robust in the tabular datasets compared to the image datasets, EMaP still improves the perturbations’ robustness remarkably.
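The TP and TN rates of a discriminator can be computed as in the sketch below. Which class is treated as positive is an assumption made here for illustration: perturbations are labelled 1 (positive) and original data points are labelled 0 (negative). Both classes are assumed to be present in the evaluation set.

```python
import numpy as np


def tp_tn_rates(is_perturbation, predicted_perturbation):
    """True-positive and true-negative rates of a perturbation discriminator.

    Assumption: label 1 marks a perturbation, label 0 marks an original data point.
    """
    y_true = np.asarray(is_perturbation, dtype=bool)
    y_pred = np.asarray(predicted_perturbation, dtype=bool)
    tp_rate = float(np.mean(y_pred[y_true]))      # perturbations flagged as such
    tn_rate = float(np.mean(~y_pred[~y_true]))    # original points recognized as original
    return tp_rate, tn_rate
```

A discriminator that separates the two classes perfectly scores 1.0 on both rates, so lower values indicate perturbations that are harder to tell apart from the original data.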

Computational resource and run time. Our experiments are conducted on a single GPU-assisted compute node that is installed with a Linux 64-bit operating system. The allocated resources include 32 CPU cores (AMD EPYC 7742 model) with 2 threads per core, and 100GB of RAM. The node is also equipped with 8 GPUs (NVIDIA DGX A100 SuperPod model), with 80GB of memory per GPU.

The run time of EMaP is mostly dictated by the learning of the embedding function (line 2 of Algorithm 1). For the tabular datasets of about 2000 data points, that initialization step takes less than 2 minutes. It takes between 240 and 260 seconds for all 60000 images of MNIST/Fashion-MNIST. Given the local-subspaces, the generation of 1000 orthogonal perturbations takes about 0.3 seconds. Note that the manifold and local-subspaces can be computed before deployment since they do not depend on the explained inputs. For a rough comparison, the processing of 1000 perturbations by LIME on both image datasets with the 2-layer network also takes about 0.3 seconds. Thus, the overhead of EMaP at deployment is reasonable. Table 4 reports the actual run time of EMaP and LIME on the image datasets.

Dataset         EMaP Initialization   LIME    EMaP (d=2)   EMaP (d=3)
MNIST           240-260               0.763   1.311        1.493
Fashion-MNIST   240-260               0.726   1.502        1.467
Table 4: Run time of EMaP compared to LIME. The reported numbers are in seconds (for the initialization column) and in seconds per explanation (for the other columns).

8 Conclusion, limitations and future research

Motivated by our theoretical and experimental results, we exploit the data manifold to preserve the topological information of the data under perturbation. We implement EMaP to realize this idea and demonstrate its benefits in the explanation task. We recognize that the main limitation of EMaP is its requirement of the low-dimensional representations of the data and the local affine subspaces. For more complex data, computing them correctly can be very challenging. There are several interesting open questions about EMaP that we leave for future work. For instance, it is important to study the impact of the underlying manifold-learning algorithm, i.e. UMAP, on the perturbations and the explanations. It is also interesting to examine the behavior of EMaP in a wider range of explainers and applications.


Appendix A Visualizations of synthetic data and explanations generated with EMaP

This Appendix provides some visualizations of our synthetic data (in the experiments of Table 3) and explanations generated with or without EMaP. The explanations of EMaP shown in this Appendix are those used in the experiments in Section 7 of the main manuscript.

Figure 14: The visualization of some synthetic data in experiments of Table 3.

In Fig. 14, we visualize the synthetic data of different shapes and their perturbations in three dimensions. We also report and Bottleneck distances between the perturbations and the original data (left-column).

Figure 15: Visualization of EMaP-LIME explanations of the Multi-polarity-books review datasets (multibook).
Figure 16: Visualization of EMaP-LIME explanations of the Multi-polarity-kitchen review datasets (multibook).

Fig. 15 and 16 compare the actual explanations returned by LIME and LIME with EMaP on the books and kitchen reviews in the Multi-Domain Sentiment datasets. We can see that the weights of features included in the explanations are quite similar between the two methods.

Figure 17: Visualization of EMaP-LIME explanations of the MNIST dataset. The column indicates the class label to be explained.

Fig. 17 shows the explanations of LIME with EMaP for all classes in the MNIST dataset. The red (blue) areas mean that, if the features in those areas remain unchanged (change), the activation of that class will be stronger.

Figure 18: Examples of explanations in MNIST. Modifying the red-est (blue-est) area would negate (strengthen) the original prediction.
Figure 19: Examples of explanations in Fashion-MNIST. Modifying the red-est (blue-est) area would negate (strengthen) the original prediction.

Fig. 18 and 19 provide some explanations of EMaP along with explanations of other methods in MNIST and Fashion-MNIST, respectively.