On Locality of Local Explanation Models

06/24/2021 ∙ by Sahra Ghalebikesabi, et al. ∙ University of Oxford

Shapley values provide model-agnostic feature attributions for the model outcome at a particular instance by simulating feature absence under a global population distribution. The use of a global population can lead to potentially misleading results when local model behaviour is of interest. Hence we consider the formulation of neighbourhood reference distributions that improve the local interpretability of Shapley values. By doing so, we find that the Nadaraya-Watson estimator, a well-studied kernel regressor, can be expressed as a self-normalised importance sampling estimator. Empirically, we observe that Neighbourhood Shapley values identify meaningful sparse feature relevance attributions that provide insight into local model behaviour, complementing conventional Shapley analysis. They also increase on-manifold explainability and robustness to the construction of adversarial classifiers.




1 Introduction

The ability to correctly interpret a prediction model is increasingly important as we move to widespread adoption of machine learning methods, in particular within safety-critical domains such as health care (Holzinger et al., 2017; Gade et al., 2020). In this paper, we consider the task of attributing the features of a complex machine learning model $f$, abstracted as a function that predicts a response given a test instance $x$, given only black-box access to the model. We focus on model-agnostic local explanation methods and on the two most popular representatives of this group, namely LIME (Ribeiro et al., 2016) and SHAP (Lundberg and Lee, 2017). As these methods are often described as fitting a local surrogate model to the black box (Roscher et al., 2020), a natural question is: how ‘local’ are local explanation methods?

Figure 1: Attributions at the test instance of the motivating example as the value of Feature-1 varies, averaged over 10 runs and displayed with 95% confidence intervals (see next section for details). While (Tabular) LIME and SHAP assign the same absolute attribution to Feature-1 no matter how large its value is, our neighbourhood approach takes its distance to the decision boundary into consideration. A local linear approximation to the black box trained with a Ridge regressor gives misleading attributions to Feature-1 over part of this range.

As a simple motivating example of why this question matters, consider a black box model on two features that is defined through an indicator function and has a decision boundary along Feature-1. When attributing the local feature importance at a test instance $x = (x_1, x_2)$, with $x_2$ fixed at 2, we would expect Feature-1 to receive a higher absolute attribution when $x_1$ is closer to the decision boundary. In Figure 1 we report the results on this example from LIME and SHAP as well as for our proposed ‘Neighbourhood SHAP’ approach. We observe that Neighbourhood SHAP assigns Feature-1 a smaller attribution the higher the absolute value of $x_1$ is. SHAP and LIME, however, assign Feature-1 an attribution that is constant on either side of the decision boundary, which illustrates that these methods capture global model behaviour. The figure also shows that training a local linear approximation to the black box (Rasouli and Yu, 2019; Botari et al., 2020) is misleading, since Feature-2 receives a significantly positive attribution in a region where it clearly contributes negatively to the model outcome.

This motivates the following contributions:

  1. We propose Neighbourhood SHAP (Section 3), which considers local reference populations for prediction points as a complementary approach to SHAP. By doing so, we show that the Nadaraya-Watson estimator at a test instance can be interpreted as an importance sampling estimator where the expectation is taken over the proposed neighbourhood. Empirically, we find that greater locality increases the number of model evaluations on the data manifold and, with this, the robustness of the attributions against adversarial attacks.

  2. We consider how smoothing can also be used to stabilise SHAP values (Section 4). We quantify the loss in information incurred by our smoothing procedure and characterise its Lipschitz continuity.

2 Background

We begin with a short introduction to Shapley values – the quantity of interest of the SHAP optimisation procedure. For a pre-defined value function $v(S)$ that takes a set of features $S \subseteq \{1, \dots, d\}$ as input, the Shapley value $\phi_j$ of feature $j$ measures the expected change in the value function from including feature $j$ into a random subset of features $S$ (without $j$)

$$\phi_j = \mathbb{E}_{p(S)}\left[ v(S \cup \{j\}) - v(S) \right],$$

where the expectation is taken over the feature coalitions $S \subseteq \{1, \dots, d\} \setminus \{j\}$ whose distribution is defined by $p(S) = \frac{1}{d} \binom{d-1}{|S|}^{-1}$. This choice of probability distribution ensures that sampling a set of size $|S| = k$ has the same probability as sampling one of size $|S| = k'$, for $k, k' \in \{0, \dots, d-1\}$.
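The definition above can be made concrete by enumerating all coalitions for a small number of features. The following is a minimal sketch under stated assumptions: `shapley_values` and `toy_value` are hypothetical names introduced here, the toy value function is invented for illustration, and exact enumeration is only feasible for small $d$.

```python
from itertools import combinations
from math import comb


def shapley_values(value_fn, d):
    """Exact Shapley values for a value function over d features.

    value_fn takes a frozenset of feature indices and returns a float.
    """
    phi = []
    for j in range(d):
        others = [k for k in range(d) if k != j]
        total = 0.0
        for size in range(d):
            # Coalition probability 1 / (d * C(d-1, |S|)) for S without j.
            weight = 1.0 / (d * comb(d - 1, size))
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {j}) - value_fn(s))
        phi.append(total)
    return phi


def toy_value(s):
    """Hypothetical value function: additive payoffs plus one interaction."""
    payoff = {0: 1.0, 1: 2.0, 2: 3.0}
    v = sum(payoff[k] for k in s)
    if 0 in s and 1 in s:
        v += 4.0  # interaction shared between features 0 and 1
    return v


phi = shapley_values(toy_value, 3)
```

In this toy game the interaction term is split equally between features 0 and 1, and the attributions sum to the value of the full coalition minus the value of the empty one.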

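The choice of value function for explanation-based modelling of feature attributions at an instance $x$ has been the subject of recent debates (Aas et al., 2019; Janzing et al., 2020; Merrick and Taly, 2020). The consensus is to take the expectation of the black box algorithm $f$ at observation $x$ over the not-included features $\bar{S}$ using a reference distribution $r(X^*_{\bar{S}} \mid x_S)$ such that

$$v(S) = \mathbb{E}_{r(X^*_{\bar{S}} \mid x_S)}\left[ f\left(x_S \sqcup X^*_{\bar{S}}\right) \right]$$

for $\bar{S} = \{1, \dots, d\} \setminus S$, with $d$ the number of features and the operation $\sqcup$ denoting the concatenation of its two arguments. Marginal Shapley values (Lundberg and Lee, 2017; Janzing et al., 2020) define $r(X^*_{\bar{S}} \mid x_S) = p(X^*_{\bar{S}})$, where $p$ denotes the marginal data distribution. Conditional Shapley values (Aas et al., 2019) set the reference distribution equal to the conditional distribution given $x_S$, $r(X^*_{\bar{S}} \mid x_S) = p(X^*_{\bar{S}} \mid x_S)$. All in all, the Shapley value $\phi_j$ of feature $j$ is characterised by the expected change in model output over random coalitions $S$ with distribution $p(S)$, comparing the output when we include $j$ in the model, i.e. integrate out some randomly sampled features $\overline{S \cup \{j\}}$, with the model output where feature $j$ is not included, i.e. we integrate out some randomly sampled features $\bar{S}$ including $j$,

$$\phi_j = \mathbb{E}_{p(S)}\left[ \mathbb{E}_{r(X^*_{\overline{S \cup \{j\}}} \mid x_{S \cup \{j\}})}\left[ f\left(x_{S \cup \{j\}} \sqcup X^*_{\overline{S \cup \{j\}}}\right) \right] - \mathbb{E}_{r(X^*_{\bar{S}} \mid x_S)}\left[ f\left(x_S \sqcup X^*_{\bar{S}}\right) \right] \right].$$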

As we see, Shapley values are computed by estimating the change in model outcome when some features are integrated out over the reference distribution, which has so far been defined as either the marginal or conditional global population. For marginal Shapley values, the interpretation simplifies: the Shapley value of feature $j$ is the expected change in model outcome when we sample a random individual $X^*$ from the global statistical population and set its feature $j$ equal to $x_j$ (after we already set a random set of features $S$ equal to $x_S$). This motivates our proposal in Section 3 of neighbourhood distributions, where we instead sample a random individual from the immediate neighbourhood of $x$.

Computing Shapley values is challenging in high-dimensional feature spaces, which motivates the widely adopted KernelSHAP approach (Lundberg and Lee, 2017) that estimates the Shapley values of all features by empirical risk minimisation of a weighted squared-error objective (eq. (1)). In this objective, a linear explanation model with the Shapley values as its coefficients is fitted to model evaluations at concatenated vectors, the references are i.i.d. draws from the respective global reference distributions, the sum runs over a set of sampled coalitions, and the weights are defined by the KernelSHAP weights (Lundberg and Lee, 2017). LIME optimises a similar generalised expectation, also sampling references from a global distribution. To improve the local fidelity of Tabular LIME, Ribeiro et al. (2016) propose to define the weights through an exponential kernel with a bandwidth. While this weighting increases the importance of a coalition in proportion to its size, it does not ensure that higher weights are assigned to model evaluations for observations closer to $x$.

A simple solution to the locality problem is to fit a local linear approximation in the form of a tangent line that predicts the black box in a small neighbourhood around $x$, as in Rasouli and Yu (2019), Botari et al. (2020), Visani et al. (2020) and White and Garcez (2019). Such an approach, however, has several drawbacks compared to SHAP (and thus Neighbourhood SHAP), such as higher instability, lower interpretability, and the assumption of a fixed parametric form. While SHAP (and Neighbourhood SHAP) makes no assumptions on the form of $f$ in the feature space, local linear approximations assume linearity of the black box in a neighbourhood. As a consequence, they may produce misleading attributions, as was demonstrated in Figure 1. See the Supplement for a detailed discussion of local approximating models versus local reference populations.

3 Neighbourhood SHAP

Shapley values – similarly to other feature removal methods – employ a global reference distribution when computing attributions. This can lead to surprising artefacts, as illustrated in Figure 2. To increase the local fidelity of Shapley values, we propose to sample from a well-defined local reference population instead. Having selected a distance metric, such as the Euclidean distance or a more powerful random forest-based distance (Bloniarz et al., 2016), we define a distance-based distribution that is centred around $x$, such as the exponential kernel. Further, we define the local neighbourhood distribution as the product of this kernel and a reference distribution, divided by a normalising constant, where the reference distribution can be any marginal or conditional reference distribution. This choice ensures that we sample neighbourhood values considering not only the metric space but also the data distribution. This leads to a proposed change to the optimisation problem of eq. (1): in the Neighbourhood SHAP minimisation, references are drawn from the local neighbourhood distribution instead of the global reference distribution.

Figure 2: When sampling references (black dots) from a global reference distribution, the Shapley value at the test instance is positive since the model outcome at the test instance is larger than the average model outcome over the references. In contrast, Neighbourhood SHAP is negative since the average model outcome in the neighbourhood is larger than the model outcome at the test instance. This difference results from the fact that, first, the model outcome has a local minimum at the test instance, and second, the model takes its smallest values at the tails of the data distribution (right-skewed density, black line on the left). SHAP only captures that the model outcome at the test instance is higher than the average model outcome, but not that it is smaller there than for any other close observation – this is reflected by Neighbourhood SHAP.

Instead of estimating the neighbourhood distribution, we approximate the expectation of the model outcome in the neighbourhood around $x$ using self-normalised importance sampling (Doucet et al., 2001), with the global reference distribution as the proposal distribution.

While our proposal, Neighbourhood SHAP, weights the references based on a distance metric to $x$, KernelSHAP uses uniform weights. We note that the proposed local neighbourhood sampling scheme has a convenient form which corresponds to the well-known Nadaraya-Watson estimator (Nadaraya, 1964; Watson, 1964; Ruppert and Wand, 1994) used for kernel regression. Kernel regression is a non-parametric technique to model the non-linear relationship between a dependent variable (here, the model outcome) and an independent variable (here, the features), by approximating the conditional expectation of the former given the latter.

While the form of the Nadaraya-Watson estimator has so far been justified from a kernel theory perspective (see the Supplement), we show that it can be interpreted as an importance sampling estimator. The Nadaraya-Watson estimator $\sum_{i=1}^{n} k(x, X_i) f(X_i) / \sum_{i=1}^{n} k(x, X_i)$, where $k$ is a kernel function and $X_1, \dots, X_n$ are draws from the reference distribution, is an unbiased self-normalised importance sampling estimator of the expected model outcome in the neighbourhood of $x$, with proposal distribution equal to the reference distribution and desired distribution proportional to the product of the kernel and the reference distribution. As pointed out in the Supplement, all Shapley axioms (Lundberg and Lee, 2017; Sundararajan and Najmi, 2020) still hold true for Neighbourhood SHAP. Now, by linearity, we can quantify the difference between SHAP and Neighbourhood SHAP as ‘Anti-Neighbourhood SHAP’ (see the Supplement). Looking at this difference might be of value to characterise the information loss when contrasting an instance to the global population instead of to a local neighbourhood. Finally, we also derive a variance estimator of Shapley values computed with the Shapley formula in the Supplement.
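The link between kernel regression and importance sampling can be illustrated numerically. The sketch below is an illustration under assumptions, not the authors' implementation: it assumes a Gaussian-type exponential kernel on the squared Euclidean distance, a standard normal reference distribution, and a hypothetical black box `toy_model`. References are drawn once from the proposal and reweighted by the kernel.

```python
import math
import random


def exp_kernel(x, ref, bandwidth):
    """Assumed kernel: exp(-||x - ref||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, ref))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def nadaraya_watson(f, x, references, bandwidth):
    """Kernel-weighted average of f over references drawn from the proposal.

    The kernel values act as unnormalised importance weights; dividing by
    their sum is the self-normalisation step.
    """
    weights = [exp_kernel(x, r, bandwidth) for r in references]
    total = sum(weights)
    return sum(w * f(r) for w, r in zip(weights, references)) / total


def toy_model(z):
    """Hypothetical black box used only for this illustration."""
    return z[0] ** 2 + z[1]


random.seed(0)
refs = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(2000)]
x = (1.5, 0.0)

local_mean = nadaraya_watson(toy_model, x, refs, bandwidth=0.3)
wide_mean = nadaraya_watson(toy_model, x, refs, bandwidth=1e6)
plain_mean = sum(toy_model(r) for r in refs) / len(refs)
```

With a very large bandwidth the weights become uniform and the estimate coincides with the plain sample mean; with a small bandwidth the estimate moves towards the model outcome near `x`.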


On-Manifold Explainability.

A major disadvantage of marginal Shapley values and LIME is that the concatenated data vectors for a sampled reference do not necessarily lie on the data manifold (Frye et al., 2020; Chen et al., 2020). This has two serious ramifications: 1) the model is evaluated in regions that lie off the data manifold, where it might behave unexpectedly and be unrepresentative of the data population; and 2) adversaries can use an out-of-distribution (OOD) classifier trained to distinguish real data from simulated concatenated data and, through this, construct a model whose Shapley values look fair even though the model is demonstrably unfair on the real-data domain (Slack et al., 2020). To circumvent this problem, Frye et al. (2019), Aas et al. (2019) and Covert et al. (2020) propose the use of conditional instead of marginal reference distributions. However, using conditional reference distributions changes the interpretability of Shapley values – i.e. unrelated features get a non-zero attribution – and thus their use is controversial (Janzing et al., 2020). A marginal Neighbourhood SHAP approach, in contrast, can achieve on-manifold explainability while keeping the properties of marginal Shapley values for small enough bandwidths, if the data manifold is to some extent coherent (see Figure 3).

Figure 3: Concatenated data (pink dots) used for model evaluations for the computation of KernelSHAP (left) and Neighbourhood SHAP (right) at a randomly sampled instance (maroon dots) where the data manifold is a ring. Even though the background references (blue dots) lie on the data manifold, marginal Shapley values are evaluated at instances that lie off the data manifold.
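The ring example can be reproduced with a few lines of arithmetic. The sketch below assumes a unit circle as the data manifold; `concat`, `on_ring` and `radius` are hypothetical helpers introduced for illustration. It mixes one coordinate of the test instance with one coordinate of a reference and checks how far the mixed point lies from the ring.

```python
import math


def concat(x, ref, keep):
    """Features in `keep` come from x, all others from the reference."""
    return tuple(x[k] if k in keep else ref[k] for k in range(len(x)))


def on_ring(angle):
    """Point on the unit circle, used here as the assumed data manifold."""
    return (math.cos(angle), math.sin(angle))


def radius(point):
    return math.hypot(point[0], point[1])


x = on_ring(0.0)                 # test instance on the ring
far_ref = on_ring(math.pi / 2)   # global reference, far from x
near_ref = on_ring(0.1)          # local reference, close to x

far_mix = concat(x, far_ref, keep={0})    # roughly (1, 1): off the ring
near_mix = concat(x, near_ref, keep={0})  # roughly (1, 0.1): close to the ring
```

Both references lie on the ring, yet only the mixture with the nearby reference stays close to it, which is the effect that a small neighbourhood exploits.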

Choice of Bandwidth.

For an infinitely large bandwidth, Neighbourhood SHAP is equal to KernelSHAP, while it converges to 0 as the bandwidth goes to 0. Small neighbourhoods thus induce regularisation in the predictions, which we also observe empirically in Section 5. While SHAP values add up to the difference between the model outcome at $x$ and its expectation under the global reference distribution, Neighbourhood SHAP attributions add up to the difference between the model outcome at $x$ and its expectation under the neighbourhood distribution. Hence, care needs to be taken when comparing SHAP and Neighbourhood SHAP, since the scales might differ. In this case, both SHAP values (standard and neighbourhood) can be divided by either the sum of their absolute values or by their standard deviation, to represent relative attribution measures. As commonly observed with kernel regression approaches, there are some drawbacks, such as the additional hyperparameters (distance function, bandwidth) and increased variability, especially in data-sparse regions for small bandwidths. These problems can be tackled by choosing adaptive bandwidth methods. For instance, the bandwidth could be chosen such that the 25% closest observations to $x$ are not assigned more than 75% of the weight mass. We propose to plot the Neighbourhood SHAP values of the normalised features over a range of bandwidths. This provides a powerful diagnostic and information tool.
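One way to read the adaptive rule above is as a search over a bandwidth grid. The following is a minimal sketch under assumptions: the exponential kernel form, the grid, and the function names `closest_mass` and `smallest_admissible_bandwidth` are illustrative choices, not taken from the paper.

```python
import math


def closest_mass(distances, bandwidth, frac=0.25):
    """Share of total kernel weight held by the `frac` closest references."""
    ordered = sorted(distances)
    weights = [math.exp(-(dist ** 2) / (2.0 * bandwidth ** 2)) for dist in ordered]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: treat the mass as fully concentrated.
        return 1.0
    k = max(1, int(frac * len(ordered)))
    return sum(weights[:k]) / total


def smallest_admissible_bandwidth(distances, grid, frac=0.25, max_mass=0.75):
    """Smallest grid bandwidth whose closest-`frac` mass is at most `max_mass`."""
    for bandwidth in sorted(grid):
        if closest_mass(distances, bandwidth, frac) <= max_mass:
            return bandwidth
    return None  # no bandwidth on the grid satisfies the rule


# Hypothetical distances from the test instance to 100 references.
distances = [0.1 * i for i in range(1, 101)]
grid = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 50.0]
chosen = smallest_admissible_bandwidth(distances, grid)
```

Because the mass share of the closest references shrinks towards `frac` as the bandwidth grows, scanning the grid in increasing order returns the most local bandwidth that still satisfies the constraint.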

The computational burden of changing the bandwidth is not as large as it might first appear. Our importance sampling approach has the desirable property that Neighbourhood SHAP is estimated on the same set of references for each bandwidth, and that only the importance weights vary with the bandwidth. As a result, no additional model evaluations are required when Neighbourhood SHAP is computed for a different bandwidth. This stands in contrast to other neighbourhood schemes proposed in the XAI literature, such as KDEs (Botari et al., 2019), GANs (Saito et al., 2020) or Gaussian perturbations (Robnik-Šikonja and Bohanec, 2018), where the black box must be evaluated again for each new bandwidth, with the number of additional evaluations growing with the number of sampled coalitions. Please refer to the Supplement for a theoretical and empirical complexity analysis.
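The reuse of model evaluations across bandwidths can be sketched as follows. This is an illustration under assumptions rather than the KernelSHAP regression itself: it enumerates all coalitions exactly (feasible only for few features), assumes an exponential kernel on the Euclidean distance between $x$ and each full reference, and uses a hypothetical `black_box` with a call counter.

```python
import math
import random
from itertools import combinations


def concat(x, ref, keep):
    """Features in `keep` come from x, all others from the reference."""
    return tuple(x[k] if k in keep else ref[k] for k in range(len(x)))


def cache_evaluations(f, x, refs):
    """Evaluate f once on every (coalition, reference) pair."""
    d = len(x)
    cache = {}
    for size in range(d + 1):
        for subset in combinations(range(d), size):
            s = frozenset(subset)
            cache[s] = [f(concat(x, r, s)) for r in refs]
    return cache


def neighbourhood_shap(cache, x, refs, bandwidth):
    """Shapley values with kernel-weighted references; no new model calls."""
    d = len(x)
    w = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, r)) / (2.0 * bandwidth ** 2))
         for r in refs]
    total = sum(w)
    # Self-normalised importance sampling estimate of each value function.
    v = {s: sum(wi * e for wi, e in zip(w, evals)) / total
         for s, evals in cache.items()}
    phi = []
    for j in range(d):
        others = [k for k in range(d) if k != j]
        acc = 0.0
        for size in range(d):
            p = 1.0 / (d * math.comb(d - 1, size))
            for subset in combinations(others, size):
                s = frozenset(subset)
                acc += p * (v[s | {j}] - v[s])
        phi.append(acc)
    return phi


calls = {"n": 0}


def black_box(z):
    """Hypothetical model; counts how often it is evaluated."""
    calls["n"] += 1
    return z[0] * z[1] + z[2]


random.seed(1)
x = (1.0, 0.5, -0.5)
refs = [tuple(random.gauss(0.0, 1.0) for _ in range(3)) for _ in range(50)]
cache = cache_evaluations(black_box, x, refs)
results = {bw: neighbourhood_shap(cache, x, refs, bw) for bw in (0.5, 1.0, 1e6)}
```

After the cache is filled, each additional bandwidth only recomputes the weights, so the call counter stays at the number of coalitions times the number of references.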

4 Smoothed SHAP

In the previous section, we discussed neighbourhood sampling as a useful tool to understand feature relevance through feature removal. We have also seen that the proposed neighbourhood sampling approach relates to kernel smoothers such as the Nadaraya-Watson estimator. This insight leads us to consider a Smoothed SHAP that locally averages neighbouring SHAP values: a kernel-weighted average (eq. (2)) of the SHAP values at samples from the reference distribution, with weights given by a kernel function centred at $x$. Such smoothing procedures have been applied before in the explainability literature, e.g. for gradient-based methods (Smilkov et al., 2017; Yeh et al., 2019), and can be of interest when the interpretability of SHAP values suffers from high instability of the black box (Alvarez-Melis and Jaakkola, 2018; Ghorbani et al., 2019; Hancox-Li, 2020). The smoothing it induces can be captured by a Lipschitz constant whose upper bound decreases with the bandwidth: under a boundedness condition, the smoothed Shapley value estimator (2) is Lipschitz continuous with a constant that decreases in the bandwidth. With the tools from before, we can derive that Smoothed SHAP is an unbiased importance sampling estimator of the SHAP values from the neighbourhood around $x$, with a correspondingly smoothed value function.

This smoothed value function relates to the explicit modelling of feature inclusion and gives an interesting perspective on the meaning of smoothing, namely that the observed $x$ is a measurement of an underlying test instance variable. Exploring a smoothed summary of the SHAP values in the local neighbourhood around $x$ highlights how local variability in $x$ drives changes in the SHAP feature attributions. This is interesting in its own right but particularly so if features are susceptible to reporting error. As an illustration, consider a black box algorithm that predicts the fitness level of an adult based on multiple covariates, including weight. The reported weight may be subject to error if unreliable scales are used. In addition, as weight varies constantly throughout the day, the individual might not be interested in the attribution for one particular weight at a single point in time, but rather in the attribution that a range of weights per day receives. The test instance is thus more appropriately described by a test distribution around $x$, induced by a random variable that describes the volatility in the covariates of the test instance. If the test distribution is unknown, it can be estimated by setting it, as earlier, equal to a neighbourhood distribution, in which the kernel encapsulates the prior belief on the variability of $x$ and the reference distribution captures the artefacts of the data distribution (i.e. skew, kurtosis, high-density regions). The kernel can now be defined with a multivariate bandwidth. We observe empirically that such a choice can decrease the MSE of the estimation of Shapley values (see the Supplement). Building upon results from kernel regression, we can quantify the squared distance of Smoothed SHAP to the SHAP value at $x$ (see the Supplement). Finally, we also derive a variance estimator for Smoothed SHAP in the Supplement.
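Smoothed SHAP, read as a kernel-weighted average of SHAP values at the references, can be sketched in a few lines. The example below is illustrative only: the scalar bandwidth, the exponential kernel form, and the made-up SHAP values in `ref_shap` are assumptions, and the SHAP values are taken as precomputed.

```python
import math


def smoothed_shap(x, refs, ref_shap, bandwidth):
    """Kernel-weighted average of precomputed SHAP values at the references.

    refs[i] is a reference point and ref_shap[i] its list of SHAP values.
    """
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, r)) / (2.0 * bandwidth ** 2))
        for r in refs
    ]
    total = sum(weights)
    n_features = len(ref_shap[0])
    return [
        sum(w * phi[j] for w, phi in zip(weights, ref_shap)) / total
        for j in range(n_features)
    ]


# Hypothetical one-dimensional references with made-up SHAP values.
refs = [(0.0,), (1.0,), (2.0,)]
ref_shap = [[1.0], [2.0], [6.0]]
x = (1.0,)  # the test instance is itself one of the references here

narrow = smoothed_shap(x, refs, ref_shap, bandwidth=0.05)
wide = smoothed_shap(x, refs, ref_shap, bandwidth=1e6)
```

A narrow bandwidth recovers the SHAP value at the test instance, while a very wide one returns the average SHAP value over all references.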

Choice of Smoothing Bandwidth.

Prior information on the variability of the covariates of the test instance can be included in the definition of the bandwidth matrix. Fixed covariates, like age or season, are not expected to change and thus receive a bandwidth of zero, while volatile features like weight, temperature or wind speed are assigned a positive bandwidth. For infinitely large bandwidths, the feature is treated as inherently missing. If the bandwidth is infinitely large for all features, Smoothed SHAP equals the average of the Shapley values over all references, which is often used as a global explanation measure (Frye et al., 2020; Covert et al., 2020; Bhargava et al., 2020). As Smoothed SHAP can be estimated efficiently once SHAP values have been computed for the reference population, we propose, again, computing it for several bandwidth choices and plotting it against the bandwidth as a visualisation technique to help inform the choice of bandwidth. The bandwidth induces a bias-variance trade-off, as derived in the Supplement: the larger the bandwidth, the smoother the results, but also the less Smoothed SHAP reflects the model behaviour at $x$, especially if $f$ is highly non-linear.
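The per-feature treatment of bandwidths can be sketched with a product kernel. This is an assumption made for illustration: a diagonal bandwidth matrix is modelled as one bandwidth per feature, a zero bandwidth is read as requiring an exact match, and an infinite bandwidth as ignoring the feature. The names `feature_kernel` and `product_kernel` are hypothetical.

```python
import math


def feature_kernel(diff, bandwidth):
    """Kernel factor for one feature under the assumed conventions."""
    if bandwidth == 0:
        # Fixed covariate: only exact matches receive weight.
        return 1.0 if diff == 0 else 0.0
    if math.isinf(bandwidth):
        # Feature treated as missing: it does not affect the weight.
        return 1.0
    return math.exp(-(diff ** 2) / (2.0 * bandwidth ** 2))


def product_kernel(x, ref, bandwidths):
    """Weight of a reference as a product of per-feature kernel factors."""
    weight = 1.0
    for a, b, bw in zip(x, ref, bandwidths):
        weight *= feature_kernel(a - b, bw)
    return weight


# Hypothetical instance with features (age, weight in kg).
x = (40, 70.0)
same_age = product_kernel(x, (40, 72.0), bandwidths=(0, math.inf))
other_age = product_kernel(x, (41, 70.0), bandwidths=(0, math.inf))
volatile_weight = product_kernel(x, (55, 72.0), bandwidths=(math.inf, 2.0))
```

These weights can replace the scalar-bandwidth weights in a kernel-weighted average of SHAP values, so that fixed covariates restrict the neighbourhood while volatile ones are smoothed over.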

Connection to LIME.

Tabular LIME (Ribeiro et al., 2016; Garreau and von Luxburg, 2020) provides the same explanation for any two instances falling into the same quantile along each dimension (Garreau and von Luxburg, 2020). As such, it is also an aggregated attribution measure, similar to Smoothed SHAP. Key differences are the treatment of different dimensions and the lack of proven guarantees of Lipschitz continuity (see the Supplement).

5 Examples

We present comprehensive experiments on several standardised real-world tabular UCI data sets (Asuncion and Newman, 2007) of different sizes predicted with ensemble classifiers or regressors, as well as an image classification task on the MNIST dataset. The experiments demonstrate some key attributes of Neighbourhood and Smoothed SHAP: Neighbourhood SHAP increases on-manifold explainability and robustness against adversarial attacks; Neighbourhood SHAP also leads to sparser attributions than standard Shapley values; and Smoothed SHAP tells us how Shapley values of neighbouring observations differ from the attribution of the test instance.

Since Neighbourhood SHAP, Smoothed SHAP and SHAP operate on different scales, we divide all attributions by their standard deviation (over features) unless otherwise specified. We present a subset of our results in this section and refer the interested reader to the Supplement for a thorough report of all experimental results (including simulated experiments), details and hyper-parameter settings.
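The two rescalings mentioned in the text (by the standard deviation over features, or by the sum of absolute values) can be sketched as follows. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which variant is used.

```python
from statistics import pstdev


def scale_by_std(attributions):
    """Divide attributions by their standard deviation over features.

    Assumes the population standard deviation; returns the input unchanged
    if all attributions are identical.
    """
    spread = pstdev(attributions)
    if spread == 0:
        return list(attributions)
    return [a / spread for a in attributions]


def scale_by_abs_sum(attributions):
    """Divide attributions by the sum of their absolute values."""
    total = sum(abs(a) for a in attributions)
    if total == 0:
        return list(attributions)
    return [a / total for a in attributions]


# Made-up attributions on two different scales.
shap_like = [1.0, 2.0, 3.0]
nbrh_like = [0.1, 0.2, 0.3]

scaled_shap = scale_by_std(shap_like)
scaled_nbrh = scale_by_std(nbrh_like)
relative = scale_by_abs_sum([1.0, -3.0])
```

After scaling by the standard deviation, attribution vectors that differ only by a positive factor become identical, which is what makes the methods comparable on one plot.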

On-Manifold Explainability and Robustness against Adversarial Attacks.

For adversarial learning, we train a Random Forest and a LightGBM as OOD classifiers that distinguish true data from the concatenated vectors used for model evaluations. We find that for small bandwidths, the adversary's ability to discriminate between the test data and the concatenated test data deteriorates (Figure 4(a)). Under the assumption that the classifiers are able to detect the true data manifold, we can thus claim that Neighbourhood SHAP relies more on observations from the data manifold than SHAP and LIME. Further, we mimicked the experimental setup of Slack et al. (2020) on the COMPAS data set (Angwin et al., 2016): an adversarial black box predicts recidivism based only on race if the OOD classifier predicts the data to be from the data manifold, and returns an unrelated column if it does not. As presented in Figure 4(b) for 10 randomly sampled individuals, the unrelated column has no effect on Neighbourhood SHAP, and race has a higher relative attribution than it does for KernelSHAP.

(a) AUC from OOD LightGBM and RF over 10 runs with 95% CIs. Concatenated data was created by sampling as many coalition vectors as data points and masking with random references. Where references are sampled locally (smaller bandwidth), OOD classifiers perform significantly worse.
(b) Adversarial black box predicts recidivism using the COMPAS data. Absolute attributions obtained from Neighbourhood SHAP and KernelSHAP are divided by the sum of attributions for comparability. The adversarial attack affects Neighbourhood SHAP less than KernelSHAP when averaged over 10 runs. Without adversarial attack, (Neighbourhood) SHAP attributes only race (not shown).
Figure 4: Neighbourhood SHAP explains on-manifold and is robust to adversarial attacks.

Increased Local Prediction Accuracy.

As SHAP learns a binary feature model, we can sample feature coalitions and reference values to perturb test data and predict the model outcome at the perturbed data. To check local accuracy, we weight the reference values with an exponential kernel. Its bandwidth signifies the size of the neighbourhood. Figure 5 presents the MSE corresponding to an XGBoost model, applied to four different datasets. As expected, Neighbourhood SHAP with a smaller bandwidth predicts data within a small neighbourhood significantly better than Neighbourhood SHAP with a larger bandwidth. We notice that the difference between the bandwidths is larger where there are fewer features in the data set (such as the iris dataset). We attribute the loss in performance to the difficulty of estimating meaningful distances in high dimensions.

Figure 5: MSE when predicting local model outcome of an XGBoost model averaged over 400 runs displayed with 95% confidence intervals. Neighbourhood SHAP with smaller bandwidth predicts neighbourhoods significantly better than with large bandwidths.

Interpretation of Neighbourhood SHAP.

Neighbourhood SHAP computed with small kernel widths reflects feature attributions when contrasting with model behaviour at similar observations, whereas Neighbourhood SHAP computed with large kernel widths contrasts model behaviour at a population scale. Figure 6 shows the evolution of Neighbourhood SHAP across bandwidths on randomly picked observations across different data sets. The test instance in the bike data set, where an XGBoost regressor predicts daily bike rentals, has a high normalised temperature of 0.82. As the median observation has a temperature of 0.50, the neighbourhood of our test instance is expected to look considerably different to the global population. For small kernel widths, Neighbourhood SHAP computes a negative attribution for temperature, whereas marginal SHAP is positive. This sign ‘flip’ is coherent with descriptive statistics: for a subpopulation with temperatures +/-0.05 around 0.82, temperature is negatively correlated with the outcome (correlation equal to -0.08), whereas overall, bike rental tends to increase on warmer days (unconditional correlation equal to +0.47). Neighbourhood SHAP thus shows that a warmer temperature has in general a positive impact on the count of rental bikes, which reverses for very hot days. Standard Shapley values do not provide this type of fine-grained interpretation. Similarly, in the Boston data set (Figure 6, third column), our test instance is a dwelling with a high percentage of lower status population (LSTAT) equal to 18.76%. LSTAT gets positive Neighbourhood SHAP values for small kernel widths, whereas its marginal Shapley value is negative. This observation is consistent with the negative overall correlation, which is equal to -0.76, whereas for a restricted population with LSTAT +/- 1% it is equal to +0.15. For similar dwellings, i.e. with a high pupil-teacher ratio and many rooms, lower status populations do not decrease the value of the home as much as they do in general, and can even increase it.

Interpretation of Smoothed SHAP.

In contrast, Smoothed SHAP summarises marginal Shapley values (which contrast against the entire population) within a neighbourhood, instead of at a single instance. For example, consider the adult data set (Figure 6, first column). We chose a test instance for which the model performs poorly: its predicted probability of high income for this individual, aged 42, is equal to 0.09, when in actual fact the person has a high income. It is interesting to contrast the conventional Shapley value assigned to the person, which is obtained by Smoothed SHAP with a bandwidth of zero, with the average Shapley values for individuals like them. We observe that Smoothed SHAP quickly assigns a negative attribution to age and a positive attribution to education as the bandwidth increases, whilst the SHAP values for the individual were positive and negative, respectively. This highlights local instability in the Shapley values, as the SHAP values for people similar to the predicted person are positive for education and negative for age. For the Boston data set, we note that Smoothed SHAP of the Pupil/Teacher Ratio (PTRATIO) initially decreases for small bandwidths, as there are many dwellings with a high PTRATIO in the data neighbourhood of the test instance, while it then increases as the global attribution of this feature is in general higher.

Figure 6: Scaled attributions at three different test instances (see the Supplement) for varying kernel widths computed with 2000 reference points in the adult, bike and Boston housing data sets. Bounds for LIME have been computed over 2000 runs, while the Shapley bounds have been estimated with their theoretical formula as outlined in the Supplement.

Image Classification.

We applied our Neighbourhood SHAP approach to KernelSHAP and also to DeepSHAP (Lundberg and Lee, 2017), which computes Shapley values for images based on gradients. After training a convolutional neural network on the MNIST data set, we explain digits with the predicted label ’8’ given a background data set of 100 images with labels ’3’ and ’8’. As we see in Figure 7, Neighbourhood DeepSHAP gives pixels close to the strokes the attributions with the highest absolute values, while DeepSHAP assigns less sparse and more blurry attributions. This is expected: DeepSHAP compares each digit to a random digit in the population, while Neighbourhood DeepSHAP only looks at images in the neighbourhood, i.e. at images which have a similar stroke. As we show in the Supplement, the change in log-odds of predicting a ’3’ after modifying the images depicting an ’8’ with the attributions (setting blue pixels to 0) is (non-significantly) higher for Neighbourhood DeepSHAP than it is for DeepSHAP. In contrast, Smoothed DeepSHAP leads to a smaller change in log-odds, which is expected since we lose information by smoothing. However, we see that Smoothed DeepSHAP gives additional insights compared to DeepSHAP and Global DeepSHAP: in all images, the lower left corner of the 8s is highlighted in blue only for Smoothed DeepSHAP. Thus, we know that there is at least one observation in the neighbourhood of these 8s that has a strong negative attribution in that image area. This image area, however, loses importance when aggregating over the whole data set. Note that LIME gives the sharpest results because we chose the hyperparameters such that the image is split into the largest number of superpixels. We see, however, that LIME gives counter-intuitive results (i.e. the lower right corner of the third 8 gets the lowest attribution, and the lower contour of the first 8 gets the highest attribution).

Figure 7: Randomly picked test images with explanations of the label ’8’. Red regions are pixels that increase the predicted probability of label ’8’ while blue regions decrease the predicted probability contrasted with the background data set.

6 Discussion

In this paper, we first highlighted the limitations of using SHAP when the local model behaviour is of interest. We then introduced Neighbourhood SHAP. While neighbourhood sampling has been applied in other areas of model explainability, such as image perturbations by adding noise (Fong and Vedaldi, 2017), local linear approximations (see the Supplement), or rule-based models (Guidotti et al., 2018; Rasouli and Yu, 2020; Rajapaksha et al., 2020), it has not been previously introduced for model-agnostic additive feature models such as SHAP. Our contribution is important as it provides a theoretical understanding of explanations of local model behaviour, which is often lacking in the explainable AI literature (Garreau and von Luxburg, 2020). A secondary contribution of this work is the analysis of how smoothing Shapley values can identify unstable feature attributions. While it is difficult to evaluate model explanations numerically, we provide an exhaustive comparison of different metrics (adversarial robustness, prediction accuracy, and visual inspection). Neighbourhood SHAP and Smoothed SHAP both merit consideration, as they have considerable advantages compared to standard KernelSHAP. For comparability across experiments, we limited our analysis to the use of the Euclidean distance as a distance metric. In high-dimensional spaces, this choice can be misleading (Domingos, 2012) and the use of more powerful distance metrics, such as one obtained by random forests, would be appropriate. We thus caution against exclusively relying on mathematical metrics for explaining models, and suggest comparing the unweighted and weighted histograms before any judgement calls. While it can be difficult to choose an adequate bandwidth, we see that having control over the kernel width allows the user to have a precise understanding of model predictions, both locally and at a larger scale. LIME and KernelSHAP in their default implementations do not allow for such a detailed analysis. Plots of Neighbourhood SHAP and Smoothed SHAP against the bandwidth are thus powerful tools that give additional insight into oblique dynamics of the black box.


SG and LTM are students of the EPSRC CDT in Modern Statistics and Statistical Machine Learning (EP/S023151/1). SG receives funding from the Oxford Radcliffe Scholarship and Novartis. LTM receives funding from the EPSRC. KDO is funded by a Wellcome Trust/Royal Society Sir Henry Dale Fellowship 218554/Z/19/Z. CH is supported by The Alan Turing Institute, Health Data Research UK, the Medical Research Council UK, the EPSRC through the Bayes4Health programme Grant EP/R018561/1, and AI for Science and Government UK Research and Innovation (UKRI). We would like to thank Luke Merrick for his kind help.