1 Introduction
AI’s potential to improve economic productivity is driven by its ability to significantly reduce the cost of predictions Agrawal et al. (2018). For these predictions to be beneficial, they should be mostly correct, operationally consumable, and must not lead to unexpected systemic harm. The ability to explain how AI models make their predictions is a critical step towards this goal. The discipline of AI explainability is thus central to the practical impact of AI on society.
One could conservatively demand that only simple, by-construction interpretable models are used for predictions that meaningfully impact people’s lives Rudin (2019). Such an approach, however, sacrifices the performance upside of complex, non-interpretable models. This motivates the study of post-hoc AI explainability, where the goal is to explain arbitrarily complex models.
Further distinction exists between model-specific and model-agnostic explainability. Model-specific methods explain a model’s predictions by referencing its internal structure; see e.g. Chen and Guestrin (2016) or Shrikumar et al. (2017). Model-agnostic methods explain predictions through input-output attribution, treating the model as a black box. Not only do model-agnostic methods offer general applicability, but they also provide a common language for explainability that does not require expert knowledge of the model.
Within the paradigm of post-hoc, model-agnostic explainability, a number of methods are used in practice. Many measure the effect of varying features on model performance Breiman (2001); Strobl et al. (2008) or on an individual prediction Baehrens et al. (2010). Another method fits an interpretable model to the original around the point of prediction to garner local understanding Ribeiro et al. (2016). However, these methods are largely ad hoc and founded on prohibitively stringent assumptions, e.g. independence or linearity.
Fortunately, the general problem of attribution, of which model-agnostic explainability is an example, has been extensively developed in cooperative game theory. Shapley values Shapley (1953) provide the unique attribution method satisfying four intuitive axioms: they capture all interactions between features, they sum to the model prediction, and their linearity enables aggregation without loss of theoretical control. Shapley-based AI explainability has matured over the last two decades Lipovetsky and Conklin (2001); Kononenko and others (2010); Štrumbelj and Kononenko (2014); Datta et al. (2016); Lundberg and Lee (2017). However, Shapley values suffer from a problematic assumption: they involve marginalisation over subsets of features, generally achieved by splicing data points together and thus evaluating the model on highly unrealistic data (e.g. Fig. 1). While such splicing is common in model-agnostic methods, it is only justified if all the data’s features are independent, an assumption almost never satisfied; otherwise, such spliced data lies off the data manifold. While work has been done towards remedying this flaw Aas et al. (2019); Rasouli and Yu (2019); Lundberg et al. (2020), a satisfactorily general and performant solution has yet to appear.
In this paper, we provide a detailed study of the off-manifold problem in explainability, and provide solutions to computing Shapley values on the data manifold. Our main contribution is the introduction of two new methods to compute on-manifold Shapley values for high-dimensional, multi-type data:


(i) a flexible generative-modelling technique to learn on-manifold conditional distributions;

(ii) a robust supervised-learning technique that learns the on-manifold value function directly.
In Sec. 2, we provide precise definitions of key quantities in Shapley explainability, including global Shapley values, which to our knowledge have not been introduced elsewhere. In Sec. 3, we elucidate the conceptual difference between off- and on-manifold explanations and the marked drawbacks of off-manifold approaches. After presenting our solutions in Sec. 4, we demonstrate the practical effectiveness of on-manifold explainability with varied experiments in Sec. 5.
2 Shapley values on the data manifold
Here we review the Shapley framework for model explainability, define on-manifold Shapley values precisely, and introduce global explanations that obey the Shapley axioms.
2.1 Shapley values for model explainability
In cooperative game theory, a team of players work together to earn value von Neumann and Morgenstern (1944). Given a value function indicating the value that a coalition of players would earn on their own, the Shapley value provides a principled approach to distributing credit for the total earnings among the players Shapley (1953):
(1)  φ_v(i) = Σ_{S ⊆ N∖{i}} [ |S|! (|N| − |S| − 1)! / |N|! ] ( v(S ∪ {i}) − v(S) )
The Shapley value computes player i’s marginal value-added upon joining the team, averaged over all orderings in which the team can be constructed.
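As a concrete illustration, the permutation average described above can be estimated by Monte Carlo sampling of orderings. The sketch below is our own and illustrative only; the two-player lookup-table game and all names are hypothetical:

```python
import random

def shapley_values(value_fn, n_players, n_samples=2000, seed=0):
    """Estimate Shapley values (Eq. 1) by sampling permutations: each
    player's marginal value-added is averaged over random orderings
    in which the team can be constructed."""
    rng = random.Random(seed)
    players = list(range(n_players))
    phi = [0.0] * n_players
    for _ in range(n_samples):
        rng.shuffle(players)
        coalition = set()
        v_prev = value_fn(frozenset(coalition))
        for i in players:
            coalition.add(i)
            v_next = value_fn(frozenset(coalition))
            phi[i] += (v_next - v_prev) / n_samples
            v_prev = v_next
    return phi

# Hypothetical 2-player game: the value function is a lookup table.
game = {frozenset(): 0.0, frozenset({0}): 1.0,
        frozenset({1}): 2.0, frozenset({0, 1}): 4.0}
phi = shapley_values(game.__getitem__, 2)
# Exact values are 1.5 and 2.5; by efficiency they sum to v(N) - v(∅) = 4.
```

Because each sampled permutation contributes a telescoping sum of marginals, the estimates satisfy the efficiency axiom exactly, sample by sample.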
In the context of supervised learning, let f_y(x) represent a model’s predicted probability that data point x belongs to class y, so that Σ_y f_y(x) = 1. To apply Shapley attribution to model explainability, one interprets the input features as players in a game and the model output as their earned value. To compute the Shapley value φ_i(f_y, x) of each feature x_i, one must define a value function v(S) to represent the outcome of the model on a restricted coalition of inputs x_S. While the value function should act as a proxy for “f_y(x_S)”, the model is undefined given only partial input x_S, so one cannot leave out-of-coalition slots empty. In the standard treatment Lundberg and Lee (2017), one averages over out-of-coalition features x′_{S̄}, where S̄ = N ∖ S, drawn unconditionally from the data:
(2)  v(S) = E_{x′ ∼ p(x′)} [ f_y(x_S ⊔ x′_{S̄}) ]
We refer to this value function, and the corresponding Shapley values φ_i, as lying off-manifold, since the splices x_S ⊔ x′_{S̄} generically lie far from the data manifold (e.g. Fig. 1). Even so, the Shapley framework guarantees that model explanations satisfy an intuitive set of properties Shapley (1953):
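For illustration, the splicing underlying the off-manifold value function amounts to overwriting the in-coalition slots of unconditionally drawn background points and averaging the model output. A minimal sketch, with function and variable names of our own invention:

```python
import numpy as np

def off_manifold_value(model, x, coalition, background):
    """Eq. (2): evaluate the model on splices x_S ⊔ x'_{S̄}, where the
    out-of-coalition features come from unconditionally drawn background
    points, then average the outputs."""
    splices = background.copy()            # one splice per background row
    splices[:, coalition] = x[coalition]   # overwrite in-coalition slots
    return model(splices).mean()

# Toy model and data: the model simply sums its inputs.
model = lambda a: a.sum(axis=1)
background = np.array([[0.0, 0.0], [2.0, 4.0]])
x = np.array([10.0, 20.0])
off_manifold_value(model, x, coalition=[0], background=background)  # 12.0
```

Note that the splices [10, 0] and [10, 4] need not resemble any real data point, which is precisely the off-manifold problem discussed in Sec. 3.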


Efficiency. Shapley values distribute the model prediction fully among the features, up to an offset term (not attributed to any feature) representing the average probability the model assigns to class y:
(3)  f_y(x) = Σ_i φ_i(f_y, x) + E_{x′ ∼ p(x′)} [ f_y(x′) ]
Linearity. Shapley values aggregate linearly in a linear-ensemble model.

Nullity. Features that do not influence the value function receive zero Shapley value.

Symmetry. Features that influence the value function identically receive equal Shapley values.
2.2 On-manifold Shapley values
In practice, Shapley explanations are widely based on the off-manifold value function, Eq. (2), which evaluates the model on splices, x_S ⊔ x′_{S̄}, with x′ drawn independently of x. Splicing features from unrelated data points generically leads to unrealistic model inputs. Such unrealistic splices lie outside the model’s regime of validity, where there is no reason to expect controlled model behaviour. Off-manifold explanations thus obfuscate insights into the model’s behaviour on real data.
To fix the off-manifold problem, one should condition out-of-coalition features x′_{S̄} on in-coalition features x_S, thus basing Shapley explanations on an on-manifold value function:
(4)  v(S) = E_{x′_{S̄} ∼ p(x′_{S̄} | x_S)} [ f_y(x_S ⊔ x′_{S̄}) ]
Note that, since the Nullity and Symmetry axioms reference the value function directly, these properties will manifest differently off- and on-manifold; see e.g. Secs. 3.1 and 5.2.
Preference for an on-manifold value function is widely acknowledged Lundberg and Lee (2017); Hooker and Mentch (2019); Mase et al. (2019). However, the requisite conditional distribution p(x_{S̄} | x_S) is not empirically accessible in practical scenarios with high-dimensional data or features that take many (e.g. continuous) values. A performant method to estimate the on-manifold value function has until now been lacking, and is the focus of this work.
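In the low-dimensional, discrete setting where the conditional distribution is empirically accessible (as used in Sec. 3), the conditional expectation in Eq. (4) can be estimated by averaging over data points that match the coalition exactly. A minimal sketch, with names of our own invention:

```python
import numpy as np

def on_manifold_value(model, x, coalition, data):
    """Eq. (4) via the empirical distribution: average the model over
    data points whose in-coalition features match x_S exactly.
    Feasible only for low-dimensional discrete data."""
    match = np.all(data[:, coalition] == x[coalition], axis=1)
    if not match.any():               # coalition pattern absent from data
        return model(data).mean()     # fall back to the unconditional mean
    return model(data[match]).mean()

# Toy binary data and a model that sums its inputs.
model = lambda a: a.sum(axis=1)
data = np.array([[0., 0.], [0., 1.], [1., 1.], [1., 1.]])
x = np.array([1., 1.])
on_manifold_value(model, x, coalition=[0], data=data)  # averages the two
# rows matching x_S = 1, giving 2.0
on_manifold_value(model, x, coalition=[], data=data)   # empty coalition:
# the unconditional mean, 1.25
```

This exact-matching strategy breaks down as soon as features are continuous or numerous, which motivates the learnt approaches of Sec. 4.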
2.3 Global Shapley values
As presented above, Shapley values provide a method for local explainability, explaining the prediction f_y(x) on an individual data point x. For a global understanding of model behaviour, one might average over the data, with the class-of-interest y fixed. However, for an important feature x_i, its local Shapley value can vary between large-positive and large-negative values, as x_i may correlate with the class in some regions of the data and anticorrelate in others. As this would lead to large cancellations, it is common to average the absolute value instead Lundberg et al. (2020). However, such a nonlinear aggregation leads to a global explanation that breaks the axioms underlying the Shapley framework.
To both preserve the Shapley axioms and avoid large cancellations, we define global Shapley values:
(5)  Φ_i = E_{(x, y) ∼ p(x, y)} [ φ_i(f_y, x) ]
where p(x, y) is the distribution from which the labelled data is drawn and, crucially, the class y varies with the data point x in the average. Global Shapley values obey a sum rule that follows from Eq. (3):
(6)  Σ_i Φ_i = E_{p(x, y)} [ f_y(x) ] − E_{p(y)} E_{p(x′)} [ f_y(x′) ]
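The sum rule can be checked in one line: taking the expectation of the efficiency relation, Eq. (3), over labelled data and inserting the definition of global Shapley values, Eq. (5),

```latex
% Expectation of Eq. (3) over (x, y) ~ p(x, y):
\mathbb{E}_{p(x,y)}\!\left[f_y(x)\right]
  = \sum_i \mathbb{E}_{p(x,y)}\!\left[\phi_i(f_y, x)\right]
  + \mathbb{E}_{p(y)}\,\mathbb{E}_{p(x')}\!\left[f_y(x')\right] ,
% and identifying \Phi_i = \mathbb{E}_{p(x,y)}[\phi_i(f_y, x)] gives Eq. (6):
\sum_i \Phi_i
  = \mathbb{E}_{p(x,y)}\!\left[f_y(x)\right]
  - \mathbb{E}_{p(y)}\,\mathbb{E}_{p(x')}\!\left[f_y(x')\right] .
```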
One can thus interpret the global Shapley value Φ_i as the portion of the model’s accuracy attributable to feature x_i. Indeed, the first term in Eq. (6) is the accuracy one achieves by drawing labels from the model’s predicted probability distribution over classes. The offset term is the accuracy one achieves using none of the features: drawing the label of x from the model’s output on a random input x′.
3 Off- versus on-manifold Shapley values
This section articulates key differences between model explanations off versus on the data manifold.
3.1 Functional versus informational dependence
The only effect in-coalition features x_S have in the off-manifold value function, Eq. (2), is through their role as direct model inputs, f_y(x_S ⊔ x′_{S̄}). It follows that if f_y does not have explicit functional dependence on a feature, then that feature’s off-manifold Shapley value vanishes.
By contrast, in-coalition features affect the on-manifold value function, Eq. (4), through a second channel: implicitly, through the conditional distribution p(x′_{S̄} | x_S) when in- and out-of-coalition features correlate. The on-manifold Shapley value can thus be nonzero even for a feature that does not act upon f_y directly. In such a case, the model does use information in that feature, but extracts it via other features.
To demonstrate this on the Drug Consumption data from the UCI repository Dua and Graff (2017), we used the 10 binary features listed in Fig. 2 (Mushrooms, Ecstasy, etc.) to predict whether individuals had consumed an 11th drug: LSD. We modelled this data using a shallow decision tree with only 3 nodes.
Fig. 2(a) shows local explanations of the decision tree’s prediction “LSD = True” for a test-set individual with features listed on the horizontal axis. The explanations are computed using Monte Carlo approximations to Eq. (1), with value functions approximated using the empirical distribution (accessible in this case, with just 10 binary features). Note that the off-manifold Shapley values are nonzero only for the 3 features that the decision tree depends on explicitly, while the on-manifold explanation indicates the model’s implicit dependence on information contained in all features.
Fig. 2(b) shows global Shapley values for this shallow decision tree. These global explanations are the expectation values of local explanations, as in Eq. (5). Note that all on-manifold global Shapley values are non-negative, consistent with their interpretation as the portion of model accuracy attributable to the information contained in each feature.
3.2 The garbage-in, garbage-out problem
While Sec. 3.1 might lead one to believe that off-manifold explainability provides useful insight into the functional dependence of a model, it serves as a perilously uncontrolled approach, especially in complex nonlinear models such as neural networks. Indeed, it is widely known that machine learning models are not robust to distributional shift Nguyen et al. (2015); Goodfellow et al. (2015). Still, the off-manifold value function of Eq. (2) evaluates the model outside its domain of validity, where it is untrained and potentially wildly misbehaved, in hopes that an aggregation of such evaluations will be meaningful. This garbage-in, garbage-out problem is the clearest reason to avoid off-manifold Shapley values. Since this point is understood in the literature Hooker and Mentch (2019); Sundararajan and Najmi (2019), we simply provide an example of this problem in Fig. 1, which shows an example binary MNIST digit LeCun and Cortes (2010), a coalition of pixels, and 5 random splices that would be used in an off-manifold explanation.
3.3 Misleading explanations off manifold
To demonstrate that off-manifold Shapley values can be misleading in practice, we generated synthetic data according to the process in Fig. 3(a). The data has two binary features and a binary label, all class-balanced. We fit a decision tree to this data: a precise match to Fig. 3(a).
Note that the features x1 and x2 are positively correlated, both with each other and with the label y. However, with x1 fixed, the likelihood of y decreases slightly as x2 increases. One might think of x1 as disease severity, x2 as treatment intensity, and y as mortality rate.
The local Shapley values for the most frequent scenario in the data are plotted in Fig. 3(b). We find the negative off-manifold Shapley value shown for x2 to be misleading, as it would suggest that this observation is more commonly associated with a prediction of the opposite class than with the true label. The negative value arises from the model’s decreased confidence in the true label as x2 increases at fixed x1. This misleading sign is therefore due to the off-manifold value function’s heavy sensitivity to the model’s behaviour on feature combinations that are exceedingly rare in the data.
Fig. 3(c) displays global Shapley values for this model. Note that the on-manifold global values are positive, consistent with their interpretation as the portion of model accuracy attributable to each feature. However, there is a negative off-manifold global value that results from aggregating wrong-sign local explanations. Such a negative value would indicate that its input feature is actually detrimental to the model’s overall performance, which of course is not the case.
3.4 On-manifold Shapley in the nonparametric limit
Here we present a result that strengthens the connection between on-manifold Shapley values and the data distribution: in the limit of a perfect model of the data, on-manifold Shapley values converge to an explanation of how the information in the data associates with the labelled outcomes.
To show why this holds, suppose the predicted probability f_y(x) converges to the true underlying distribution p(y | x). In this nonparametric limit, the on-manifold value function of Eq. (4) becomes
(7)  v(S) = E_{p(x_{S̄} | x_S)} [ p(y | x_S ⊔ x_{S̄}) ] = p(y | x_S)
in which case on-manifold value is attributed to x_S based on its predictivity of the label y.
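Spelled out, the limit follows by marginalising the true conditional distribution:

```latex
% In the nonparametric limit f_y(x) -> p(y | x), the on-manifold value
% function of Eq. (4) integrates out the out-of-coalition features:
v(S)
  = \mathbb{E}_{p(x_{\bar S} \mid x_S)}
      \!\left[ f_y(x_S \sqcup x_{\bar S}) \right]
  \;\longrightarrow\;
  \int p(x_{\bar S} \mid x_S)\, p(y \mid x_S, x_{\bar S})\,
      \mathrm{d}x_{\bar S}
  \;=\; p(y \mid x_S) .
```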
To demonstrate this empirically, we fit a random forest to the Drug Consumption data and plotted its off- and on-manifold global Shapley values in Fig. 2(c). Next we fit a separate random forest f^{(S)} to each coalition S of features, 2^10 models in total, as in Štrumbelj et al. (2009). We used the accuracy of each model, in the sense of Sec. 2.3, as the value function for an additional Shapley computation:
(8)  v(S) = E_{p(x, y)} [ f^{(S)}_y(x_S) ]
where the resulting global Shapley value is directly the average gain in accuracy that results from adding a feature to the set of model inputs. These values are labelled “Retrained models” in Fig. 2(c). Note their agreement with the on-manifold explanation of the fixed model. On-manifold Shapley values thus indicate which features in the data are most predictive of the label.
This consistency check allows us to show that Tree SHAP Lundberg et al. (2018, 2020) does not provide a method for on-manifold explainability. Observe in Fig. 2(c) that Tree SHAP values roughly track the off-manifold explanation: somewhat larger on the most predictive feature and somewhat smaller on the others. This occurs because trees tend to split on high-predictivity features first, and Tree SHAP privileges early-splitting features in an otherwise off-manifold calculation.
4 Scalable approaches to computing on-manifold Shapley values
For the results of Sec. 3, the on-manifold value function, Eq. (4), was estimated from the empirical data distribution, an approach which is not practical for complex realistic data. In this section, we develop two methods of learning the on-manifold value function: (i) unsupervised learning of the conditional distribution p(x_{S̄} | x_S), and (ii) a supervised technique to learn the value function directly.
4.1 Unsupervised approach
To take an unsupervised approach to the data manifold, one can learn the conditional distribution p(x_{S̄} | x_S) that appears in the on-manifold value function. We do this using variational inference and two model components. The first component is a variational autoencoder Kingma and Welling (2014); Rezende et al. (2014), with encoder q_φ(z | x) and decoder p_θ(x | z). The second is a masked encoder, q_ψ(z | x_S), whose goal is to map the coalition x_S to a distribution in latent space that agrees with the encoder as well as possible. A model of the conditional distribution is then provided by the composition:
(9)  p(x_{S̄} | x_S) ≈ ∫ p_θ(x_{S̄} | z) q_ψ(z | x_S) dz
and a good fit to the data should maximise this conditional likelihood. A lower bound on its log-likelihood is given by Jensen’s inequality:
(10)  log p(x_{S̄} | x_S) ≥ E_{z ∼ q_ψ(z | x_S)} [ log p_θ(x_{S̄} | z) ] ≡ L(θ, ψ)
While L could be used on its own as the objective function to learn (θ, ψ), this would leave the variational distribution q_ψ(z | x_S) unconstrained, at odds with our goal of learning a smooth manifold structure in latent space. This concern can be mitigated by introducing
(11)  R(ψ) = − D_KL( q_ψ(z | x_S) ∥ p(z) )
which regularises q_ψ by penalising differences from a smooth (e.g. unit normal) prior distribution p(z). We thus include R as a regularisation term in our unsupervised objective: L + β R. This objective contains a hyperparameter β that prevents a fair comparison between models trained with different values of β. A separate metric to judge performance is discussed next.
4.2 Metric for the learnt value function
The unsupervised method of Sec. 4.1 leads to a learnt estimate p̂(x_{S̄} | x_S) of the conditional distribution, and thus to an estimate of the on-manifold value function: v̂(S) = E_{p̂(x′_{S̄} | x_S)} [ f_y(x_S ⊔ x′_{S̄}) ]. With the goal of judging this estimate, consider the following formal quantity:
(12)  MSE(g; x_S, y) = E_{p(x_{S̄} | x_S)} [ ( f_y(x_S ⊔ x_{S̄}) − g(x_S, y) )² ]
This quantity is minimal with respect to g when g(x_S, y) = E_{p(x_{S̄} | x_S)} [ f_y(x_S ⊔ x_{S̄}) ], in agreement with the definition, Eq. (4), of the on-manifold value function. We can then quantitatively judge the performance of the unsupervised model by computing
(13)  MSE(v̂) = E_S E_{x ∼ p(x)} E_{y ∼ uniform} [ ( f_y(x) − v̂(S) )² ]
Note that Eq. (13) is precisely Eq. (12) averaged over coalitions drawn from the Shapley sum, features drawn from the data, and labels drawn uniformly over classes. Moreover, the mean-square error in Eq. (13) is easy to estimate using the empirical distribution and the learnt model v̂, thus providing an unambiguous metric to judge the outcome of the unsupervised approach.
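A toy example illustrates why this MSE metric favours the on-manifold value function. In the sketch below (our own construction, not from the paper), one feature duplicates another, and the conditional-mean estimate achieves a strictly smaller error on real data than the unconditional one:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data in which x2 duplicates x1, and a model that reads only x2.
x1 = rng.integers(0, 2, size=5000)
data = np.stack([x1, x1], axis=1).astype(float)
f = lambda a: a[:, 1]                      # model output f(x) = x2

# Candidate value-function estimates g(x_S) for the coalition S = {x1}:
g_on = lambda xS: xS                       # E[f | x1] = x1 (on-manifold)
g_off = lambda xS: np.full_like(xS, 0.5)   # unconditional mean (off-manifold)

def mse(g):
    # Eq. (13) restricted to this single coalition: squared error between
    # g(x_S) and the model's output, averaged over real data points.
    return np.mean((f(data) - g(data[:, 0])) ** 2)

# mse(g_on) = 0.0: the on-manifold value function minimises the metric.
# mse(g_off) = 0.25: ignoring the feature correlation is penalised.
```

Because the average runs over the empirical data distribution, no samples from the (generally inaccessible) conditional distribution are needed to evaluate the metric.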
4.3 Supervised approach
The MSE metric of Eq. (13) supports a supervised approach to learning the on-manifold value function directly. We do this by defining a surrogate model g(x_S, y) that can operate on coalitions of features (e.g. by masking out-of-coalition features) and that is trained to minimise the objective:
(14)  E_S E_{x ∼ p(x)} E_{y ∼ uniform} [ ( f_y(x) − g(x_S, y) )² ]
As discussed in Sec. 4.2, this objective is minimised as the surrogate model approaches the on-manifold value function of the model-to-be-explained.
4.4 Comparison of approaches
Our implementations of the unsupervised and supervised approaches to on-manifold explainability are summarised in App. A. Both approaches lead to broadly similar results. Fig. 4(a) compares the two techniques on the Drug Consumption data, where explanations are given for the random forest of Sec. 3.4 and compared against the computation using the empirical distribution.
The unsupervised approach to on-manifold explainability is flexible but untargeted: the learnt conditional distribution is dataset-specific but model-agnostic, accommodating explanations for many models trained on the same data. The supervised approach trades flexibility for increased performance: while the technique must be retrained to explain each model, it entails direct minimisation of the MSE.
The supervised method is thus expected to achieve higher accuracy. We confirmed this on all data sets studied in this paper; see Table 1 in App. A for a numerical comparison. The supervised approach also offers increased stability, leading to a smaller variance in MSE in repeated experiments (cf. Table 1). The supervised method is more efficient as well: while the unsupervised technique estimates the value function by sampling from the learnt conditional distribution, the supervised approach learns the value function directly. As a result, to compute Shapley values for the experiments of Sec. 5, the supervised method required sampling roughly 10 times fewer coalitions to match the standard error of the unsupervised method.
5 Experiments and results
Here we demonstrate the practical utility of onmanifold explainability through experiments. All numerical details, including a description of uncertainties, are given in App. B.
5.1 Abalone data
Global Shapley values represent the portion of a model’s accuracy attributable to each feature. To show that staying on manifold is required for this interpretation to be robust, we experimented on Abalone data from the UCI repository Dua and Graff (2017). We trained a neural network on the physical characteristics contained in the data to classify abalone as younger or older than the median age.
Fig. 4(b) displays global Shapley values for this model. While the supervised and unsupervised techniques lead to broadly similar on-manifold explanations, observe the drastic difference that arises off manifold. This is due to the tight correlations between features in the data (4 different weights and 3 lengths), which make the data manifold low-dimensional and important.
Notice further that Fig. 4(b) displays negative global Shapley values off manifold, negating their interpretation as portions of the model accuracy attributable to each feature.
5.2 Census Income data
To demonstrate that on-manifold explanations are consistent with correlations that appear in the data, we experimented on UCI Census Income data Dua and Graff (2017). We trained an xgboost classifier Chen and Guestrin (2016) to predict whether an individual’s income exceeds $50k based on demographic features in the data.
Fig. 4(c) displays global Shapley values for this model, using the supervised method for the on-manifold explanation. Note the large discrepancy between the off-manifold Shapley values for marital-status and relationship. These features are strongly correlated (married individuals most often have relationship = husband or wife), and their roughly equal on-manifold values indicate that these features are nearly identically predictive of the model’s output.
Notice further that the Shapley value for age is significantly larger off-manifold than on. This means that the model relies heavily on age to determine its output, but that age correlates with other features, e.g. marital-status and education, that are also predictive of the model’s output.
5.3 MNIST
To demonstrate onmanifold explainability on higherdimensional data, we trained a simple feedforward network on binary MNIST LeCun and Cortes (2010) and explained randomly drawn digits in Fig. 5(a).
Despite having the same sum over pixels, as controlled by Eq. (3), and explaining the same model prediction, each on-manifold explanation is more concentrated, with more interpretable structure, than its off-manifold counterpart. The handwritten strokes are clearly visible on-manifold, with key off-stroke regions highlighted as well. Off-manifold explanations generally display lower intensities spread less informatively across the digit region.
These off-manifold explanations are a direct result of splices as in Fig. 1. With such unrealistic input, the model’s output is uncontrolled and uninformative. In fact, it is only on very large coalitions of pixels, subject to minimal splicing, that the model can make intelligent predictions off-manifold. This is confirmed in Fig. 5(b), which shows the average Shapley summand as a function of coalition size on MNIST. Note that primarily large coalitions underpin off-manifold explanations, whereas far fewer pixels are required on-manifold, consistent with the low-dimensional manifold underlying the data.
6 Related work
Within the Shapley paradigm, initial work has been done to produce on-manifold explanations: Aas et al. (2019) (similar to Zintgraf et al. (2017); Gu and Tresp (2019)) explores empirical and distribution-fitting techniques, while Lundberg et al. (2018) takes a tree-specific approach, conditioning out-of-coalition features on in-coalition features appearing earlier in the tree. In contrast to these methods, we compute on-manifold Shapley values with more scalable methods of learning the data manifold, either through variational inference or supervised learning. Moreover, we show in Fig. 2(c) that Tree SHAP does not remedy the off-manifold problem.
Other onmanifold explainability methods exist as well; see e.g. Chang et al. (2019) and Agarwal et al. (2019). Complementary to our work, these methods apply to images, lie outside the Shapley paradigm, and require generative methods. We focus on general data types, operate within the Shapley framework, and offer a simpler alternative (Sec. 4.3) to generative methods.
7 Conclusion
In this work, we undertook a careful study of the off-manifold problem in AI explainability. We presented the distinction between on- and off-manifold Shapley values in the conceptually clear setting of tree-based models and low-dimensional data. We then introduced two novel techniques to compute on-manifold Shapley values for any model on any data: one technique learns to impute features on the data manifold, while the other learns the Shapley value function directly. In so doing, we provided compelling evidence against the use of off-manifold explainability, and demonstrated that on-manifold Shapley values offer a viable approach to AI explainability in real-world contexts.
Acknowledgements
This work was developed and experiments were run on the Faculty Platform for machine learning. The authors benefited from discussions with Tom Begley, Markus Kunesch, and John Mansir. DDM was partially supported by UCL’s Centre for Doctoral Training in Data Intensive Science.
References
 [1] Aas et al. (2019) Explaining individual predictions when features are dependent: more accurate approximations to Shapley values. arXiv:1903.10464. Cited by: §1, §6.
 [2] Agarwal et al. (2019) Removing input features via a generative model to explain their attributions to classifier’s decisions. arXiv:1910.04256. Cited by: §6.
 [3] Agrawal et al. (2018) Prediction machines: the simple economics of artificial intelligence. Harvard Business Press. Cited by: §1.
 [4] Baehrens et al. (2010) How to explain individual classification decisions. Journal of Machine Learning Research. Cited by: §1.
 [5] Breiman (2001) Random forests. Machine Learning. Cited by: §1.
 [6] Chang et al. (2019) Explaining image classifiers by counterfactual generation. In International Conference on Learning Representations. Cited by: §6.
 [7] Chen and Guestrin (2016) XGBoost: a scalable tree boosting system. In International Conference on Knowledge Discovery and Data Mining. Cited by: §B.3, §1, §5.2.
 [8] Datta et al. (2016) Algorithmic transparency via quantitative input influence: theory and experiments with learning systems. In IEEE Symposium on Security and Privacy. Cited by: §1.
 [9] Dua and Graff (2017) UCI machine learning repository. archive.ics.uci.edu/ml. Cited by: §B.1, §B.2, §B.3, §3.1, §5.1, §5.2.
 [10] Goodfellow et al. (2015) Explaining and harnessing adversarial examples. In International Conference on Learning Representations. Cited by: §3.2.
 [11] Gu and Tresp (2019) Contextual prediction difference analysis. arXiv:1910.09086. Cited by: §6.
 [12] Hooker and Mentch (2019) Please stop permuting features: an explanation and alternatives. arXiv:1905.03151. Cited by: §2.2, §3.2.
 [13] Kingma and Ba (2015) Adam: a method for stochastic optimization. In International Conference on Learning Representations. Cited by: §B.1.
 [14] Kingma and Welling (2014) Auto-encoding variational Bayes. In International Conference on Learning Representations. Cited by: §4.1.
 [15] Kononenko et al. (2010) An efficient explanation of individual classifications using game theory. Journal of Machine Learning Research. Cited by: §1.
 [16] LeCun and Cortes (2010) MNIST database. yann.lecun.com/exdb/mnist. Cited by: §B.4, §3.2, §5.3.
 [17] Lipovetsky and Conklin (2001) Analysis of regression in game theory approach. Applied Stochastic Models in Business and Industry. Cited by: §1.
 [18] Lundberg et al. (2020) From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence. Cited by: §1, §2.3, §3.4.
 [19] Lundberg et al. (2018) Consistent individualized feature attribution for tree ensembles. arXiv:1802.03888. Cited by: §3.4, §6.
 [20] Lundberg and Lee (2017) A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems. Cited by: §B.1, §1, §2.1, §2.2.
 [21] Mase et al. (2019) Explaining black box decisions by Shapley cohort refinement. arXiv:1911.00467. Cited by: §2.2.
 [22] Nguyen et al. (2015) Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In IEEE Conference on Computer Vision and Pattern Recognition. Cited by: §3.2.
 [23] Rasouli and Yu (2019) Meaningful data sampling for a faithful local explanation method. In Intelligent Data Engineering and Automated Learning. Cited by: §1.
 [24] Rezende et al. (2014) Stochastic backpropagation and approximate inference in deep generative models. In International Conference on Machine Learning. Cited by: §4.1.
 [25] Ribeiro et al. (2016) Why should I trust you: explaining the predictions of any classifier. In International Conference on Knowledge Discovery and Data Mining. Cited by: §1.
 [26] Rudin (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence. Cited by: §1.
 [27] Shapley (1953) A value for n-person games. In Contributions to the Theory of Games. Cited by: §1, §2.1.
 [28] Shrikumar et al. (2017) Learning important features through propagating activation differences. In International Conference on Machine Learning. Cited by: §1.
 [29] Strobl et al. (2008) Conditional variable importance for random forests. BMC Bioinformatics. Cited by: §1.
 [30] Štrumbelj et al. (2009) Explaining instance classifications with interactions of subsets of feature values. Data & Knowledge Engineering. Cited by: §3.4.
 [31] Štrumbelj and Kononenko (2014) Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems. Cited by: §1.
 [32] Sundararajan and Najmi (2019) The many Shapley values for model explanation. arXiv:1908.08474. Cited by: §3.2.
 [33] von Neumann and Morgenstern (1944) Theory of games and economic behavior. Princeton University Press. Cited by: §2.1.
 [34] Zintgraf et al. (2017) Visualizing deep neural network decisions: prediction difference analysis. In International Conference on Learning Representations. Cited by: §6.
Appendix A Implementation details
Table 1: Test-set MSE (Eq. 13) achieved by the supervised and unsupervised methods on each data set.

Data set | Supervised | Unsupervised
Drug     |            |
Abalone  |            |
Census   |            |
MNIST    |            |
For the unsupervised approach, we modelled the encoder q_φ(z | x) as a diagonal normal distribution with mean and variance determined by a neural network:
(15)  q_φ(z | x) = N( z ; μ_φ(x), diag σ²_φ(x) )
We modelled the decoder p_θ(x | z) as a product distribution:
(16)  p_θ(x | z) = ∏_i p_θ(x_i | z)
where the distribution type (e.g. normal, categorical) of each p_θ(x_i | z) is chosen per dataset and each distribution’s parameters are determined by a shared neural network. We modelled the masked encoder q_ψ(z | x_S) as a Gaussian mixture:
(17)  q_ψ(z | x_S) = Σ_{k=1}^{K} π_k(x_S) N( z ; μ_k(x_S), diag σ²_k(x_S) )
To allow q_ψ to accept variable-size coalitions as input, we simply masked out-of-coalition features with a special value that never appears in the data.
The unsupervised method has several hyperparameters: the coefficient that multiplies the regularisation term in Eq. (11), the number of mixture components in Eq. (17), as well as the architecture and optimisation of the networks involved. For each experiment in this paper, we tuned hyperparameters to minimise the MSE of Eq. (13) on a held-out validation set; see App. B for numerical details.
For the supervised approach, we modelled the surrogate using a neural network, again masking out-of-coalition features with a special value to accommodate variable-size coalitions. This method’s hyperparameters, relating to architecture and optimisation, were similarly tuned to minimise validation-set MSE; see App. B for details.
As discussed in Sec. 4.4, the supervised method is expected to achieve a smaller MSE than the unsupervised approach. We confirmed this on all data sets studied in this paper; see Table 1 for a numerical comparison. In the table, central values indicate the test-set MSE achieved by each method. The uncertainties represent the standard deviation in test-set MSE upon retraining each method with fixed hyperparameters 10 times. This indicates that the supervised method also offers increased stability over the unsupervised approach.
Appendix B Details of experiments
Here we provide numerical details for all experiments presented in the paper.
B.1 Drug Consumption experiment
Several experiments were performed on the Drug Consumption data from the UCI repository [9]. We used 10 binary features from the data set – Mushrooms, Ecstasy, etc., as displayed in Fig. 2 – to predict whether individuals had ever consumed an 11th drug: LSD.
The Shapley values in Fig. 2(a) and Fig. 2(b) describe a single decision tree fit with default sklearn parameters as well as max_depth = 1 and max_features = None. While the data exhibits a class balance, the decision tree achieves 82.0% accuracy on a held-out test set.
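The tree described above can be reproduced in outline as follows; the synthetic data here merely stands in for the UCI Drug Consumption features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 10 binary drug features (not the real UCI data)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X = (X > 0).astype(int)  # binarise to mimic binary features

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=1, max_features=None, random_state=0)
tree.fit(X_tr, y_tr)
accuracy = tree.score(X_te, y_te)  # held-out accuracy
```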
Local off-manifold Shapley values in Fig. 2(a) were computed by Monte Carlo sampling of permutations to estimate Eq. (1). For each sampled permutation, a random data point was drawn from the test set to estimate the off-manifold value function of Eq. (2). Bar heights in Fig. 2(a) are the means that resulted from the Monte Carlo samples per feature. Throughout the paper, error bars represent standard errors of the means.
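The permutation-based Monte Carlo estimator can be sketched generically; the additive value function below is a toy example chosen so the exact Shapley values are known:

```python
import numpy as np

def shapley_mc(value, n_features, n_samples, rng):
    """Estimate Shapley values by averaging marginal contributions
    over uniformly sampled feature permutations."""
    phi = np.zeros(n_features)
    for _ in range(n_samples):
        coalition = set()
        v_prev = value(coalition)
        for i in rng.permutation(n_features):
            coalition = coalition | {i}
            v_new = value(coalition)
            phi[i] += v_new - v_prev  # marginal contribution of feature i
            v_prev = v_new
    return phi / n_samples

# Toy additive game: each feature's exact Shapley value equals its weight
weights = np.array([1.0, 2.0, 3.0])
value = lambda S: float(sum(weights[list(S)]))
estimates = shapley_mc(value, 3, 200, np.random.default_rng(0))
```

For an additive game every marginal contribution equals the feature's weight, so the estimate recovers the weights exactly.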
Local on-manifold Shapley values in Fig. 2(a) were again computed using Monte Carlo samples of Eq. (1), but this time using the on-manifold value function of Eq. (4). For each sampled coalition, a random data point was drawn from the test set, with the crucial requirement that it agree with the explained point on all in-coalition features. In the text, we refer to this as empirically estimating the conditional distribution. Such empirical estimation is only possible because this data set has a small number of all-binary features.
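Empirically estimating the conditional distribution amounts to sampling only those data points that agree with the explained point on the in-coalition features, e.g. (with toy binary data standing in for the drug features):

```python
import numpy as np

def sample_conditional(data, x, coalition, rng):
    """Draw a data point whose in-coalition features match x exactly
    (an empirical estimate of the conditional distribution)."""
    coalition = list(coalition)
    matches = data[(data[:, coalition] == x[coalition]).all(axis=1)]
    return matches[rng.integers(len(matches))]

data = np.array([[0, 0, 1],
                 [0, 1, 1],
                 [1, 0, 0],
                 [0, 0, 0]])
x = np.array([0, 0, 1])
sampled = sample_conditional(data, x, [0, 1], np.random.default_rng(0))
```

This exact-matching strategy only works when the features are discrete and few in number, which is why the paper restricts it to this small all-binary data set.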
Global Shapley values in Fig. 2(b) were similarly computed using Monte Carlo samples of Eq. (5). For each labelled data point sampled from the test set, a single permutation was drawn to estimate Eq. (1) and a single data point was drawn to estimate the value function.
The Shapley values of Fig. 2(c) describe a random forest fit with default sklearn parameters and max_features = None, which achieves 82.2% test-set accuracy. Global off- and on-manifold Shapley values were computed just as in Fig. 2(b). Tree SHAP values were computed with the SHAP package [20] using model_output = margin and feature_perturbation = tree_path_dependent.
The values labelled “Model retraining” in Fig. 2(c) were computed by fitting a separate random forest for each coalition of features in the data set: 2^10 = 1024 models in all. We used these models to compute the sum of Eq. (8), where the value function represents a variant of each model's accuracy: the accuracy achieved if one predicts labels by drawing stochastically from the model's predicted probability distribution (as opposed to deterministically choosing the maximum-probability class).
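The retraining procedure can be sketched on toy data with 3 features, so only 2^3 - 1 = 7 non-empty coalitions need models (the real experiment covers every coalition of the drug features):

```python
from itertools import combinations
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy data standing in for the drug features
X = rng.random((200, 3))
y = (X.sum(axis=1) > 1.5).astype(int)

def stochastic_accuracy(model, X, y, rng):
    """Accuracy when labels are drawn stochastically from the model's
    predicted probability distribution."""
    proba = model.predict_proba(X)
    draws = np.array([rng.choice(model.classes_, p=p) for p in proba])
    return float((draws == y).mean())

n = X.shape[1]
accuracies = {}
for r in range(1, n + 1):            # the empty coalition is handled separately
    for S in combinations(range(n), r):
        model = RandomForestClassifier(n_estimators=10, random_state=0)
        model.fit(X[:, list(S)], y)  # retrain on this coalition's features only
        accuracies[S] = stochastic_accuracy(model, X[:, list(S)], y, rng)
```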
The global on-manifold Shapley values in Fig. 2(c) appear in Fig. 4(a) as well, labelled “Empirical”. Fig. 4(a) also displays on-manifold Shapley values computed using the supervised and unsupervised methods introduced in this paper. As above, these are Monte Carlo estimates of Eq. (5). The supervised method involved training a fully-connected network on the MSE loss of Eq. (14). All neural networks in this paper used 2 flat hidden layers, Adam [13] for optimisation, and a batch size of 256. We scanned over a grid with
hidden layer size  (18)  
learning rate 
choosing the point with minimal MSE on a held-out validation set after 10k epochs of training; see Table 2. Each supervised value in Fig. 4(a) corresponds to Monte Carlo samples. The unsupervised method involved training a variational autoencoder to minimise the loss of Sec. 4.1, as described in App. A. The encoder, decoder, and masked encoder were each modelled using fully-connected networks, trained using early stopping with patience 100. We scanned over a grid of hidden layer sizes and learning rates as in Eq. (18), as well as
latent dimension  (19)  
latent modes  
regularisation 
choosing the point with minimal validation-set MSE; see Table 2. Unsupervised values in Fig. 4(a) correspond to Monte Carlo samples.
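Such a grid scan amounts to evaluating every combination of hyperparameter values; the grid values below are purely illustrative, not the paper's:

```python
from itertools import product

# Hypothetical grid (the actual value sets are those of Eqs. 18-19)
grid = {
    "hidden_size": [128, 256, 512],
    "learning_rate": [1e-4, 1e-3],
    "latent_dim": [2, 4, 8],
}

# One configuration per grid point; the one with minimal validation MSE is kept
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
```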
B.2 Abalone experiment
For the experiment of Sec. 5.1, we used the Abalone data set from the UCI repository [9]. The data contains 8 features corresponding to physical measurements (see Fig. 4b) which we used to classify abalone as younger or older than the median age. We trained a neural network to perform this task – with hidden layer size 100, default sklearn parameters, and early stopping – obtaining a test-set accuracy of 78%.
Shapley values in Fig. 4(b) were computed exactly as described in Sec. B.1, except that the supervised method involved training for 5k epochs. Optimised hyperparameters are given in Table 2.
Table 2: Optimised hyperparameters for each experiment.

Data set | Method       | Hidden dim. | Learn. rate | Latent dim. | Modes | Reg.
---------|--------------|-------------|-------------|-------------|-------|------
Drug     | supervised   | 512         |             |             |       |
Drug     | unsupervised | 128         |             | 4           | 1     | 0.5
Abalone  | supervised   | 512         |             |             |       |
Abalone  | unsupervised | 256         |             | 2           | 1     | 0.05
Census   | supervised   | 512         |             |             |       |
Census   | unsupervised | 128         |             | 8           | 1     | 1
MNIST    | supervised   | 512         |             |             |       |
MNIST    | unsupervised | 512         |             | 16          | 1     | 1
B.3 Census Income experiment
For the experiment of Sec. 5.2, we used the Census Income data set from the UCI repository [9]. The data contains 49k individuals from the 1994 US Census, as well as 13 features (see Fig. 4c) which we used to predict whether annual income exceeded $50k. We trained an xgboost classifier [7] with default parameters, achieving a test-set accuracy of 85% amidst a class balance.
Shapley values in Fig. 4(c) were computed exactly as described in Sec. B.1, except that the supervised method used 5k epochs, and the unsupervised method used patience 50. Optimised hyperparameters are given in Table 2. The on-manifold values in Fig. 4(c) were computed using the supervised method. While the unsupervised method does not appear in the figure, it was performed to complete Table 1.
B.4 MNIST experiment
In Sec. 5.3, we used binary MNIST [16]. We trained a fully-connected network – with hidden layer size 512, default parameters, and early stopping – achieving 98% test-set accuracy.
The digits in Fig. 5(a) were randomly drawn from the test set. Shapley values in Fig. 5(a) were computed exactly as described in Sec. B.1, except that the supervised method involved training for 2k epochs, and the on-manifold explanations are based on 16k Monte Carlo samples per pixel. Optimised hyperparameters are given in Table 2. The on-manifold explanations in Fig. 5(a) were computed using the supervised method. While the unsupervised method does not appear in the figure, it was performed to complete Table 1.
The average uncertainty, which is not shown in Fig. 5(a), is roughly 0.002 – stated as a fraction of the maximum Shapley value in each image.