True to the Model or True to the Data?

by Hugh Chen, et al.

A variety of recent papers discuss the application of Shapley values, a concept for explaining coalitional games, for feature attribution in machine learning. However, the correct way to connect a machine learning model to a coalitional game has been a source of controversy. The two main approaches that have been proposed differ in the way that they condition on known features, using either (1) an interventional or (2) an observational conditional expectation. While previous work has argued that one of the two approaches is preferable in general, we argue that the choice is application dependent. Furthermore, we argue that the choice comes down to whether it is desirable to be true to the model or true to the data. We use linear models to investigate this choice. After deriving an efficient method for calculating observational conditional expectation Shapley values for linear models, we investigate how correlation in simulated data impacts the convergence of observational conditional expectation Shapley values. Finally, we present two real data examples that we consider to be representative of possible use cases for feature attribution – (1) credit risk modeling and (2) biological discovery. We show how a different choice of value function performs better in each scenario, and how possible attributions are impacted by modeling choices.




1 Shapley values

One of the most popular approaches to machine learning interpretability in recent years has involved using Shapley values to attribute importance to features (Štrumbelj and Kononenko, 2014; Lundberg and Lee, 2017). The Shapley value is a concept from coalitional game theory that fairly allocates the surplus generated by the grand coalition in a game to each of its players (Shapley, 1953). In this general sense, the Shapley value allocated to a player $i$ is defined as:

$$\phi_i(v) = \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \left[ v\left(P_i^\pi \cup \{i\}\right) - v\left(P_i^\pi\right) \right] \tag{1}$$

where $\Pi$ is the set of all permutations of the players $N$, $\pi$ is one possible permutation of the order in which the players join the coalition, $P_i^\pi$ is the set of players joining the coalition before player $i$, and $v: 2^N \to \mathbb{R}$ is a coalitional game that maps from the power set of all players to a scalar value.
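To make the permutation definition concrete, here is a minimal brute-force sketch that averages each player's marginal contribution over every ordering. It assumes the game is supplied as a Python function on frozensets; the three-player game `v` below is a hypothetical example, not one from the paper, and the approach needs $|N|!$ orderings, so it is only usable for a handful of players.

```python
from itertools import permutations


def shapley_permutation(players, v):
    """Exact Shapley values via the permutation definition.

    players: list of hashable player labels.
    v: function mapping a frozenset of players to a scalar (v(frozenset()) is the empty coalition).
    """
    phi = {i: 0.0 for i in players}
    orders = list(permutations(players))
    for order in orders:
        before = set()
        for i in order:
            # Marginal contribution of i given the players that joined earlier.
            phi[i] += v(frozenset(before | {i})) - v(frozenset(before))
            before.add(i)
    return {i: total / len(orders) for i, total in phi.items()}


def v(S):
    """Hypothetical game: 'a' alone is worth 10; 'b' and 'c' are worth 6 only together."""
    return 10 * ("a" in S) + 6 * ({"b", "c"} <= S)


print(shapley_permutation(["a", "b", "c"], v))  # {'a': 10.0, 'b': 3.0, 'c': 3.0}
```

Player `a` contributes 10 in every ordering, while `b` and `c` each contribute 6 only when they join after the other, so they split that surplus evenly.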

1.1 Choice of value function

While the Shapley value is provably the unique solution that satisfies a variety of axioms for an abstract n-person game, figuring out how to represent a machine learning model ($f$) as a coalitional game ($v$) is non-trivial. Previous work has suggested a variety of different functional forms for $v$, for tasks like data valuation and global feature importance (Ghorbani and Zou, 2019; Covert et al., 2020). In this work, we focus on local feature attribution – trying to understand how much each feature contributed to the output of a model for a particular sample. For this application, the reward of the game is typically the conditional expectation of the model’s output, where the players in the game are the known features in the conditional expectation. There are two ways the model’s output ($f(x)$) for a particular sample $x$ is used to define $v$:

1. Observational conditional expectation: This is the formulation in (Lundberg and Lee, 2017; Aas et al., 2019; Frye et al., 2019). The coalitional game is

$$v(S) = \mathbb{E}[f(X) \mid X_S = x_S] \tag{2}$$

where conditioning on $x_S$ means considering the input $X$ to be a random variable where the features in $S$ are known ($X_S = x_S$).

2. Interventional conditional expectation: This approach is endorsed by (Janzing et al., 2019; Sundararajan and Najmi, 2019; Datta et al., 2016), and in practice is used to approximate the observational conditional expectation in (Lundberg and Lee, 2017). Here the coalitional game is defined as:

$$v(S) = \mathbb{E}[f(X) \mid do(X_S = x_S)] = \mathbb{E}_{X_{\bar S}}\left[f(x_S, X_{\bar S})\right] \tag{3}$$

where we “intervene” on the features in $S$ by breaking the dependence between features in $S$ and the remaining features ($\bar S$, the complement of $S$), so that $X_{\bar S}$ follows its marginal distribution. We refer to Shapley values obtained with these two approaches as observational and interventional Shapley values, respectively.
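The two value functions can disagree whenever features are correlated. The sketch below estimates $v(\{1\})$ both ways by Monte Carlo for an assumed bivariate normal input with correlation `rho` and an assumed nonlinear toy model $f(x) = x_1 x_2$ (both hypothetical, chosen so that closed forms are available for comparison): the interventional version draws the unknown feature from its marginal distribution, the observational version from its conditional distribution given the known feature.

```python
import numpy as np


def toy_model(x1, x2):
    """Hypothetical nonlinear model f(x) = x1 * x2."""
    return x1 * x2


def value_functions_mc(x1, rho, n=200_000, seed=0):
    """Monte Carlo estimates of v({1}) under the interventional and observational definitions.

    Assumed setup: (X1, X2) is bivariate normal with zero means, unit variances and
    correlation rho; feature 1 is known (S = {1}) and feature 2 is unknown.
    """
    rng = np.random.default_rng(seed)
    # Interventional: X2 keeps its marginal distribution N(0, 1).
    x2_marginal = rng.standard_normal(n)
    v_int = toy_model(x1, x2_marginal).mean()
    # Observational: X2 is drawn from its conditional given X1 = x1,
    # which for this Gaussian is N(rho * x1, 1 - rho**2).
    x2_conditional = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    v_obs = toy_model(x1, x2_conditional).mean()
    return v_int, v_obs


v_int, v_obs = value_functions_mc(x1=1.5, rho=0.8)
# Closed forms for comparison: v_int = x1 * E[X2] = 0 and v_obs = rho * x1**2 = 1.8.
```

With `rho = 0` both estimates target the same quantity; the gap between them grows with the correlation.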

1.2 Apparent problems with choice of value function

Previous work has pointed out issues with each choice of value function. For example, Janzing et al. (2019) and Sundararajan and Najmi (2019) both point out that the observational Shapley value can attribute importance to irrelevant features – features that were not used by the model. While this does not violate the original Shapley axioms, it does violate a new axiom called Dummy proposed by Sundararajan and Najmi (2019), which requires that a feature $i$ receive attribution $\phi_i = 0$ if, for any two values $x_i$ and $x_i'$ and for every value $x_{\bar i}$ of the remaining features, $f(x_i, x_{\bar i}) = f(x_i', x_{\bar i})$. On the other hand, papers like Frye et al. (2019) have noted that using the interventional Shapley value (which breaks the dependence between features) will lead to evaluating the model on “impossible data points” that lie off the true data manifold.
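A toy illustration of both criticisms, assuming a hypothetical two-row background dataset in which feature 2 is an exact copy of feature 1 and a model that ignores feature 2. The observational value function is approximated with the empirical conditional distribution (averaging over background rows that match $x_S$), and Shapley values are computed exactly from the subset-weighted form of the definition. The interventional attribution for the unused feature is zero, as Dummy requires, but computing it evaluates the model at points such as $(0, 1)$ and $(1, 0)$, which never occur in the data.

```python
from itertools import combinations
from math import factorial

import numpy as np

# Hypothetical background data in which feature 2 is an exact copy of feature 1.
background = np.array([[0.0, 0.0], [1.0, 1.0]])


def f(X):
    """Toy model that only uses feature 1 (feature 2 is a 'dummy')."""
    return X[:, 0]


def v_interventional(S, x):
    # Overwrite the known features S in every background row with x_S.
    # This creates rows such as (0, 1) that never occur in the data.
    Z = background.copy()
    Z[:, list(S)] = x[list(S)]
    return f(Z).mean()


def v_observational(S, x):
    # Empirical conditional: average f over background rows that match x on S.
    # (If no row matches, this is undefined: the zero-probability problem.)
    mask = np.all(background[:, list(S)] == x[list(S)], axis=1)
    return f(background[mask]).mean()


def shapley(v, x, d):
    """Exact Shapley values from the subset-weighted (combinations) form."""
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            w = factorial(k) * factorial(d - k - 1) / factorial(d)
            for S in combinations(others, k):
                phi[i] += w * (v(S + (i,), x) - v(S, x))
    return phi


x = np.array([1.0, 1.0])
print(shapley(v_interventional, x, 2))  # [0.5 0. ]
print(shapley(v_observational, x, 2))   # [0.25 0.25]
```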

While recent work has gone so far as to suggest that having two separate approaches presents an irreconcilable problem with using Shapley values for feature attribution (Kumar et al., 2020), in this paper, we argue that rather than representing some critical flaw in using the Shapley value for feature attribution, each approach is meaningful when applied in the proper context. Further, we argue that this choice depends on whether you want attributions that reflect the behavior of a particular model, or attributions that reflect the correlations in the data.

2 Linear SHAP

In order to understand both approaches, we focus on linear models, for which we present a novel algorithm to compute the observational Shapley values. Moving forward, $f(x) = \beta x + b$, where $x \in \mathbb{R}^d$ is a column vector of $d$ features, $\beta$ is a row vector, and $b$ a scalar.

2.1 Interventional conditional expectation

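For an interventional conditional expectation, the Shapley values (which we denote as $\phi_i(f, x)$) are:

$$\phi_i(f, x) = \beta_i (x_i - \mu_i) \tag{4}$$

where $\mu = \mathbb{E}[X]$. This was shown for independent features by Aas et al. (2019), and the interventional conditional expectation gives the same explanations regardless of correlation: because it replaces the unknown features with draws from their marginal distribution, $v(S) = \beta_S x_S + \beta_{\bar S} \mu_{\bar S} + b$ for every $S$, so every marginal contribution of feature $i$ equals $\beta_i(x_i - \mu_i)$.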
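As a numerical check of this closed form, the following sketch enumerates all coalitions with the interventional value function, representing the marginal distribution by hypothetical background samples that are deliberately correlated. Because $f$ is linear, the brute-force values match $\beta_i(x_i - \mu_i)$ with $\mu$ taken as the background mean, whatever the correlation.

```python
from itertools import combinations
from math import factorial

import numpy as np


def interventional_shapley_bruteforce(beta, b, x, background):
    """Interventional Shapley values of f(z) = beta @ z + b by enumerating all coalitions."""
    d = len(x)

    def v(S):
        Z = background.copy()
        Z[:, list(S)] = x[list(S)]  # known features set to x_S; the rest keep their marginal draws
        return (Z @ beta + b).mean()

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            w = factorial(k) * factorial(d - k - 1) / factorial(d)  # Shapley weight for |S| = k
            for S in combinations(others, k):
                phi[i] += w * (v(S + (i,)) - v(S))
    return phi


rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.9, 0.0], [0.9, 1.0, 0.3], [0.0, 0.3, 1.0]])  # hypothetical, correlated
background = rng.multivariate_normal(np.zeros(3), cov, size=500)
beta, b, x = np.array([2.0, -1.0, 0.5]), 0.3, np.array([1.0, 2.0, -1.0])

phi = interventional_shapley_bruteforce(beta, b, x, background)
closed_form = beta * (x - background.mean(axis=0))
print(np.allclose(phi, closed_form))  # True
```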

2.2 Observational conditional expectation

Computing the Shapley values for an observational conditional expectation is substantially harder, and a number of algorithms have been proposed for doing so. Sundararajan and Najmi (2019) utilize the empirical distribution, which often assigns zero probability to plausible inputs, even when the number of samples is large. Mase et al. (2019) extend this empirical distribution by including a similarity metric. In Aas et al. (2019), the unknown features are sampled from either a multivariate Gaussian conditional, a Gaussian copula conditional, or an empirical conditional distribution. In Frye et al. (2019), the conditional is modeled using an autoencoder. For a linear model, the problem reduces to estimating the conditional expectation of $X$ given different subsets $S$, where $N$ is the set of all $d$ features:

$$\phi_i(f, x) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \left[ f\left(\mathbb{E}[X \mid X_{S \cup \{i\}} = x_{S \cup \{i\}}]\right) - f\left(\mathbb{E}[X \mid X_S = x_S]\right) \right] \tag{5}$$

The key observation is that for a linear $f$, the expectation (and the conditional expectation) has the following property:

$$\mathbb{E}[f(X) \mid X_S = x_S] = f\left(\mathbb{E}[X \mid X_S = x_S]\right) \tag{6}$$
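Estimating this conditional expectation is hard in general, so we assume the inputs are multivariate normal, $X \sim \mathcal{N}(\mu, \Sigma)$. Denote the projection matrix that selects a set $S$ as $P_S \in \{0, 1\}^{|S| \times d}$ (therefore, $P_S x = x_S$ returns the features from $S$ in $x$), and write $\bar S = N \setminus S$, $\mu_S = P_S \mu$, and $\Sigma_{AB} = P_A \Sigma P_B^\top$. Then $\mathbb{E}[f(X) \mid X_S = x_S]$ is:

$$\mathbb{E}[f(X) \mid X_S = x_S] = \beta\left( P_S^\top P_S x + P_{\bar S}^\top \left( \mu_{\bar S} + \Sigma_{\bar S S} \Sigma_{S S}^{-1} (P_S x - \mu_S) \right) \right) + b \tag{7}$$

In words, the conditional expectation of the unknown features of a normally distributed random variable is known to be $\mu_{\bar S} + \Sigma_{\bar S S} \Sigma_{S S}^{-1} (x_S - \mu_S)$; however, this gives us a vector in $\mathbb{R}^{|\bar S|}$. Since $f$ expects an input in $\mathbb{R}^d$, we project the conditional expectation into $\mathbb{R}^d$ by multiplying by $P_{\bar S}^\top$. The resulting vector has all of the features not in $\bar S$ (i.e., the features in $S$) set to zero. These features can simply be set to their known values since we are conditioning on them, hence the addition of $P_S^\top P_S x$.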


At this point, we have a natural solution to obtain the observational conditional expectation Shapley value for a single sample. If we compute (7) for all sets $S \subseteq N$, we can use the combinations definition of Shapley values (5) to compute the Shapley value exactly.
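A direct sketch of this exact approach, assuming the Gaussian parameters `mu` and `Sigma` are known (in practice one might plug in the sample mean and covariance). `cond_expectation` assembles (7) from selection matrices, and `observational_shapley_exact` evaluates the subset sum in (5), so its cost grows exponentially with the number of features. The usage lines use a hypothetical two-feature model that ignores its second input; under correlation $\rho = 0.5$, that input still receives credit.

```python
from itertools import combinations
from math import factorial

import numpy as np


def cond_expectation(x, mu, Sigma, S):
    """E[X | X_S = x_S] for X ~ N(mu, Sigma), assembled with projection matrices."""
    d = len(x)
    S = sorted(S)
    if not S:
        return mu.copy()  # nothing known: the unconditional mean
    Sbar = [j for j in range(d) if j not in S]
    P_S, P_Sbar = np.eye(d)[S], np.eye(d)[Sbar]  # rows of the identity select features
    mu_S, mu_Sbar = P_S @ mu, P_Sbar @ mu
    cond = mu_Sbar + Sigma[np.ix_(Sbar, S)] @ np.linalg.solve(Sigma[np.ix_(S, S)], P_S @ x - mu_S)
    return P_S.T @ (P_S @ x) + P_Sbar.T @ cond  # known features kept, the others filled in


def observational_shapley_exact(beta, b, x, mu, Sigma):
    """Observational Shapley values of f(z) = beta @ z + b via the subset sum (2^d work)."""
    d = len(x)

    def f(z):
        return beta @ z + b

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            w = factorial(k) * factorial(d - k - 1) / factorial(d)  # Shapley weight for |S| = k
            for S in combinations(others, k):
                with_i = f(cond_expectation(x, mu, Sigma, S + (i,)))
                without_i = f(cond_expectation(x, mu, Sigma, S))
                phi[i] += w * (with_i - without_i)
    return phi


beta, b = np.array([1.0, 0.0]), 0.0  # hypothetical model that ignores feature 2
mu, x = np.zeros(2), np.ones(2)
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
print(observational_shapley_exact(beta, b, x, mu, Sigma))  # [0.75 0.25]
```

With `Sigma` set to the identity, the same function returns $\beta_i(x_i - \mu_i)$, the interventional values.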

Computational complexity: Each term in the summation requires a matrix multiplication/inversion, which is $O(d^3)$ in the size $d$ of the matrix. Since we do this for all possible subsets, the computational complexity to obtain $\phi_i(f, x)$ for one feature is $O(2^d d^3)$. To obtain the Shapley values of all $d$ features, re-running this algorithm would result in a complexity of $O(2^d d^4)$. Alternatively, if we re-use terms in the summation, we get a complexity of $O(2^d d^3)$.

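Finally, to obtain $\phi(f, x)$ for $n$ samples, we have to incur this exponential cost $n$ times, once for each explanation. Instead, we can isolate the exponential computation to matrices that do not depend on $x$ itself. This implies that if we incur an exponential cost once, we can explain all samples in low-order polynomial time. To do so, we can factor (7) to get:

$$\mathbb{E}[f(X) \mid X_S = x_S] = \beta\left( R_S x + \tilde{R}_S \mu \right) + b \tag{8}$$

where $R_S = P_S^\top P_S + P_{\bar S}^\top \Sigma_{\bar S S} \Sigma_{S S}^{-1} P_S$ and $\tilde{R}_S = P_{\bar S}^\top P_{\bar S} - P_{\bar S}^\top \Sigma_{\bar S S} \Sigma_{S S}^{-1} P_S$. Then, if we use equation (8) to revisit (5), we get:

$$\phi_i(f, x) = \beta\left( C_i x + D_i \mu \right) \tag{9}$$

where $C_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \left( R_{S \cup \{i\}} - R_S \right)$ and $D_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \left( \tilde{R}_{S \cup \{i\}} - \tilde{R}_S \right)$ (the intercept $b$ cancels in each difference). Here, we can see that computing $C_i$ and $D_i$ is exponential (in fact, the complexity to compute them for all features is $O(2^d d^3)$); however, once we have computed $C_i$ and $D_i$, we can compute the Shapley value quickly, in a few matrix multiplications and an addition.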
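A sketch of this two-stage scheme, assuming a known covariance `Sigma` (all function names are illustrative, not from a published package). `precompute_c_d` pays the exponential cost once, caching each $R_S$ so it is reused across features; `explain` then handles any number of samples with two tensor contractions. As a side check, $R_S + \tilde{R}_S = I$ for every $S$, so the computed $D_i$ equal $-C_i$.

```python
from itertools import combinations
from math import factorial

import numpy as np


def r_matrices(S, Sigma):
    """R_S and tilde-R_S, so that E[X | X_S = x_S] = R_S x + tilde-R_S mu for X ~ N(mu, Sigma)."""
    d = Sigma.shape[0]
    S = sorted(S)
    Sbar = [j for j in range(d) if j not in S]
    P_S, P_Sbar = np.eye(d)[S], np.eye(d)[Sbar]  # selection (projection) matrices
    if S:
        A = Sigma[np.ix_(Sbar, S)] @ np.linalg.inv(Sigma[np.ix_(S, S)])
    else:
        A = np.zeros((d, 0))  # empty S: nothing to regress on
    R = P_S.T @ P_S + P_Sbar.T @ A @ P_S
    R_tilde = P_Sbar.T @ P_Sbar - P_Sbar.T @ A @ P_S
    return R, R_tilde


def precompute_c_d(Sigma):
    """Exponential-cost step, done once: C_i and D_i for every feature i."""
    d = Sigma.shape[0]
    cache = {}

    def r(S):
        key = frozenset(S)
        if key not in cache:  # each R_S is computed once and reused
            cache[key] = r_matrices(key, Sigma)
        return cache[key]

    C, D = np.zeros((d, d, d)), np.zeros((d, d, d))
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            w = factorial(k) * factorial(d - k - 1) / factorial(d)
            for S in combinations(others, k):
                (R1, Rt1), (R0, Rt0) = r(S + (i,)), r(S)
                C[i] += w * (R1 - R0)
                D[i] += w * (Rt1 - Rt0)
    return C, D


def explain(beta, X, mu, C, D):
    """Cheap step for many samples: row n holds phi_i = beta (C_i x_n + D_i mu) for every i."""
    return np.einsum("j,ijk,nk->ni", beta, C, X) + np.einsum("j,ijk,k->i", beta, D, mu)


rng = np.random.default_rng(0)
d = 4
L = rng.standard_normal((d, d))
Sigma = L @ L.T + d * np.eye(d)  # hypothetical positive-definite covariance
mu, beta = rng.standard_normal(d), rng.standard_normal(d)
C, D = precompute_c_d(Sigma)  # pay the 2^d cost once
X = rng.standard_normal((1000, d))  # then explain many samples cheaply
phi = explain(beta, X, mu, C, D)
```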

Note that just as the original Shapley values have been approximated using Monte Carlo sampling, we can likewise approximate $C_i$ and $D_i$ by sampling from the permutations (or combinations) of feature orderings. In contrast to traditional sampling approaches, which approximate the summation in Equation 5, approximating $C_i$ and $D_i$ converges much faster because we do not need to separately converge for each input feature.
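A sampling sketch under the same Gaussian assumption, using uniformly random feature orderings (one of the two schemes mentioned above). Each ordering contributes a marginal matrix $R_{\text{prefix} \cup \{i\}} - R_{\text{prefix}}$ to every feature at once, and the resulting estimates do not depend on the sample being explained. The covariance below is hypothetical.

```python
import numpy as np


def r_matrix(S, Sigma):
    """R_S: the matrix multiplying x in E[X | X_S = x_S] = R_S x + tilde-R_S mu (Gaussian X)."""
    d = Sigma.shape[0]
    S = sorted(S)
    if not S:
        return np.zeros((d, d))  # nothing known: E[X] = mu, so R_S = 0
    Sbar = [j for j in range(d) if j not in S]
    P_S, P_Sbar = np.eye(d)[S], np.eye(d)[Sbar]
    A = Sigma[np.ix_(Sbar, S)] @ np.linalg.inv(Sigma[np.ix_(S, S)])
    return P_S.T @ P_S + P_Sbar.T @ A @ P_S


def sample_c(Sigma, n_perms, seed=0):
    """Monte Carlo estimate of all C_i from uniformly random feature orderings.

    D_i could be estimated the same way; in this Gaussian setting D_i = -C_i,
    because R_S + tilde-R_S = I for every S.
    """
    d = Sigma.shape[0]
    rng = np.random.default_rng(seed)
    C = np.zeros((d, d, d))
    for _ in range(n_perms):
        prefix, R_prev = [], np.zeros((d, d))
        for i in rng.permutation(d):
            prefix.append(int(i))
            R_next = r_matrix(prefix, Sigma)
            C[i] += R_next - R_prev  # marginal contribution of feature i in this ordering
            R_prev = R_next
    return C / n_perms


Sigma = np.array([[1.0, 0.6, 0.2], [0.6, 1.0, 0.4], [0.2, 0.4, 1.0]])  # hypothetical covariance
C_hat = sample_c(Sigma, n_perms=200)
# For any sample x: phi_hat_i = beta @ C_hat[i] @ (x - mu), using D_i = -C_i.
```

Because the contributions within one ordering telescope from $R_\emptyset = 0$ to $R_N = I$, the sampled matrices always sum to the identity across features, mirroring the efficiency property.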

3 Effects of Correlation

3.1 Impact of correlation on convergence

Figure 1: Convergence of correlated linear Shapley value estimates. Attributions for different features are shown in blue, orange, and green. We report the standard deviations from 20 estimates of the observational Shapley values, each computed from a fixed number of sampled combinations.


In order to build intuition about the observational Shapley values for linear models, we first examine a simulated example with a known distribution. In the following example, the features are , the model is , and the sample being explained is .

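In Figure 1, there are three cases: (1) Independent implies that the correlation matrix is the identity, $\Sigma = I$; (2) Pair correlation ($\rho$) implies that $\Sigma = I$, except for a single pair of features $j \neq k$ with $\Sigma_{jk} = \Sigma_{kj} = \rho$; and (3) Off-diagonal correlation ($\rho$) implies that $\Sigma_{jk} = \rho$ if $j \neq k$ and $\Sigma_{jk} = 1$ otherwise.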

We can observe that when features are independent, the observational Shapley value estimates are $\beta_i(x_i - \mu_i)$, which coincides with the interventional Shapley value estimates. Furthermore, for data that is truly independent, there is no variance in the estimates and they converge immediately, because every sampled coalition yields the same marginal contribution, $\beta_i(x_i - \mu_i)$. For other correlation patterns, we can observe two trends: (1) correlation splits credit between correlated variables, and (2) higher levels of correlation lead to slower convergence of the observational Shapley value estimates.
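A sketch of the kind of experiment behind Figure 1, under a hypothetical setup (three features, $\beta = \mathbf{1}$, $\mu = 0$, $x = (1, 2, 3)$; the paper's exact settings are not recoverable here). Each observational Shapley value is estimated from a fixed number of coalitions sampled with Shapley weights, and the spread over 20 repeated estimates is reported.

```python
import numpy as np


def f_of_cond_mean(beta, x, mu, Sigma, S):
    """beta @ E[X | X_S = x_S] for X ~ N(mu, Sigma); the intercept cancels in differences."""
    d = len(x)
    S = sorted(S)
    Sbar = [j for j in range(d) if j not in S]
    z = mu.copy()
    z[S] = x[S]  # known features keep their values
    if S and Sbar:  # unknown features get their Gaussian conditional mean
        solved = np.linalg.solve(Sigma[np.ix_(S, S)], x[S] - mu[S])
        z[Sbar] = mu[Sbar] + Sigma[np.ix_(Sbar, S)] @ solved
    return beta @ z


def estimate_phi(i, beta, x, mu, Sigma, n_samples, rng):
    """Monte Carlo estimate of one observational Shapley value from sampled coalitions.

    Drawing |S| uniformly from {0, ..., d-1}, then S uniformly among subsets of that
    size, reproduces the Shapley weights |S|! (d - |S| - 1)! / d!.
    """
    d = len(x)
    others = [j for j in range(d) if j != i]
    total = 0.0
    for _ in range(n_samples):
        k = rng.integers(0, d)
        S = [int(j) for j in rng.choice(others, size=k, replace=False)]
        total += f_of_cond_mean(beta, x, mu, Sigma, S + [i]) - f_of_cond_mean(beta, x, mu, Sigma, S)
    return total / n_samples


def spread_of_estimates(Sigma, n_samples=50, repeats=20, seed=0):
    """Standard deviation over `repeats` estimates of each feature's Shapley value."""
    d = Sigma.shape[0]
    beta, mu, x = np.ones(d), np.zeros(d), np.arange(1.0, d + 1)  # hypothetical setup
    rng = np.random.default_rng(seed)
    estimates = np.array([[estimate_phi(i, beta, x, mu, Sigma, n_samples, rng) for i in range(d)]
                          for _ in range(repeats)])
    return estimates.std(axis=0)


rho = 0.5
off_diagonal = np.full((3, 3), rho) + (1 - rho) * np.eye(3)
print(spread_of_estimates(np.eye(3)))     # zeros: every coalition gives beta_i * (x_i - mu_i)
print(spread_of_estimates(off_diagonal))  # positive: contributions now depend on the coalition
```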

3.2 Explaining a feature not used by the model

Figure 2: Interventional vs. observational SHAP values for NHANES. In a.), we use the interventional approach, whereas in b.) and c.), we use the observational approach with different sets of features. In these summary plots, each point is an individual, the x-axis is the Shapley value (the impact on the model’s output), and the color is the value of the feature listed on the y-axis.

In order to compare interventional and observational Shapley values on real data, we utilize data from the National Health and Nutrition Examination Survey (NHANES). In particular, we focus on the task of predicting 5-year mortality for individuals (n=25,535) surveyed from 1999-2014, where mortality status is collected in 2015 (we filter out individuals with unknown mortality status). Note that observational conditional Shapley values require the covariance and mean of the underlying distribution, for which we use the sample covariance and mean.

For Figure 2, we use five features: Age, Income ratio, Systolic blood pressure, Waist circumference, and Body mass index (BMI). In particular, we train a linear model on the first four features, excluding BMI (test AUC: 0.772, test AP: 0.186). The interventional Shapley values then assign credit according to each feature's coefficient. In particular, Age positively impacts mortality prediction, whereas Income ratio, Waist circumference, and Systolic blood pressure all negatively impact mortality prediction. Finally, BMI has no importance because it is not used in the model.

However, for the observational Shapley values, we first observe that the number of features used to explain the model impacts the attributions (four features in 2b and five features in 2c). In Figure 2b, we see that the relationships are similar to, but slightly different from, the interventional Shapley values in 2a due to correlation in the data. In Figure 2c, we can see that when we include BMI, the importance of the other features is relatively lower. This implies that even though BMI is not included in the model, the correlation between BMI and other features makes BMI important under observational Shapley values. Being able to explain features not in the model has implications for detecting bias. For instance, a linear model may use correlations between features to implicitly depend on a sensitive feature that was explicitly excluded. Observational Shapley values provide a tool to identify such bias (though we note that in this case there are other approaches to identify surrogate variables, such as correlation analysis).

4 True to the Model or True to the Data

We now consider two examples using real-world datasets and use cases that demonstrate why neither the observational nor the interventional conditional expectation is the right choice in general; instead, the choice should be based on the desired application. We use these applications to argue that the choice of conditional expectation comes down to whether you want your attributions to be true to the model or true to the data.

4.1 True to the Model

Figure 3: Modifying features according to the interventional SHAP values helps applicants decrease their predicted log odds of default much more than modifying them according to the observational SHAP values. The solid line indicates the mean change in log odds, while the shaded region indicates the standard deviation over all applicants. The wide range is expected, as applicants whose features are already very close to the mean, or whose odds of default are very low to begin with, cannot further decrease their odds of default by setting features to the mean.

We first consider the case of a bank that uses an algorithm to determine whether or not to grant loans to applicants. Applicants who have been denied loans may want an explanation for why they were denied, and to understand what they would have to change to receive a loan (Bhatt et al., 2020). In this case, the mechanism we want to explain is the particular model the bank uses, so we argue that we want our feature attributions to be true to the model. Therefore, we hypothesize that the interventional conditional expectation is preferable, as it is the choice of value function that satisfies the Dummy axiom – only features that are referenced by the model will be given importance.

To investigate this, we downloaded the LendingClub dataset, a large dataset of loans issued from a peer-to-peer lending site, which includes loan status and latest payment information, as well as a variety of features describing the applicants, such as number of open bank accounts, age, and amount requested. We trained a logistic regression model to predict whether or not an applicant would default on their loan. We obtained feature attributions using either the observational or interventional conditional expectation.

To see which set of explanations was more useful to hypothetical applicants, we measured which set helped applicants most decrease their risk of default according to the model (and consequently most increase their likelihood of being granted a loan). We therefore ranked all of the features for each applicant by their Shapley value, and allowed each applicant to “modify their risk” by setting the top-ranked features to their means, one at a time. We then measured the change in the model’s predicted log odds of default after each feature (up to 10 features) had been mean-imputed. For this metric, the better the explanation, the faster the predicted log odds of default will decrease.
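A sketch of this evaluation for a single applicant, assuming a hypothetical logistic regression with coefficients `beta` and intercept `b` (so that the log odds are linear in the features) and feature means `mu`; all numbers are illustrative. Features are ranked by Shapley value from largest to smallest and mean-imputed one at a time. With interventional attributions $\beta_i(x_i - \mu_i)$, each imputation lowers the log odds by exactly the imputed feature's attribution, which is why that ranking gives the largest decrease at every step in this linear setting.

```python
import numpy as np


def log_odds_after_imputation(beta, b, x, mu, phi, max_steps=10):
    """Predicted log odds after mean-imputing the top-ranked features one at a time."""
    z = x.astype(float)  # copy, so the applicant's original features are untouched
    order = np.argsort(-phi)  # descending Shapley value (most risk-increasing first)
    trajectory = [beta @ z + b]
    for i in order[:max_steps]:
        z[i] = mu[i]  # "modify the risk" by setting feature i to its mean
        trajectory.append(beta @ z + b)
    return np.array(trajectory)


beta, b = np.array([0.8, -0.5, 1.2, 0.1]), -2.0  # hypothetical model
mu = np.zeros(4)
x = np.array([1.0, -2.0, 0.5, 3.0])  # hypothetical applicant
phi_int = beta * (x - mu)  # interventional Shapley values in log-odds space
print(log_odds_after_imputation(beta, b, x, mu, phi_int, max_steps=2))
```

Passing observational attributions as `phi` instead would reorder the imputations, but each step's effect on the log odds would still be $-\beta_i(x_i - \mu_i)$ for whichever feature is imputed.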

We find that using the interventional conditional expectation leads to significantly better results than the observational conditional expectation (Figure 3). Intervening on the features ranked by the interventional Shapley values led to a far greater decrease in the predicted likelihood of default. In other words, the interventional Shapley values enabled interventions on individuals’ features that drastically changed their predicted likelihood of receiving a loan.

When we consider the axioms fulfilled by each choice of value function, this result makes sense. As pointed out in Janzing et al. (2019) and Sundararajan and Najmi (2019), and as shown in subsection 3.2, the observational Shapley value spreads importance among correlated features that may not be explicitly used by the machine learning model. Intervening on such features will not impact the model’s output. In contrast, the interventional Shapley value is true to the model in the sense that it gives importance only to features explicitly used by the model. For a linear model, this means the interventional approach will first change the feature $i$ for which $\beta_i(x_i - \mu_i)$ is largest. Compared to other features, mean-imputing $x_i$ will provide the greatest change to the predicted output of a linear model, since it shifts the output by exactly $-\beta_i(x_i - \mu_i)$. Being true to the model is the best choice for most applications of explainable AI, where the goal is to explain the model itself.

Figure 4: Left: When explaining a sparse model (Lasso regression), more true features are recovered when using the observational Shapley value to spread credit among correlated features than when using the interventional Shapley value. Right: When using the interventional Shapley value, we recover more true features when the underlying model spreads credit among groups of correlated features (Elastic Net) than when the underlying model is sparse (Lasso).

4.2 True to the Data

We now consider the complementary case where we care less about the particular machine learning model we have trained, and more about a natural mechanism in the world. We use a dataset of RNA-seq gene expression measurements in patients with acute myeloid leukemia, a blood cancer (Tyner et al., 2018). An important problem in cancer biology is to determine which genes’ expression determines a particular outcome (e.g., response to anti-cancer drugs). One common approach is to measure gene expression in a set of patient samples, measure response to drugs in vitro, then use machine learning to model the data and examine the weights of the model (Zou and Hastie, 2005; Lee et al., 2018; Janizek et al., 2018).

To create an experimental setting where we have access to the ground truth, we take the real RNA-seq data and simulate a drug response label as a function of 40 randomly selected causal genes (out of 1000 total genes). The label is defined to be the sum of the causal genes' expression values plus Gaussian noise. After training a Lasso model, we explain the model for many samples using the observational and interventional Shapley values, and rank the genes by their mean absolute Shapley value to get two sets of global feature importance values (Lundberg et al., 2020). We see that ranking the features according to the observational Shapley values recovers more of the true causal features at each position in the ranked list than the interventional Shapley values (Figure 4, left). The green line in the figure represents the expected number of true genes that would be cumulatively found at each position in the ranked list if the gene list were sorted randomly. While both Shapley value-based rankings outperform random rankings, the observational approach outperforms the interventional one.
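A sketch of the evaluation in this paragraph, assuming the per-sample Shapley values are already available as a NumPy array (the arrays in the usage lines are hypothetical). Genes are ranked by mean absolute Shapley value, and the curve counts how many of the known causal genes appear in the top $k$. For a uniformly random ordering of 1000 genes that contains 40 causal ones, the expected count at position $k$ is $40k/1000$, the quantity the green line shows.

```python
import numpy as np


def recovery_curve(shap_values, causal):
    """Cumulative number of true causal features among the top-k ranked features.

    shap_values: (n_samples, n_features) attributions; features are ranked by their
    mean absolute Shapley value (a global importance score).
    causal: indices of the true causal features.
    """
    importance = np.abs(shap_values).mean(axis=0)
    ranking = np.argsort(-importance)  # most important first
    return np.cumsum(np.isin(ranking, causal))


def random_baseline(n_features, n_causal):
    """Expected count for a uniformly random ranking: k * n_causal / n_features."""
    k = np.arange(1, n_features + 1)
    return k * n_causal / n_features


# Hypothetical attributions for 2 samples and 4 features; features 1 and 3 are causal.
shap_values = np.array([[3.0, -0.1, 0.5, -2.0], [1.0, 0.3, -0.5, 1.0]])
print(recovery_curve(shap_values, causal=[3, 1]))  # [0 1 1 2]
print(random_baseline(1000, 40)[:3])               # expected counts at k = 1, 2, 3
```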

This example helps to illustrate why the Dummy axiom is not necessarily a useful axiom in general. In the case of biological discovery, we do not care about the particular linear model we have trained on the data. We instead care about the true data-generating process, which may be equally well represented by a wide variety of models (Breiman, 2001). Therefore, when ranking genes for further testing, we want to spread credit among correlated features that are all informative about the outcome of interest, rather than assigning no credit to features that are not explicitly used by a single model.

While the observational Shapley value may be preferable to the interventional Shapley value when explaining a Lasso model, Elastic Net (i.e., a combined penalty on the L1 and L2 norms of the coefficients) is actually more popular for this application (Zou and Hastie, 2005). While a Lasso model may achieve high predictive performance, it tends to pick out a sparse subset of features from among groups of correlated features. We re-run the same experiment, but rather than comparing the observational and interventional Shapley values applied to a Lasso model, we focus on the interventional Shapley values for (1) a Lasso model (as in the previous experiment), or (2) an Elastic Net model (Figure 4, right). We find that by using the Elastic Net regularization penalty, the model itself is able to spread credit among correlated features, better respecting the correlation in the data. It is worthwhile to point out here that Elastic Net models became popular for this task because it is typical practice to interpret linear models by examining the coefficient vector, which is itself an interventional-style explanation (a partial derivative). Since this interventional explanation does not spread credit among correlated features, it is necessary to spread the credit using modeling decisions.

We have seen that when the goal is to be true to the data, there are two methods for spreading credit to correlated features. One is to spread credit using the observational Shapley value as a feature attribution. The other is to train a model that itself spreads credit among correlated features, in this case by training an Elastic Net regression. When we factor in the computation time for these two approaches, the choice becomes clear. Estimating the transform matrices for the observational conditional with 1000 samples took 6 hours using the CPUs on a 2018 MacBook Pro, while hyperparameter tuning and fitting an Elastic Net regression took a matter of seconds.

5 Conclusion

In this paper, we analyzed two approaches to explaining models using the Shapley value solution concept for cooperative games. In order to compare these approaches, we focus on explaining linear models and present a novel methodology for explaining linear models with correlated features. We analyze two different settings where either the interventional Shapley values or the observational Shapley values are preferable. In the first setting, we consider a model trained on loan data that might be used to determine which applicants obtain loans. Because applicants in this setting are ultimately interested in why the model makes a prediction, we call this case “true to the model” and show that interventional Shapley values serve to modify the model’s prediction more effectively. In the second setting, we consider a model trained on biological data that aims to understand an underlying causal relationship. Because this setting is focused on scientific discovery, we call this case “true to the data” and show that for a sparse (Lasso-regularized) model, observational Shapley values discover more of the true features. We also find that modeling decisions can achieve some of the same effects, by demonstrating that the interventional Shapley values recover more of the true features when applied to a model that itself spreads credit among correlated features than when applied to a sparse model.

Limitations and future directions: In the RNA-seq experiment, we identified two ways to recover true features: (1) Lasso regression with observational Shapley values, where correlation is spread through the attribution method, and (2) Elastic Net regression with interventional Shapley values, where correlation is spread through the model estimation. While both approaches achieved similar efficacy, we found that the latter was far more computationally tractable. As future work, we aim to further analyze which of these approaches are preferable, or even feasible, for scientific discovery beyond linear models.

Currently, the best case for feature attribution is when the features being perturbed are independent to start with. In that case, both the observational and interventional approaches yield the same attributions. Therefore, future work that focuses on reparameterizing the model to get at the underlying independent factors is a promising approach to eliminating the tradeoff between being true to the model and true to the data, since we could then perturb the data interventionally without generating unrealistic input values.


References

  • K. Aas, M. Jullum, and A. Løland (2019) Explaining individual predictions when features are dependent: more accurate approximations to Shapley values. arXiv preprint arXiv:1903.10464.
  • U. Bhatt, A. Xiang, S. Sharma, A. Weller, A. Taly, Y. Jia, J. Ghosh, R. Puri, J. M. Moura, and P. Eckersley (2020) Explainable machine learning in deployment. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 648–657.
  • L. Breiman (2001) Statistical modeling: the two cultures (with comments and a rejoinder by the author). Statistical Science 16 (3), pp. 199–231.
  • I. Covert, S. Lundberg, and S. Lee (2020) Understanding global feature contributions through additive importance measures. arXiv preprint arXiv:2004.00668.
  • A. Datta, S. Sen, and Y. Zick (2016) Algorithmic transparency via quantitative input influence: theory and experiments with learning systems. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 598–617.
  • C. Frye, I. Feige, and C. Rowat (2019) Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability. arXiv preprint arXiv:1910.06358.
  • A. Ghorbani and J. Zou (2019) Data Shapley: equitable valuation of data for machine learning. arXiv preprint arXiv:1904.02868.
  • J. D. Janizek, S. Celik, and S. Lee (2018) Explainable machine learning prediction of synergistic drug combinations for precision cancer medicine. bioRxiv, pp. 331769.
  • D. Janzing, L. Minorics, and P. Blöbaum (2019) Feature relevance quantification in explainable AI: a causality problem. arXiv preprint arXiv:1910.13413.
  • I. E. Kumar, S. Venkatasubramanian, C. Scheidegger, and S. Friedler (2020) Problems with Shapley-value-based explanations as feature importance measures. arXiv preprint arXiv:2002.11097.
  • S. Lee, S. Celik, B. A. Logsdon, S. M. Lundberg, T. J. Martins, V. G. Oehler, E. H. Estey, C. P. Miller, S. Chien, J. Dai, et al. (2018) A machine learning approach to integrate big data for precision medicine in acute myeloid leukemia. Nature Communications 9 (1), pp. 1–13.
  • S. M. Lundberg, G. Erion, H. Chen, A. DeGrave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal, and S. Lee (2020) From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence 2 (1), pp. 56–67.
  • S. M. Lundberg and S. Lee (2017) A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pp. 4765–4774.
  • M. Mase, A. B. Owen, and B. Seiler (2019) Explaining black box decisions by Shapley cohort refinement. arXiv preprint arXiv:1911.00467.
  • L. S. Shapley (1953) A value for n-person games. Contributions to the Theory of Games 2 (28), pp. 307–317.
  • E. Štrumbelj and I. Kononenko (2014) Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems 41 (3), pp. 647–665.
  • M. Sundararajan and A. Najmi (2019) The many Shapley values for model explanation. arXiv preprint arXiv:1908.08474.
  • J. W. Tyner, C. E. Tognon, D. Bottomly, B. Wilmot, S. E. Kurtz, S. L. Savage, N. Long, A. R. Schultz, E. Traer, M. Abel, et al. (2018) Functional genomic landscape of acute myeloid leukaemia. Nature 562 (7728), pp. 526.
  • H. Zou and T. Hastie (2005) Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (2), pp. 301–320.