Machine learning models are increasingly being used to replace human decision-making for tasks involving some kind of prediction. As state-of-the-art predictive machine learning models become increasingly inscrutable, there has been an increase in concern that the black-box nature of these systems can obscure undesirable properties of the decision algorithm, such as illegal bias or signals accidentally learned from artifacts irrelevant to the task at hand. More recently, attempts have been made to “explain” the output of a complicated function in terms of its inputs to address these and other concerns. One of the more prominent tools in this literature has been the Shapley value, a method for additively attributing value among players of a cooperative game. In this setting, the “players” are the features used by the model, and the game is the prediction of the model. A variety of methods to assign feature influence using the Shapley value have recently been developed (lipovetsky2001analysis; vstrumbelj2014explaining; lundberg2018consistent; datta2016indirect; merrick2019explanation; frye2019asymmetric; aas_explaining_2019).
In this paper, we demonstrate that Shapley-value-based explanations for feature importance fail to serve their desired purpose. We make this argument in two parts. Firstly, we show that Shapley-value based explanations either fail to satisfy natural mathematical properties we would expect from an explanation, or require the introduction of extensive domain modeling that is typically not part of an explanation pipeline. Secondly, taking a human-centric perspective, we evaluate Shapley-value-based explanations through established frameworks of what people expect from explanations, and find them wanting.
In this section, we define the Shapley value and articulate the different ways in which it has been applied to the problem of feature importance.
2.1 Classical Shapley values
In cooperative game theory, a coalitional game consists of a set of $N$ players and a characteristic function $v$ which maps subsets $S \subseteq \{1, \ldots, N\}$ to a real value $v(S)$, satisfying $v(\emptyset) = 0$. The value function represents how much collective payoff a set of players can gain by “cooperating” as a set. The Shapley value is one way to allocate the total value of the grand coalition, $v(\{1, \ldots, N\})$, between the individual players. It is based on trying to answer the question: how much does player $i$ contribute to the coalition?
The marginal contribution $\Delta_v(i, S)$ of player $i$ with respect to a coalition $S$ is defined as the additional value generated by including $i$ in the coalition:
$$\Delta_v(i, S) = v(S \cup \{i\}) - v(S)$$
Intuitively, the Shapley value can be understood as a weighted average of a player’s marginal contributions to every possible subset of players. Let $\Pi$ be the set of permutations of the integers up to $N$, and given $\pi \in \Pi$ let $S_{i,\pi}$ represent the players preceding player $i$ in $\pi$. The Shapley value of feature $i$ is then
$$\phi_v(i) = \frac{1}{N!} \sum_{\pi \in \Pi} \Delta_v(i, S_{i,\pi})$$
This can be rewritten in terms of the unique subsets $S \subseteq \{1, \ldots, N\} \setminus \{i\}$ and the number of permutations for which $S$ precedes feature $i$ in a permutation:
$$\phi_v(i) = \frac{1}{N!} \sum_{S \subseteq \{1, \ldots, N\} \setminus \{i\}} |S|! \, (N - |S| - 1)! \, \Delta_v(i, S)$$
This value is the unique allocation of the grand coalition which satisfies the following axioms:
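Symmetry: For two players $i, j$, if $\Delta_v(i, S) = \Delta_v(j, S)$ for any subset of features $S$, then $\phi_v(i) = \phi_v(j)$.
Dummy: For a single player $i$, if $\Delta_v(i, S) = 0$ for all subsets $S$, then $\phi_v(i) = 0$.
Additivity: For a single feature $i$ and two games $v$ and $w$, $\phi_v(i) + \phi_w(i) = \phi_{v+w}(i)$.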
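As a rough illustration of the definitions above, the following Python sketch computes exact Shapley values by enumerating subsets. The function name `shapley_values` and the toy game are assumptions made for this example and are not taken from any of the cited methods; the enumeration is exponential in the number of players, so it is only practical for very small games.

```python
from itertools import combinations
from math import factorial


def shapley_values(v, n):
    """Exact Shapley values for players 0..n-1 of a game v: frozenset -> float."""
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Share of permutations in which exactly `size` given players precede i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (v(s | {i}) - v(s))  # marginal contribution
        phi.append(total)
    return phi


# Hypothetical toy game: players 0 and 1 are interchangeable, player 2 is a dummy.
def toy_game(s):
    return 10.0 * len(s & {0, 1}) ** 2


phi = shapley_values(toy_game, 3)
print(phi)
```

In this toy game the two interchangeable players receive equal attributions, the dummy receives zero, and the attributions sum to the value of the grand coalition.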
2.2 Shapley values for feature importance
Several methods have been proposed to apply the Shapley value to the problem of feature importance. Given a model $f(x_1, \ldots, x_d)$, the features from 1 to $d$ can be considered players in a game in which the payoff $v$ is some measure of the importance or influence of that subset. The Shapley value $\phi_v(i)$ can then be viewed as the “influence” of $i$ on the outcome.
In this section, we describe methods which consist of defining a value function and computing (or approximating) the resulting Shapley values. We will use the following notation:
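$D = \{1, \ldots, d\}$: the set of features
$X = \{X_1, \ldots, X_d\}$: a multivariate random variable
$x = \{x_1, \ldots, x_d\}$: a set of values
$X_S = \{X_i : i \in S\}$: the set of random variables
$x_S = \{x_i : i \in S\}$: the set of values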
2.2.1 Value functions
Shapley values have a fairly long history in the context of feature importance. kruskal1987relative and lipovetsky2001analysis proposed using the Shapley value to analyze global feature importance in linear regression by using the value function $v(S)$ to represent the $R^2$ of a linear model built on predictors $S$, to decompose the variance explained additively between the features. owen2017shapley applied the Shapley value to the problem of sensitivity analysis, where the total variance of a function is the quantity of interest.
Many recently proposed “local” methods (ribeiro2016should; lundberg2017unified; lundberg2018consistent) define a value function $v_{f,x}$ that depends on a specific data instance $x$ to explain how each feature contributes to the output of the function on this instance. The value of the grand coalition, in this setting, is the prediction of the model at $x$: $v_{f,x}(D) = f(x)$. In addition, to use Shapley values as an “explanation” of the (grand coalition of) features in this way, these methods also need to specify how $v_{f,x}$ acts on proper subsets of the features.
The definitions of Shapley sampling values (vstrumbelj2014explaining), as well as SHAP values (lundberg2017unified), are derived from defining $v_{f,x}(S)$ as the conditional expected model output on a data point when only the features in $S$ are known:
$$v_{f,x}(S) = E[f(X) \mid X_S = x_S]$$
Quantitative Input Influence (QII) (datta2016algorithmic) draws on ideas from causal inference to propose simulating an intervention on the features not in $S$, thus breaking correlations with the features in $S$ and taking an expectation over the joint marginal distribution of $X_{\bar{S}}$, where $\bar{S}$ denotes the features not in $S$:
$$v_{f,x}(S) = E_{X_{\bar{S}}}[f(x_S, X_{\bar{S}})]$$
This approach (using intervention distributions) was further generalized by merrick2019explanation so as to unify a number of different approaches to estimating Shapley values.
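Table 1: Methods grouped by how the value function is estimated.
| Estimated as | Methods |
|---|---|
| Interventional | KernelSHAP, Shapley sampling values |
| Conditional | TreeSHAP, ASV, aas_explaining_2019 |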
Methods based on the same value function can differ in their mathematical properties based on the assumptions and computational methods employed for approximation. TreeSHAP (lundberg2018consistent) estimates $E[f(X) \mid X_S = x_S]$ by observing what proportion of the samples in the training set matching the condition $x_S$ fall into each leaf node, a method which does not rely on a feature independence assumption. In the algorithms for KernelSHAP (lundberg2017unified), conditional expectations are estimated by assuming feature independence; samples of the features in $\bar{S}$ (those not in $S$) are drawn from the marginal distribution of each variable. In the limit, this approximates an expectation over the interventional distribution instead, like QII.
In Table 1, we categorize each method based on how it defines a value function and how it estimates that value function. In the rest of the paper, we will refer to these value functions as either interventional or conditional based on the estimation method. That is to say, KernelSHAP, Shapley sampling values, QII, and FAE are interventional methods, while TreeSHAP as well as some other algorithms we will introduce later are conditional.
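To make the distinction between the two families concrete, the following Python sketch estimates both value functions on a tiny discrete data set. The data, the model `f`, and the helper names `v_conditional` and `v_interventional` are all hypothetical and chosen only for illustration; real methods use far more elaborate estimators.

```python
from statistics import mean

# Hypothetical data: two binary features that are always equal (perfectly correlated).
data = [(0, 0), (0, 0), (1, 1), (1, 1)]


def f(x):
    # Hypothetical model that only reads the second feature.
    return float(x[1])


def v_conditional(x, S):
    """E[f(X) | X_S = x_S], estimated from the rows of `data` that match x on S."""
    rows = [r for r in data if all(r[j] == x[j] for j in S)]
    return mean(f(r) for r in rows)


def v_interventional(x, S):
    """E[f(x_S, X_Sbar)]: keep x on S, draw the remaining features from the data."""
    d = len(x)
    return mean(f(tuple(x[j] if j in S else r[j] for j in range(d))) for r in data)


x = (1, 1)
print(v_conditional(x, {0}), v_interventional(x, {0}))
```

Knowing only the first feature moves the conditional value away from the baseline because that feature predicts the second one, while the interventional value stays at the baseline because the model never reads the first feature.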
3 Mathematical issues
We now present a number of mathematical problems that arise when we attempt to use Shapley-value-based feature importance measures. These problems arise from the estimation procedures that are in use as well as the fundamental axiomatic structure of Shapley values.
3.1 Conditional versus interventional distributions
A fundamental difference between the interventional and conditional value functions is revealed by what we call the indirect influence debate. Suppose $f$ is defined with domain $\mathbb{R}^d$, but for a certain feature $i$, $f(x) = f(x')$ whenever $x_j = x'_j$ for all $j \neq i$; that is to say, intervening on the value of $x_i$ alone does not change the output of $f$. We call this a variable with no interventional effect.
Should a feature with no interventional effect be considered an “input” to this function? We could define a new function $f'$ with domain $\mathbb{R}^{d-1}$ to perfectly capture the output, so perhaps not. What if, in the relevant input space, $X_i$ is a statistical proxy for some $X_j$ which does affect the output of $f$? Shapley-value-based feature importance methods must grapple with these choices.
adler2018auditing take the information-theoretic position that “the information content of a feature can be estimated by trying to predict it from the remaining features.” This perspective can help diagnose situations where an undesirable proxy variable is being used by a model, as in the classic case of redlining. While adler2018auditing go on to analyze how the accuracy of a model depends on indirect information, the conditional value function aligns with this information-theoretic principle as well: If a certain feature $i$ can help predict the features not in $S \cup \{i\}$, then the quantities $v_{f,x}(S)$ and $v_{f,x}(S \cup \{i\})$ may be meaningfully different, meaning that the marginal contribution of feature $i$ is nonzero. For this reason the Shapley value of the conditional value function may attribute influence to features with no interventional effect, a positive thing from the perspective of adler2018auditing.
merrick2019explanation, on the other hand, criticize the capacity to attribute indirect influence as being paradoxical, and show that interventional methods will never attribute influence to a feature $i$ which has no interventional effect on $f$, which they see as a desirable property.
Unfortunately, the decision between the two types of value functions is a catch-22. Both methods introduce serious issues: Choosing a conditional method requires further modeling of how the features are interrelated, which we describe in 3.1.1, while choosing an interventional method induces an “out-of-distribution” problem which we address in 3.1.2.
3.1.1 Issues with conditional distributions
The conditional value function induces two major difficulties. First, the exact computation of the Shapley value for a conditional value function would require knowledge of a separate multivariate conditional distribution for each feature subset, on the order of $2^d$ for $d$ features, and so a significant amount of approximation or modeling is necessary. Second, since influence can be computed on an arbitrarily large set of features, it becomes necessary to choose a set that is meaningful because the explanations may change based on which features are considered.
Solutions have been proposed to deal with the computational complexity of this problem. The TreeSHAP algorithm estimates the conditional expectations of any tree ensemble directly, without sampling, using information computed during model training. The algorithm utilizes information about the training instances which fall into each leaf node to model each conditional distribution. It is not, however, set up to attribute influence to variables without an interventional effect, as the trees contain no information about the distribution of variables not in the model.
For arbitrary types of models, estimating the conditional expectations requires a substantial amount of additional modeling of relationships in the data which are not necessarily captured by the model that one is trying to explain. aas_explaining_2019 and frye2019asymmetric have developed methods that aim to generate in-distribution samples for the relevant calculations.
Even if computational issues are resolved, there are additional inconsistencies introduced by the capacity of the Shapley value to attribute influence to an arbitrarily large feature set given a single function. The modeler must decide which features count as players in the cooperative game and which are redundant, and since the problem definition posits that the attributions add up to the value of $f(x)$, this choice can affect the resulting explanations.
Consider the addition of a redundant variable $C$ to a dataset with two features, $A$ and $B$, so that $P(X_C = X_B) = 1$. Suppose a model $f$ is trained on all three features. Intuitively, the features $B$ and $C$ should be equally informative to the model and so should have the same Shapley value under the conditional value function. Formally, the following properties will hold:
$$v(\{B\}) = v(\{C\}) = v(\{B, C\})$$
$$v(\{A, B\}) = v(\{A, C\}) = v(\{A, B, C\})$$
so this means $\Delta_v(B, \{C\}) = \Delta_v(C, \{B\}) = 0$ and $\Delta_v(B, \{A, C\}) = \Delta_v(C, \{A, B\}) = 0$. Therefore, for any data instance $x$,
$$\phi_v(A) = \frac{1}{3} \Delta_v(A, \emptyset) + \frac{2}{3} \Delta_v(A, \{B\})$$
$$\phi_v(B) = \phi_v(C) = \frac{1}{3} \Delta_v(B, \emptyset) + \frac{1}{6} \Delta_v(B, \{A\})$$
Now consider what would happen if we defined a new function $f'(x_A, x_B) = f(x_A, x_B, x_B)$. For any data instance, since $x_B = x_C$, $f'(x_A, x_B) = f(x_A, x_B, x_C)$. It is effectively the same model for all in-distribution data points, so the games $v$ and $v'$ induced by $f$ and $f'$ are the same for all subsets of variables. Yet if we choose to limit the scope of our explanation to two variables instead of three, the attribution for both $A$ and $B$ will come out to be different:
$$\phi_{v'}(A) = \frac{1}{2} \Delta_v(A, \emptyset) + \frac{1}{2} \Delta_v(A, \{B\})$$
$$\phi_{v'}(B) = \frac{1}{2} \Delta_v(B, \emptyset) + \frac{1}{2} \Delta_v(B, \{A\})$$
Notice that $\phi_{v'}(B)$ is neither equal to $\phi_v(B)$, its assigned influence in the 3-variable setting, nor $\phi_v(B) + \phi_v(C)$, the “total” influence of the two identical variables in the 3-variable setting. The relative apparent importances of $A$ and $B$ thus depend on whether $C$ is considered to be a third feature, even though the two functions are effectively the same.
It is not obvious whether two statistically related features should be considered as separate “players” in the cooperative game, yet this choice has an impact on the output of these additive explanation models. Suppose, for instance, that $B$ is a sensitive feature, and $C$ is a non-sensitive feature that happens to perfectly correlate with it. Two different “fairness” audits of the same function would come out with quantitatively different results.
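The effect of adding a duplicated feature can be checked numerically. The sketch below is a minimal illustration with made-up coalition values: the dictionary `base` and the feature labels `"A"`, `"B"`, `"C"` are assumptions for this example, with `"C"` treated as an exact copy of `"B"`. Exact rational arithmetic is used so the comparison is not blurred by rounding.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley_values(v, players):
    """Exact Shapley values for a game v defined on frozensets of `players`."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = Fraction(0)
        for size in range(n):
            weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (v(s | {i}) - v(s))
        phi[i] = total
    return phi


# Hypothetical coalition values for the two informative features A and B.
base = {frozenset(): 0, frozenset("A"): 6, frozenset("B"): 2, frozenset("AB"): 12}


def v_two(s):
    return base[frozenset(s)]


def v_three(s):
    # C duplicates B, so knowing C is equivalent to knowing B.
    return base[frozenset("B" if p == "C" else p for p in s)]


phi3 = shapley_values(v_three, ["A", "B", "C"])
phi2 = shapley_values(v_two, ["A", "B"])
print(phi3, phi2)
```

With these made-up values, the attribution of `"B"` in the two-feature game matches neither its attribution in the three-feature game nor the combined attribution of `"B"` and `"C"`, even though both games describe the same in-distribution behavior.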
frye2019asymmetric propose a solution to the problem in terms of incorporating causal knowledge:
…If $x_i$ is known to be the deterministic causal ancestor of $x_j$, one might want to attribute all the importance to $x_i$ and none to $x_j$.
They propose not only discounting fully redundant variables which are causal descendants of other variables in the model, but relaxing the symmetry axiom which uniquely defines the Shapley value. Instead of averaging marginal contributions over every permutation, they suggest defining a quasivalue which considers only certain permutations; for example, orderings which place causal ancestors before their descendants.
In this framework, fully redundant features will receive zero attribution and will not change the resulting value of the remaining features. For instance, in the above example, if variable $C$ were known to be a causal descendant of $B$, the Asymmetric Shapley Values of $A$ and $B$ under $f$ will be the same as they were under $f'$.
A fully specified causal model is not required to use this method: they “span the data-agnosticism continuum in the sense that they allow any knowledge about the data, however incomplete, to be incorporated into an explanation of the model’s behaviour.” The results in frye2019asymmetric demonstrate, however, the sensitivity of the game theoretic approach to the amount of prior knowledge about the relative agency of each feature, which we consider a significant limitation of the approach.
There are thus both practical and epistemological challenges with computing the Shapley values of games with a conditional value function.
3.1.2 Issues with interventional distributions
Conditional value functions introduce undesirable complexities to the feature importance problem, so those inclined against methods with the capacity for attributing indirect influence may prefer methods based on interventional value functions instead. These methods, however, are highly sensitive to properties of the model which are not relevant to what it has learned about the data it was trained on.
Methods which use an interventional value function fundamentally rely on evaluating a model on out-of-distribution samples. Consider, for example, a model $f$ trained on a data set with three features: $X_1$ and $X_2$, and an engineered feature $X_3$ that is computed from $X_1$ and $X_2$. To calculate $v_{f,x}(\{1, 2\})$ for some $x$, we would have to estimate $E[f(x_1, x_2, X_3)]$ over some distribution for $X_3$ which does not depend on $x_1$ or $x_2$. Therefore we will almost certainly have to evaluate $f$ on some sample $(x_1, x_2, x_3')$ in which $x_3'$ does not respect its defining relationship to $x_1$ and $x_2$; such a sample is well outside the domain of the actual data distribution. The model has never seen an example like this in training, and has therefore not learned much about this part of the feature space. Its predictions on this part of the feature space are effectively meaningless, yet the explanations will be affected by them.
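The out-of-distribution effect is easy to reproduce on synthetic data. In the sketch below, the standard normal inputs and the product rule `x3 = x1 * x2` are assumptions made purely for illustration. It builds the evaluation points an interventional estimate would feed to a model and counts how many still satisfy the engineered relationship.

```python
import random

random.seed(0)

# Hypothetical data: x1, x2 ~ N(0, 1) and an engineered feature x3 = x1 * x2.
data = []
for _ in range(1000):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    data.append((x1, x2, x1 * x2))

x = data[0]  # the instance being explained

# Interventional estimate of v({1, 2}): keep x1 and x2, draw x3 from its marginal.
samples = [(x[0], x[1], row[2]) for row in data]

# Count how many of the points fed to the model still satisfy x3 = x1 * x2.
consistent = sum(1 for s in samples if abs(s[2] - s[0] * s[1]) < 1e-9)
print(f"{consistent} of {len(samples)} evaluation points are in-distribution")
```

Almost every evaluation point violates the engineered relationship, so the estimate is dominated by model outputs on inputs that could never occur in the data.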
This “out-of-distribution” phenomenon has been explored recently by pleasestop, who show why “permutation-based” methods to evaluate feature importance can be highly misleading: when values are substituted into one set of features that are unlikely or impossible when conditioned on the remaining features, the model is forced to extrapolate to an unseen part of the feature space. They show that these feature importance methods are highly sensitive to the way in which the model extrapolates to these edge cases, which is undesirable information for a model “explanation” to capture.
SlackHilgard2020FoolingLIMESHAP demonstrate how to exploit this sensitivity by devising models which illegally discriminate on some protected feature for in-distribution samples, but exhibit different behavior on the out-of-distribution samples used by KernelSHAP so as to simulate “fairness” in the resulting explanations. By manipulating the model’s behavior on unfamiliar parts of the feature space, they can twist the explanations on the familiar part to their will.
These challenges illustrate that intervening on a subset of features of a data case before applying a model trained on a sample from a certain distribution is inherently misleading.
3.2 Additivity constraints
In addition to the problems demonstrated above, which have to do with the choice between two families of value functions, we also identify problems which are common to both. These are linked to the axiomatic underpinnings of Shapley values.
For any two of the axioms described in Section 2.1, there exists an alternative attribution between players which satisfies those two but not the other; the Shapley value is therefore only unique because it satisfies all three. Since the notion of the sum of two games is not especially meaningful, the Additivity axiom has been described by game theorists as “mathematically convenient” and “not nearly so innocent as the other two” osborne1994course. The choice to constrain the value to be unique in this way has implications for what kinds of models can be explained intuitively by the Shapley value. Even in simple cases where feature independence renders the interventional versus conditional debate irrelevant, we find the Shapley value conceptually limited for non-additive models.
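The Shapley value seems to intuitively align with what is considered important in an additive setting. Consider applying any of the expectation value functions to $f(x) = \beta_0 + \beta_1 x_1 + \ldots + \beta_d x_d$ where the features are independent. For any subset $S$,
$$v_{f,x}(S) = \beta_0 + \sum_{j \in S} \beta_j x_j + \sum_{j \notin S} \beta_j E[X_j]$$
so the marginal contribution for feature $i$ is
$$\Delta_v(i, S) = \beta_i (x_i - E[X_i])$$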
In this way, the Shapley value is supported by the common intuition that coefficient size, if variables are appropriately scaled, signals importance in a linear model.
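For a linear model with independent features, the marginal contribution of a feature is the same for every coalition, so its Shapley value should reduce to the coefficient times the deviation of the feature from its mean. The following sketch checks this numerically; the coefficients `beta`, the means `mu`, and the instance `x` are hypothetical values chosen for the example.

```python
from itertools import combinations
from math import factorial

# Hypothetical linear model and feature means (features assumed independent).
beta0, beta = 1.0, [2.0, -3.0, 0.5]
mu = [0.0, 1.0, 4.0]
x = [1.5, 2.0, 0.0]
d = len(beta)


def v(S):
    """Expected output when features in S are fixed to x and the rest are averaged out."""
    return beta0 + sum(beta[j] * (x[j] if j in S else mu[j]) for j in range(d))


def shapley(i):
    others = [j for j in range(d) if j != i]
    total = 0.0
    for size in range(d):
        weight = factorial(size) * factorial(d - size - 1) / factorial(d)
        for coalition in combinations(others, size):
            S = set(coalition)
            total += weight * (v(S | {i}) - v(S))
    return total


phi = [shapley(i) for i in range(d)]
closed_form = [beta[i] * (x[i] - mu[i]) for i in range(d)]
print(phi, closed_form)
```

The enumerated values agree with the closed form up to floating-point error, and they sum to the difference between the prediction and the expected prediction.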
The additivity axiom is aligned with additive models in another way: the games resulting from two models sum to the expectation game of the sum of the two models. This seems reasonable when the models are additive in the first place.
Now imagine if the additivity constraint were relaxed. We could use an alternative attribution which satisfies the other two axioms: $\phi(i) = 0$ for $i \in Z$ and $\phi(i) = v(D) / (d - |Z|)$ for $i \notin Z$, where $Z$ is the set of dummy features among the $d$ features in $D$. Using the expectation value function in this setting, any feature which did not satisfy $\Delta_v(i, S) = 0$ for all $S$ would get the same attribution. In this sense the additivity constraint seems necessary for a game-based feature attribution to provide any meaningful quantities about an additive model.
For non-additive models, unfortunately, Shapley values can be meaningless too. Any value function which always evaluates to 0 except on the grand coalition will evenly distribute influence among players. Consider a model given by $f(x) = \prod_{j=1}^{d} x_j$ where the features are independent and centered at 0. Then for any subset $S$,
$$v_{f,x}(S) = \prod_{j \in S} x_j \prod_{j \notin S} E[X_j]$$
which, since $E[X_j]$ is 0, is always 0 unless $S = \{1, \ldots, d\}$. Then the Shapley value for every feature is $f(x)/d$, regardless of the value $x_i$, even if, for instance, the magnitude of one of the variables is much higher than the others. This property will, in fact, hold for all multiplicative functions of independently distributed, zero-centered data. The fact that so many quantitatively different functions share the same explanations is concerning.
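The even split for multiplicative models can be verified directly. In the sketch below, the instance `x` is a hypothetical example with deliberately mismatched feature magnitudes, and all feature means are set to zero as in the setting described above.

```python
from itertools import combinations
from math import factorial, prod

# Hypothetical instance with very different magnitudes; every feature has mean 0.
x = [100.0, 0.5, -2.0]
mu = [0.0, 0.0, 0.0]
d = len(x)


def v(S):
    """E[prod_j X_j] with features in S fixed to x and the others independent with mean mu."""
    return prod(x[j] if j in S else mu[j] for j in range(d))


def shapley(i):
    others = [j for j in range(d) if j != i]
    total = 0.0
    for size in range(d):
        weight = factorial(size) * factorial(d - size - 1) / factorial(d)
        for coalition in combinations(others, size):
            S = set(coalition)
            total += weight * (v(S | {i}) - v(S))
    return total


phi = [shapley(i) for i in range(d)]
print(phi, prod(x) / d)
```

Every feature receives the same share of the prediction, although the first feature is two orders of magnitude larger than the second.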
Shapley values are touted for their “model-agnostic” quality, but in reality they provide plausible values only when the models are additive.
4 Human-centric issues
The analysis from Section 3 demonstrates the mathematical issues with feature importance methods derived from Shapley values and suggests how one might mitigate them. In this section we turn to the human side of the interaction between feature importance methods and the people who use them. This perspective is closer in spirit to the “human-grounded metrics” that doshi-velez_towards_2017 describe in comparison with the “functionally-grounded evaluation” of the previous section.
We use the framework set out by selbst_intuitive_2018, who argue that there are three general motivations behind the call for explanations in AI.
The first is a fundamental question of autonomy, dignity, and personhood. The second is a more instrumental value: educating the subjects of automated decisions about how to achieve different results. The third is a more normative question—the idea that explaining the model will allow people to debate whether the model’s rules are justifiable.
In this section, we attempt to reconcile the Shapley value feature importance formalization of machine learning “explanations” with these three goals. We argue that the theoretical properties of the Shapley value are not naturally well-suited to any one of these objectives.
4.1 Explanations as contrastive statements
The presence of the phrase “right to explanation” in the GDPR illustrates the sense many of us have that it is inherently unethical to make decisions about an individual without providing an explanation, in a way that selbst_intuitive_2018 argue has more to do with “procedural justice” than “wanting an explanation for the purpose of vindicating certain specific empowerment or accountability goals.”
It is not immediately clear how to formally evaluate a method that provides explanations merely because it should, rather than to improve on a particular metric or task. In this setting, doshi-velez_towards_2017 suggest the empirical approach of running user tests where humans are provided with explanations and they evaluate their “quality”. But in fact, what humans consider a good explanation has been studied extensively in the social sciences, leading to several formal theories of how humans generate and select explanations.
miller2019explanation provides an overview of this literature. One of his major findings is that the way humans explain phenomena to each other is through contrastive statements:
People do not explain the causes for an event per se, but explain the cause of an event relative to some other event that did not occur; that is, an explanation is always of the form “Why P rather than Q?”, in which P is the target event and Q is a counterfactual contrast case that did not occur.
He attributes this insight to work by lipton1990contrastive. More recently, a similar argument has been made by merrick2019explanation, referencing earlier work by kahneman1986norm.
We now outline different ways in which Shapley values can be interpreted as contrastive explanations.
4.1.1 Shapley value sets as a single contrastive statement
The above-mentioned research supports the hypothesis that people ask for explanations when the outcome, P, is “unexpected” compared to the outcome Q. In this sense, we can interpret Shapley-based explanations as a contrastive statement where the outcome to be explained is $v(D)$ and the foil – the counterfactual case which did not happen – is implicitly set to be $v(\emptyset)$. In the “local” settings described earlier, $v(D)$ is $f(x)$ and $v(\emptyset)$ is $E[f(X)]$:
$$\sum_{i} \phi_v(i) = v(D) - v(\emptyset) = f(x) - E[f(X)]$$
Thus, the Shapley values can be thought of as a set of answers to the question, “Why $f(x)$ rather than $E[f(X)]$?”
While the expected value of a function seems like a natural foil to an “unexpected” $f(x)$, due to the properties of the expectation, there may not be a scenario in the data space of $X$ with the outcome $E[f(X)]$. Thus, the expected value may not be “expected” at all by anyone with a reasonable understanding of the situation at hand.
If we are willing to consider intervention distributions (Section 2.2.1), then the framework provided by merrick2019explanation provides a slightly different contrastive explanation: in their setting, the Shapley value assignment can be thought of as a set of answers to the question, “Why $f(x)$ rather than $f(r)$?”, where $r$ is chosen from the reference distribution. This of course requires the specification of the reference distribution and carries with it the estimation issues described above in Section 3.1.2.
4.1.2 Marginal contributions as contrastive statements
An alternate way to consider Shapley-value-based methods as contrastive statements is by examining the marginal contribution of features. The set of marginal contributions $\Delta_v(i, S)$ of each feature $i$, which are averaged in a certain way over all subsets to calculate the Shapley value, can be thought of as a set of contrastive explanations. Each quantity $\Delta_v(i, S)$ represents a contrastive explanation for why feature $i$ is important: “Why choose a model with $S$ and $i$ rather than a model with just $S$? Because it improves $v$ by the amount $\Delta_v(i, S)$.” This quantity is an important part of stepwise selection, a modeling procedure in which features which increase the accuracy of a model are successively added to the modeling set.
Note that regardless of what order features were actually added to the model in, all permutations are considered when the Shapley value is calculated. It is not clear that taking an average of quantities representing “all possible contrastive explanations” for a certain set of foils is a sensible way to summarize information. Instead, miller2019explanation argues that humans are selective about explanations: certain contrasts are more meaningful than others. An example of this is the difference between necessary and sufficient causes:
Lipton argues that necessary causes are preferred to sufficient causes. For example, consider mutations in the DNA of a particular species of beetle that cause its wings to grow longer than normal when kept in certain temperatures. Now, consider that there are two such mutations, $M_1$ and $M_2$, and either is sufficient to cause the mutation. To contrast with a beetle whose wings would not change, the explanation of temperature is preferred to either of the mutations $M_1$ or $M_2$, because neither $M_1$ nor $M_2$ are individually necessary for the observed event; merely that either $M_1$ or $M_2$. In contrast, the temperature is necessary, and is preferred, even if we know that the cause was $M_1$.
Consider, without specifying how to quantify the importance of a feature coalition, computing some kind of allocation for each feature to analyze the positive classification of a beetle with longer wings. Lipton’s argument above suggests that since all “yes” cases share a property $T$ (the temperature), a contrastive statement highlighting this is more relevant than comparisons based on $M_1$ or $M_2$. This is fundamentally at odds with the idea that the “yes” prediction should be split additively between different coalitions of $T$, $M_1$, and $M_2$, a property induced by the notion of the Shapley value.
4.2 Using Shapley-value-based methods to enable action
One motivation for “explaining” a function is to enable individuals to figure out how to achieve a desirable outcome. For example, one might allow an individual to query the model for a specific contrastive explanation in which the person $x$’s outcome, $f(x)$, is compared with a person $x'$ with a desirable outcome determined by the user, such that the user might be able to alter their own situation to approximate $x'$. This setup has been formalized as the “counterfactual explanation” problem by wachter2017counterfactual (with an analysis of hidden assumptions by 10.1145/3351095.3372830). ustun2019actionable further specify a way to model this problem by searching for changes within characteristics which are actually mutable; they call this the “actionable recourse” problem (with a corresponding analysis by 10.1145/3351095.3372876).
Unlike these methods, Shapley-value-based frameworks do not explicitly attempt to provide guidance on how a user might alter their behavior in a desirable way. Further, observing that a certain feature carries a large influence over the model does not necessarily imply that changing that feature (even significantly) will change the outcome favorably.
Suppose, in a very simple nonlinear example, that a univariate model is defined as , for some . A person for whom will get , and , so the Shapley value for this person’s single input is then . Suppose they were hoping for an even higher score. The fact that the value is positive, along with the general knowledge that is a bit high with respect to an average value of , might make this person think that increasing their value even more will increase their score – but it will not.
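The same kind of situation can be reproduced with a small numerical sketch. The model `f`, the population values, and the instance below are hypothetical and are not necessarily the ones used in the example above; they are chosen only so that the single-feature Shapley value is positive while a further increase of the feature lowers the score.

```python
from statistics import mean


# Hypothetical univariate model that peaks at x = 3.
def f(x):
    return -(x - 3.0) ** 2


population = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical values of the single feature
expected = mean(f(z) for z in population)  # E[f(X)], the value of the empty coalition

x = 3.0
phi = f(x) - expected  # with one feature, the Shapley value is f(x) - E[f(X)]
print(phi, f(x + 1.0) - f(x))
```

Here the feature sits above the population average and receives a positive attribution, yet raising it further reduces the model output.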
This problem stems from the fact that the contrastive quantity $E[f(X)]$ is not desirable, but even if the foil is chosen to be some desirable outcome $f(r)$ of some $r$, such as in merrick2019explanation, the Shapley values themselves do not correspond to specific actions: the interventional effect of changing one input from $x_i$ to that from $r$ is just one of the marginal contributions that are averaged together to form the Shapley value of that input, as we discussed in Section 4.1.2.
4.3 Shapley-based explanations for normative evaluation
Shapley-value-based explanations are primarily used for purposes of normative evaluation: deciding whether a model’s behavior is acceptable bhatt_explainable_2019. This is done either at the development stage, to help a human debug a model or decide whether to deploy it, or at the decision-making stage, to help a human evaluate the quality of a specific decision made by a model. In this section we explore how the information content of the Shapley value is insufficient for evaluation. We marshal evidence to make three points. Firstly, data scientists do not have a clear mental model of what insights Shapley-value-based analysis brings. Secondly, in the face of this uncertainty, they tend to rely on narrative and confirmation biases. Thirdly, even if they do understand the analysis, there is no easy way to operationalize the insights for any specific debugging task.
Since there is no standard procedure for converting Shapley values into a statement about a model’s behavior, developers rely on their own mental model of what the values represent. kaur2019interpreting conducted a contextual inquiry and survey of data scientists to observe their interpretation of interpretability tools including the SHAP Python package. They found that many participants did not have an accurate mental model of what a SHAP analysis represents, yet used them to make decisions on whether the model was ready for deployment, over-trusting and misusing the tool.
Using feature importance in this way is ripe for narrative and confirmation biases. passi_trust_2018 conducted ethnographic fieldwork with a corporate data science team and described situations in which applying intuition to feature importance was a key component of the model development cycle. In one instance, when developers communicated the results of a modeling effort to project managers, the stakeholders immediately decided it was “useful” based entirely on the feature importance list:
Certain highly-weighted features matched business intuitions, and everyone in the meeting considered this a good thing. Models that knew “nothing about the business” had correctly identified certain aspects integral to business practices. …Regarding counter-intuitive feature importances, [a data scientist] reminded [the stakeholders] that machine-learning models do not approach data in the same way humans do. He pointed out that models use “a lot of complex math” to tell us things that we may not know or fully understand.
Even if Shapley values were to provide understandable information about a model, experiments run in poursabzi2018manipulating suggest that it is still not obvious that explanations allow humans to effectively evaluate individual decisions:
Participants who were shown a clear model with a small number of features were better able to simulate the model’s predictions. However, contrary to what one might expect when manipulating interpretability, we found no improvements in the degree to which participants followed the model’s predictions when it was beneficial to do so. Even more surprisingly, increased transparency hampered people’s ability to detect when the model makes a sizable mistake and correct for it, seemingly due to information overload.
There are other concrete questions data scientists might ask:
Whether an error was made at any point in the data processing pipeline for a certain feature
Whether the model is acting upon spurious correlations or other artifacts of training data
Whether the model exhibits inappropriate biases
Whether the model’s accuracy will improve if a certain feature is included or excluded
While Shapley-value-based methods might help qualitatively inform investigations that lead to answers to these questions, they do not provide direct answers to any specific question related to the points of interest above.
Shapley values enjoy mathematically satisfying theoretical properties as a solution to game theory problems. However, applying a game theoretic framework does not automatically solve the problem of feature importance, and our work shows that in fact this framework is ill-suited as a general solution to the problem of quantifying feature importance. Instead, our work suggests that we need more focused approaches that stem from specific use cases and models.
This work was supported in part by the National Science Foundation under grants IIS-1633724, IIS-1633387, and IIS-1815238. Indra Elizabeth Kumar is supported by the ARCS Foundation Utah Chapter through the Noel de Nevers Memorial Scholar Award.