1 Introduction
Explaining a model’s predictions by assigning importance to its inputs (i.e., feature attribution) is critical to many applications in which a user interacts with a model to either make decisions or gain a better understanding of a system (Simonyan et al., 2013; Lundberg and Lee, 2017; Zhou et al., 2016; Shrikumar et al., 2017; Baehrens et al., 2010; Binder et al., 2016; Springenberg et al., 2014; Sundararajan et al., 2017; Fisher et al., 2018; Breiman, 2001). However, correlation among input features presents a challenge when estimating feature importance.
Consider a motivating example adapted from Pearl (2009), in which we are given a model that takes as input four features: the season of the year ($X_1$), whether or not it is raining ($X_2$), whether the sprinkler is on ($X_3$), and whether the pavement is wet ($X_4$), and outputs a prediction $f(x_1, x_2, x_3, x_4)$ representing the probability that the pavement is slippery (a capital letter denotes a random variable; lower case denotes a particular sample). Assume the inputs are related through the causal graph in Figure 1. When assigning feature importance, existing approaches that ignore this causal structure (Janzing et al., 2020; Sundararajan and Najmi, 2019; Datta et al., 2016) may assign zero importance to the season, since it only indirectly affects the outcome through the other input variables. However, such a conclusion may lead a user astray, since changing the season would most definitely affect the outcome. Recognizing this limitation, researchers have recently proposed approaches that leverage the causal structure among the input variables when assigning credit (Frye et al., 2019). However, such approaches provide an incomplete picture of a system, as they assign all credit to the source nodes in a graph. Though this solves the earlier problem of ignoring indirect (i.e., upstream) effects, it does so by ignoring downstream effects. In our example, season would get all the credit despite the importance of the other variables. This again may lead a user astray, since intervening on the sprinkler or the rain would affect the outcome, yet they are given no credit.
Given that current approaches end up ignoring either downstream (i.e., direct) or upstream (i.e., indirect) effects, we develop Shapley Flow, a comprehensive approach to interpreting a model (or system) that incorporates the causal relationships among input variables while accounting for both direct and indirect effects. In contrast to prior work, we accomplish this by reformulating the problem as one of assigning credit to the edges of a causal graph, instead of the nodes (Figure 2c). Our key contributions are as follows.

We propose the first (to the best of our knowledge) generalization of Shapley value feature attribution to graphs, providing a complete system-level view of a model.

Our approach unifies three previous game theoretic approaches to estimating feature importance.

Through examples on real data, we demonstrate how our approach facilitates understanding.
In this work, we take an axiomatic approach motivated by cooperative game theory, extending Shapley values to graphs. The resulting algorithm, Shapley Flow, generalizes past work in estimating feature importance (Lundberg and Lee, 2017; Frye et al., 2019; López and Saboya, 2009). The estimates produced by Shapley Flow represent the unique allocation of credit that conforms to several natural axioms. Applied to real-world systems, Shapley Flow can help a user understand both the direct and indirect impact of changing a variable, generating insights beyond current feature attribution methods.
2 Problem Setup & Background
Given a model, or more generally a system, that takes a set of inputs and produces an output, we focus on the problem of quantifying the effect of each input on the output. Here, building off previous work, we formalize the problem setting.
2.1 Problem Setup
Quantifying the effect of each input on a model's output can be formulated as a credit assignment problem. Formally, given a target sample input $x$, a reference sample input $x'$, and a model $f$, we aim to explain the difference in output, i.e., $f(x) - f(x')$. We assume $x$ and $x'$ are of the same dimension $d$, and each entry may be either discrete or continuous.
We also assume access to a causal graph, as formally defined in (Peters et al., 2017, Chapter 6), over the input variables. Given this graph, we seek an assignment function $\phi$ that assigns credit $\phi(e)$ to each edge $e$ in the causal graph such that the edge credits collectively explain the difference $f(x) - f(x')$. In contrast with the classical setting (Lundberg and Lee, 2017; Sundararajan et al., 2017; Frye et al., 2020; Aas et al., 2019) in which credit is placed on features (i.e., seeking a node assignment function $\psi$), our edge-based approach is more flexible because we can recover node $i$'s importance by defining $\psi(i)$ as the sum of $\phi(e)$ over node $i$'s outgoing edges.
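Concretely, once edge attributions are available, a node's attribution follows by summation over its outgoing edges. A minimal sketch (the edge names and credit values here are hypothetical, chosen only for illustration):

```python
# Hypothetical edge attributions phi(e), keyed by (source, target).
edge_credit = {("x1", "x2"): 1.0, ("x1", "y"): 1.0, ("x2", "y"): 1.0}

def node_credit(i, edge_credit):
    # psi(i): sum phi(e) over node i's outgoing edges
    return sum(c for (u, _), c in edge_credit.items() if u == i)

# x1's importance covers both its direct and its indirect path to y
assert node_credit("x1", edge_credit) == 2.0
assert node_credit("x2", edge_credit) == 1.0
```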
Here, the effect of an input on the output is measured with respect to a reference, or background, sample. For example, in a healthcare setting, we may set the features in the background sample to values that are deemed typical for a disease. We assume a single background value for notational convenience, but the formalism easily extends to the common scenario of multiple background values, or a distribution of background values $D'$, by defining the explanation target to be $\mathbb{E}_{x' \sim D'}[f(x) - f(x')]$.
2.2 Feature Attribution with a Causal Graph
Even given a causal graph, feature attribution remains challenging because it is unclear how to rightfully allocate credit for a prediction among the nodes and/or edges of the graph. To address this, we generalize game-theoretic fairness principles to graphs.
Given a graph, $G$, that consists of a causal graph over the model of interest $f$ and its inputs, we define a boundary of explanation as a cut that partitions the input variables and the output of the model (i.e., the nodes of the graph) into $(D, M)$, where source nodes (nodes with no incoming edges) are in $D$ and sink nodes (nodes with no outgoing edges) are in $M$. Note that $G$ has a single sink, the model output. A cut set is the set of edges with one endpoint in $D$ and the other endpoint in $M$, denoted $\text{cut}(D, M)$. It is helpful to think of $(D, M)$ as an alternative model definition, where a boundary of explanation (a.k.a. a model boundary) defines what part of the graph we consider to be the "model". If we collapse $M$ into a single node that subsumes $f$, then $\text{cut}(D, M)$ represents the direct inputs to this new model.
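As a concrete sketch (node names follow the sprinkler example from the introduction; the representation is our assumption, not a prescribed implementation), a boundary can be stored as its data-side node set $D$, with the cut set computed directly:

```python
# The sprinkler graph from the introduction, as (source, target) edges.
edges = [("season", "rain"), ("season", "sprinkler"),
         ("rain", "wet"), ("sprinkler", "wet"), ("wet", "slippery")]

def cut_set(D, edges):
    # edges leaving the data side D and entering the model side M
    return [(u, v) for (u, v) in edges if u in D and v not in D]

# boundary adjacent to the sources: only season's outgoing edges cross it
assert cut_set({"season"}, edges) == [("season", "rain"), ("season", "sprinkler")]
# boundary adjacent to the model: the direct input of the output node
assert cut_set({"season", "rain", "sprinkler", "wet"}, edges) == [("wet", "slippery")]
```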
Depending on the causal graph, multiple boundaries of explanation may exist. Recognizing this multiplicity of choices helps shed light on an ongoing debate in the community regarding feature attribution: whether one should perturb features while staying on the data manifold, or perturb them independently (Chen et al., 2020; Janzing et al., 2020; Sundararajan and Najmi, 2019). On one side, many argue that perturbing features independently reveals the functional dependence of the model, and is thus true to the model (Janzing et al., 2020; Sundararajan and Najmi, 2019; Datta et al., 2016). However, independent perturbation of the data can create unrealistic or invalid sets of model input values. Thus, on the other side, researchers argue that one should perturb features while staying on the data manifold, and so be true to the data (Aas et al., 2019; Frye et al., 2019). However, this can result in situations in which features not used by the model are given non-zero attribution. Explanation boundaries help us unify these two viewpoints. As illustrated in Figure 2a, when we independently perturb features, we assume the causal graph is flat and the explanation boundary lies between the inputs and the model (i.e., $D$ contains all of the input variables). In this example, since features are assumed independent, all credit is assigned to the features that directly impact the model output, and indirect effects are ignored (no credit is assigned to features that act only through other features). In contrast, when we perform on-manifold perturbations with a causal structure, as is the case in Asymmetric Shapley Values (ASV) (Frye et al., 2019), all the credit is assigned to the source node, because the source node determines the value of all nodes in the graph (Figure 2b). This results in a different boundary of explanation, one between the source nodes and the remainder of the graph. Although giving the source node all the credit does not reflect the true functional dependence of $f$, it does for the model defined by this boundary (Figure 2c).
Perturbations that were previously faithful to the data are faithful to a "model", just one that corresponds to a different boundary. See Appendix 6 for how on-manifold perturbation (without a causal graph) can be unified using explanation boundaries.
Beyond the boundary directly adjacent to the model of interest and the boundary directly adjacent to the source nodes, there are other potential boundaries (Figure 2c) a user may want to consider. However, simply generating explanations for each possible boundary can quickly overwhelm the user (e.g., the separate bar charts in Figures 2a and 2b). Our approach sidesteps the issue of selecting a single explanation boundary by considering all explanation boundaries simultaneously. This is made possible by assigning credit to the edges in a causal graph (Figure 2c). Edge attribution is strictly more powerful than feature attribution because it simultaneously captures the direct and indirect impact of each feature.
While other approaches to assigning credit on a graph exist (e.g., Conductance from Dhamdhere et al. (2018) and DeepLIFT from Shrikumar et al. (2016)), they were proposed in the context of understanding the internal nodes of a neural network, and depend on implicit linearity and continuity assumptions about the model. We aim to understand the causal structure among the input nodes in a fully model-agnostic manner, where discrete variables are allowed and no differentiability assumption is made. To do this, we generalize the widely used Shapley value (Adadi and Berrada, 2018; Mittelstadt et al., 2019; Lundberg et al., 2018; Sundararajan and Najmi, 2019; Frye et al., 2019; Janzing et al., 2020; Chen et al., 2020) to graphs.
3 Proposed Approach: Shapley Flow
Our proposed approach, Shapley Flow, attributes credit to edges of the causal graph. In this section, we present the intuition behind our approach and then formally show that it uniquely satisfies a generalization of the classic Shapley value axioms, while unifying previously proposed approaches.
3.1 Assigning Credit to Edges: Intuition
Given a causal graph defining the relationship among input variables, we reframe the problem of feature attribution to focus on the edges of a graph rather than the nodes. Our approach results in edge credit assignments as shown in Figure 2c. As mentioned above, this eliminates the need for multiple explanations (i.e., bar charts), one per explanation boundary. Moreover, it allows a user to better understand the nuances of a system by providing information about what would happen if a single causal link were to break.
Shapley Flow is the unique assignment of credit to edges such that the classic Shapley value axioms are satisfied for all possible boundaries of explanation. Specifically, we extend the efficiency, dummy, and linearity axioms from Shapley (1953) and add a new axiom related to boundary consistency. Efficiency states that the attributions of the edges on any boundary must add up to $f(x) - f(x')$. Linearity states that explaining a linear combination of models is the same as explaining each model and linearly combining the resulting attributions. Dummy states that if adding an edge never changes the output in any scenario, the edge should be assigned zero credit. Boundary consistency states that edges shared by different boundaries must receive the same attribution when explained using either boundary. These concepts are illustrated in Figure 4 and formalized in Section 3.3.
An edge is important if removing it causes a large change in the model's prediction. However, what does it mean to remove an edge? If we imagine every edge in the graph as a channel that sends its source node's current value to its target node, then removing an edge $e$ simply means messages sent through $e$ fail. In the context of feature attribution, in which we aim to measure the difference between $f(x)$ and $f(x')$, this means that $e$'s target node still relies on the source's background value in $x'$ when updating its current value, as opposed to the source node's foreground value in $x$, as illustrated in Figure 3a. However, we cannot simply toggle edges one at a time. Consider a simple OR function, $y = x_1 \lor x_2$, with $x_1 = x_2 = 1$ and $x_1' = x_2' = 0$. Removing either of the edges alone would not affect the output, and both $x_1$ and $x_2$ would be (erroneously) assigned zero credit.
To account for this, we consider all scenarios (or partial histories) in which the edge we care about can be added (see Figure 3b). Here, $\nu$ is a function that takes a list of edges and evaluates the network with the edges updated in the order specified by the list. For example, $\nu([e_1])$ corresponds to the evaluation of the output when only $e_1$ is updated. Similarly, $\nu([e_1, e_2])$ is the evaluation when $e_1$ is updated followed by $e_2$. The full list is also referred to as a (complete) history, as it specifies how $x'$ changes to $x$.
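The OR example above can be made concrete. Below is a minimal sketch of the payoff function $\nu$ for the flat OR model (the edge names are our assumption): toggling either edge away from the complete history changes nothing, while averaging marginal contributions over both orderings splits the credit evenly.

```python
e1, e2 = ("x1", "f"), ("x2", "f")  # the two input edges of f = x1 OR x2

def nu(history):
    # evaluate f after the edges in `history` have transmitted their
    # foreground values (1); edges not yet updated keep the baseline (0)
    x1 = 1 if e1 in history else 0
    x2 = 1 if e2 in history else 0
    return x1 | x2

# removing either edge alone from the full history leaves the output at 1
assert nu([e1, e2]) - nu([e2]) == 0
assert nu([e1, e2]) - nu([e1]) == 0

# averaging over both orderings recovers a fair split of f(x) - f(x') = 1
phi_e1 = ((nu([e1]) - nu([])) + (nu([e2, e1]) - nu([e2]))) / 2
phi_e2 = ((nu([e2]) - nu([])) + (nu([e1, e2]) - nu([e1]))) / 2
assert phi_e1 == phi_e2 == 0.5
```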
For the same edge, attributions derived from different explanation boundaries should agree; otherwise, simply including more details of a model in the causal graph would change upstream credit allocation, even though the model implementation was unchanged. We refer to this property as boundary consistency. The Shapley Flow value for an edge is the difference in model output when removing the edge, averaged over all boundary-consistent histories (as defined below).
3.2 Model explanation as value assignments in games
The concept of Shapley value stems from game theory, and has been extensively applied in model interpretability (Štrumbelj and Kononenko, 2014; Datta et al., 2016; Lundberg and Lee, 2017; Frye et al., 2019; Janzing et al., 2020). Before we formally extend it to the context of graphs, we define the credit assignment problem from a game theoretic perspective.
Given the message passing system in Section 3.1, we formulate the credit assignment problem as a game specific to an explanation boundary $(D, M)$. The game consists of a set of players and a payoff function $\nu$. We model each edge external to $M$ as a player. A history is a list of edges detailing the events that take the system from $x'$ (all values at their background) to $x$ (all values at their foreground). For example, the history $[e_1, e_2, e_1]$ means that the edge $e_1$ finishes transmitting a message containing its source node's most recent value to its target node, followed by the edge $e_2$, followed by the edge $e_1$ again. A coalition is a partial history, taking the system from $x'$ to some intermediate state. The payoff function, $\nu$, associates each coalition with a real number, and is defined in our case as the evaluation of the model following the coalition.
This setup is a generalization of a typical cooperative game, in which the ordering of players does not matter (only the set of players matters). However, given our physical system, history is important. In the following sections, we write $\oplus$ for list concatenation and $\varnothing$ for the empty coalition, and denote the set of all possible histories as $\mathcal{H}$. We denote the set of boundary-consistent histories as $\bar{\mathcal{H}}$. The corresponding sets of coalitions for $\mathcal{H}$ and $\bar{\mathcal{H}}$ are denoted $\mathcal{C}$ and $\bar{\mathcal{C}}$, respectively. A sample game setup is illustrated in Figure 3.
3.3 Axioms
We formally extend the classic Shapley value axioms (efficiency, linearity, and dummy) and include one additional axiom, the boundary consistency axiom, that connects all boundaries together.

Boundary consistency: for any two boundaries $(D_1, M_1)$ and $(D_2, M_2)$, $\phi_{(D_1, M_1)}(e) = \phi_{(D_2, M_2)}(e)$ for every edge $e \in \text{cut}(D_1, M_1) \cap \text{cut}(D_2, M_2)$.
For edges that are shared between boundaries, their attributions must agree. In Figure 4a, the edge wrapped by a teal band is shared by both the blue and green boundaries, forcing them to give the same attribution to the edge.
In the general setting, not all credit assignments are boundary consistent; different boundaries could result in different attributions for the same edge. This occurs when the histories associated with different boundaries are inconsistent (Figure 4c). Moving the boundary from $(D_1, M_1)$ to $(D_2, M_2)$ (where $(D_2, M_2)$ is the boundary with $D_2$ containing $f$'s direct inputs) results in a more detailed set of histories. This expansion has constraints. First, any history in the expanded set must follow the physical system in Section 3.1. Second, when a message passes through the boundary, it immediately reaches the end of the computation, because $M$ is assumed to be a black box.
Denoting the history expansion function into boundary $b$ as $T_b$, and the set of all boundaries as $\mathcal{B}$, a history $h$ is boundary consistent if its expansions $T_b(h)$ for all $b \in \mathcal{B}$ share at least one fully detailed history. That is, $h$ needs to have at least one fully detailed history that all boundaries can agree on. $\bar{\mathcal{H}}$ is the set of all histories in $\mathcal{H}$ that are boundary consistent. We rely on this notion of boundary consistency in generalizing the Shapley axioms to any explanation boundary $(D, M)$:

Efficiency: $\sum_{e \in \text{cut}(D, M)} \phi(e) = f(x) - f(x')$.
In the general case, where the payoff $\nu$ can depend on the ordering of players, the sum equals the payoff of a complete history minus the payoff of the empty coalition. But when the game is defined by a model function $f$, every complete history evaluates to $f(x)$ and the empty coalition evaluates to $f(x')$. An illustration with boundaries is shown in Figure 4a.

Linearity: $\phi_{\alpha \nu_1 + \beta \nu_2}(e) = \alpha\,\phi_{\nu_1}(e) + \beta\,\phi_{\nu_2}(e)$ for any payoff functions $\nu_1$ and $\nu_2$ and scalars $\alpha$ and $\beta$.
Linearity enables us to explain a linear ensemble of models by independently explaining each model and then linearly weighting the attributions. Similarly, we can explain $\mathbb{E}_{x' \sim D'}[f(x) - f(x')]$ by independently computing attributions for each baseline sample and then taking the average of the attributions, without recomputing from scratch whenever the background sample's distribution changes. An illustration with multiple baseline samples is shown in Figure 4c.
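For a flat linear model, where exact Shapley values reduce to $w_i(x_i - x_i')$, this averaging property is easy to check (the weights and samples below are arbitrary, chosen only for illustration):

```python
# Assumed flat linear model f(x) = w . x with two features and two baselines.
w = [2.0, -1.0]
x = [1.0, 1.0]
baselines = [[0.0, 0.0], [0.5, 1.0]]

def phi(x, xp):
    # exact Shapley values for a flat linear model: w_i * (x_i - x'_i)
    return [w[i] * (x[i] - xp[i]) for i in range(len(w))]

# average the per-baseline attributions (linearity) ...
avg = [sum(phi(x, b)[i] for b in baselines) / len(baselines) for i in range(2)]
# ... which matches explaining against the mean baseline directly
mean_b = [sum(b[i] for b in baselines) / len(baselines) for i in range(2)]
assert avg == phi(x, mean_b)
```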

Dummy player: $\phi(e) = 0$ if $\nu(C \oplus [e]) = \nu(C)$ for all coalitions $C \in \bar{\mathcal{C}}$.
The dummy player axiom states that if an edge does not change the model's output when added to any possible coalition, it should be given zero attribution. In Figure 4b, the highlighted edge is a dummy edge because, starting from any coalition, adding it would not change the output.
These last three axioms are extensions of Shapley’s axioms. Note that we no longer need the symmetry axiom because it is implied by the updated dummy player with history.
3.4 Shapley Flow is the unique solution
Shapley Flow uniquely satisfies all axioms from the previous section. Here, we describe the algorithm, show its formulae, and state its properties. Please refer to Appendix 8 and 7 for proofs and code.
Description: Define a configuration of a graph as an arbitrary ordering of outgoing edges of a node when it is traversed by depth first search. For each configuration, we run depth first search starting from the source node, processing edges in the order of the configuration. When processing an edge, we update the value of the edge’s target node by making the edge’s source node value visible to its function. If the edge’s target node is the sink node, the difference in the sink node’s output is credited to every edge along the search path from source to sink. The final result averages over attributions for all configurations.
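Under stated assumptions (a tiny linear DAG whose node names and functions are chosen purely for illustration), the description above can be sketched as follows; credit accumulates along source-to-sink paths, and conservation of flow holds at the source:

```python
from itertools import permutations, product

# Toy causal DAG (assumed): x1 -> x2, x1 -> y, x2 -> y,
# with x2 = x1 and y = x1 + x2, baseline x1' = 0, foreground x1 = 1.
topo = ["x1", "x2", "y"]
children = {"x1": ["x2", "y"], "x2": ["y"], "y": []}
parents = {"x1": [], "x2": ["x1"], "y": ["x1", "x2"]}
funcs = {"x2": lambda v: v["x1"], "y": lambda v: v["x1"] + v["x2"]}
baseline, target = {"x1": 0.0}, {"x1": 1.0}

def shapley_flow():
    edges = [(u, v) for u in topo for v in children[u]]
    credit = {e: 0.0 for e in edges}
    # a configuration fixes the DFS order of each node's outgoing edges
    internal = [v for v in topo if children[v]]
    configs = list(product(*(permutations(children[v]) for v in internal)))
    for cfg in configs:
        order = dict(zip(internal, cfg))
        # initialise every node to its baseline value
        value, visible = {}, {v: {} for v in topo}
        for v in topo:
            for u in parents[v]:
                visible[v][u] = value[u]
            value[v] = baseline[v] if not parents[v] else funcs[v](visible[v])
        value["x1"] = target["x1"]  # the source changes, unseen downstream
        path = []
        def dfs(u):
            for v in order[u]:
                path.append((u, v))
                visible[v][u] = value[u]  # edge (u, v) transmits its message
                if v == "y":  # reached the sink: credit the whole path
                    new = funcs[v](visible[v])
                    for e in path:
                        credit[e] += new - value[v]
                    value[v] = new
                else:
                    value[v] = funcs[v](visible[v])
                    dfs(v)
                path.pop()
        dfs("x1")
    return {e: c / len(configs) for e, c in credit.items()}

flow = shapley_flow()
# each unit path receives unit credit in this linear system ...
assert flow == {("x1", "x2"): 1.0, ("x1", "y"): 1.0, ("x2", "y"): 1.0}
# ... and x1's outgoing credit equals f(x) - f(x') = 2 (conservation)
assert flow[("x1", "x2")] + flow[("x1", "y")] == 2.0
```

Note how crediting the entire search path lets an edge carry both the direct and the indirect contribution that passes through it.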
Formulae: Denote the attribution Shapley Flow assigns to a source-to-sink path $P$ as $\phi(P)$, and the set of all orderings of source-to-sink paths generated by depth first search (DFS) as $\Pi$. For an ordering $\pi \in \Pi$, $P' \prec_\pi P$ denotes that path $P'$ precedes path $P$ under $\pi$. Since $\nu$'s input is a list of edges, we overload $\nu$ to work on a list of paths: the evaluation of $\nu$ on a list of paths is the value of $\nu$ evaluated on the corresponding edge traversal ordering. Then

$$\phi(P) = \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \Big( \nu\big([P' : P' \preceq_\pi P]\big) - \nu\big([P' : P' \prec_\pi P]\big) \Big) \qquad (1)$$

To obtain an edge $e$'s attribution $\phi(e)$, we sum the path attributions over all paths $P$ that contain $e$:

$$\phi(e) = \sum_{P : e \in P} \phi(P) \qquad (2)$$
Additional properties: Shapley Flow has the following beneficial properties beyond the axioms.
Generalization of SHAP: if the graph is flat, the edge attribution is equal to feature attribution from SHAP because each input node is paired with a single edge leading to the model.
Generalization of ASV: the attribution to the source nodes is the same as in ASV if all the dependencies among features are modeled by the causal graph.
Generalization of Owen value: if the graph is a tree, the edge attribution for incoming edges to the leaf nodes is the Owen value with a coalition structure defined by the tree.
Implementation invariance: boundary consistency is equivalent to ensuring that the attributions are invariant to how the model is implemented or modeled.
Conservation of flow: efficiency and boundary consistency imply that the sum of attributions on a node’s incoming edges equals the sum of its outgoing edges.
Model agnostic: Shapley Flow can explain arbitrary (including non-differentiable) machine learning pipelines.
4 Practical Application
Shapley Flow highlights both the direct and indirect impact of features. In this section, we consider several applications of Shapley Flow. First, in the context of a linear model, we verify that the attributions match our intuition. Second, we show how current feature attribution approaches lead to an incomplete understanding of a system compared to Shapley Flow.
4.1 Experimental Setup
We illustrate the application of Shapley Flow to a synthetic and a benchmark dataset. In addition, we include results for a third dataset in the Appendix. Note that our algorithm assumes a causal graph is provided as input. In recent years there has been significant progress in causal graph estimation (Glymour et al., 2019; Peters et al., 2017). However, since our focus is not on causal inference, we make simplifying assumptions in estimating the causal graphs (see Appendix).
Datasets. Synthetic: As a sanity check, we first experiment with synthetic data. We create a random graph dataset with $n$ nodes. A node $i$ is randomly connected to node $j$ (with $i$ pointing to $j$) with a fixed probability if $i < j$, and otherwise not connected. The function at each node is linear, with weights generated from a standard normal distribution. Source nodes follow a fixed input distribution. This results in a graph with a single sink node, associated with function $f$ (i.e., the "model" of interest). The remainder of the graph corresponds to the causal structure among the input variables.
National Health and Nutrition Examination Survey: This dataset consists of individuals with demographic and laboratory measurements (Cox, 1998). We used the same preprocessing as described by Lundberg et al. (2020). Given these inputs, the model, $f$, aims to predict survival.
Model training. We train $f$ using a random train/test split. For experiments with linear models, $f$ is trained with linear regression. For experiments with nonlinear models, $f$ is fitted by XGBoost trees (with a capped maximum depth and number of training epochs), using the Cox loss.
Causal Graph. We construct a causal graph based on domain knowledge (Figure (b)). Attributes predetermined at birth (age, race, and sex) are treated as source nodes. The poverty index is placed at the second level, because economic status could impact one's health. Other features have directed edges coming from age, race, sex, and poverty index. Note that the relationships among some features are deterministic. For example, pulse pressure is the difference between systolic and diastolic blood pressure. We add the appropriate causal edges to account for such facts. We also account for features with natural groupings. For example, transferrin saturation (TS), total iron binding capacity (TIBC), and serum iron are all related to blood iron. Serum albumin and serum protein are both blood protein measures. Systolic and diastolic blood pressure can be grouped into blood pressure. Sedimentation rate and white blood cell counts both measure inflammation. We add these higher-level grouping concepts as new latent variables in the graph. The resulting causal structure is an oversimplification of the true causal structure; the relationship between source nodes (e.g., race) and biomarkers is far more complex. Nonetheless, it can help in understanding the direct and indirect effects of input variables on the outcome.
4.2 Baselines
We compare Shapley Flow with other game-theoretic feature attribution methods: independent SHAP (Lundberg and Lee, 2017), on-manifold SHAP (Aas et al., 2019), and ASV (Frye et al., 2019), covering both independent and on-manifold feature attribution.
Since Shapley Flow and all baselines are expensive to compute exactly, we use a Monte Carlo approximation of Equation 1. In particular, we sample orderings from $\Pi$ and average across those orderings. We randomly selected a baseline sample from each dataset and share it across methods, so that each method uses the same baseline. A single baseline allows us to ignore differences between methods due to variations in baseline sampling (a multiple-baseline version is easily attainable due to linearity, as explained in Figure 4c). We sample the same number of orderings for each approach to generate the results. Since there is no publicly available implementation of ASV, we show the attribution for source nodes obtained from Shapley Flow (by summing the attributions of their outgoing edges), as the two are equivalent given the same causal graph. For ease of visual inspection, we show only the top links used by Shapley Flow (credit measured in absolute value) on the nutrition dataset.
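A Monte Carlo scheme of this kind, sketched on the flat OR game from Section 3.1 (the player names, sample size, and seed are arbitrary assumptions):

```python
import random

def mc_attribution(players, nu, n_samples=500, seed=0):
    # sample orderings uniformly and average each player's marginal gain
    rng = random.Random(seed)
    credit = {p: 0.0 for p in players}
    for _ in range(n_samples):
        order = list(players)
        rng.shuffle(order)
        history, prev = [], nu([])
        for p in order:
            history.append(p)
            cur = nu(history)
            credit[p] += cur - prev
            prev = cur
    return {p: c / n_samples for p, c in credit.items()}

e1, e2 = ("x1", "f"), ("x2", "f")
nu = lambda h: (1 if e1 in h else 0) | (1 if e2 in h else 0)  # f = x1 OR x2
phi = mc_attribution([e1, e2], nu)
# efficiency holds per sampled ordering; the split approaches 0.5 each
assert abs(phi[e1] + phi[e2] - 1.0) < 1e-9
assert abs(phi[e1] - 0.5) < 0.1
```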
4.3 Sanity checks with linear models
To build intuition, we first examine linear models (i.e., where $f$ is linear and the causal dependence inside the graph is also linear). When using a linear model, the ground-truth direct impact of changing feature $i$ is $w_i(x_i - x_i')$, where $w_i$ is the model's coefficient for feature $i$ (that is, the change in output due to $x_i$ directly), and the ground-truth indirect impact is defined as the change in output when an intervention changes $x_i'$ to $x_i$ and the change propagates through the causal graph.
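For instance, in a hypothetical linear chain where $x_2 = c\,x_1$ and $y = a\,x_1 + b\,x_2$ (coefficients chosen arbitrarily), the two ground-truth quantities separate cleanly:

```python
a, b, c = 2.0, 3.0, 0.5          # assumed linear coefficients
x1, x1_ref = 1.0, 0.0            # foreground / background value of x1

direct = a * (x1 - x1_ref)            # change in y via the edge x1 -> y only
indirect = b * c * (x1 - x1_ref)      # change routed through x2 = c * x1
total = (a + b * c) * (x1 - x1_ref)   # intervening on x1 and propagating

assert direct == 2.0 and indirect == 1.5
assert direct + indirect == total      # the two effects add up
```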
Results for explaining the datasets are included in Table 1. We report the mean absolute error from the ground-truth attribution for randomly selected examples in both datasets, across features. We also report the standard deviation of the error in parentheses. Note that only Shapley Flow results in no error for both direct (D) and indirect (I) effects.
Methods  Nutrition (D)  Synthetic (D)  Nutrition (I)  Synthetic (I) 

Independent  0.0 ( 0.0)  0.0 ( 0.0)  0.8 ( 2.7)  1.1 ( 1.4) 
On-manifold  1.3 ( 2.5)  0.8 ( 0.7)  0.9 ( 1.6)  1.5 ( 1.5) 
ASV  1.5 ( 3.3)  1.2 ( 1.4)  0.6 ( 1.9)  1.1 ( 1.5) 
Shapley Flow  0.0 ( 0.0)  0.0 ( 0.0)  0.0 ( 0.0)  0.0 ( 0.0) 
4.4 Examples with nonlinear models
We demonstrate the benefits of Shapley Flow with nonlinear models containing both discrete and continuous variables. As a reminder, the baseline methods are not competing with Shapley Flow, as the latter can recover all of the baselines given the corresponding causal structure (Figure 2). Instead, we highlight why a holistic understanding of the system is better.
Independent SHAP ignores the indirect impact of features. Take an example from the nutrition dataset (Figure 6). The race feature is given low attribution by independent SHAP, but high importance by ASV. This happens because race, in addition to its direct impact, indirectly affects the output through blood pressure, serum magnesium, and blood protein, as shown by Shapley Flow (Figure (a)). In particular, race partially accounts for the impact of serum magnesium, because changing race from Black to White on average increases serum magnesium in the dataset (thus partially explaining the increase in serum magnesium when moving from the background sample to the foreground sample). Independent SHAP fails to account for the indirect impact of race, leaving the user with the potentially misleading impression that race is irrelevant to the prediction.
On-manifold SHAP provides a misleading interpretation. In the same example (Figure 6), we observe that on-manifold SHAP strongly disagrees with independent SHAP, ASV, and Shapley Flow on the importance of age. Not only does it assign more credit to age, it also flips the sign, suggesting that age is protective. However, Figure (a) shows that age and earlier mortality are positively correlated; how, then, could age be protective? Figure (b) provides an explanation. Since SHAP considers all partial histories regardless of the causal structure, when we focus on serum magnesium and age there are two cases: serum magnesium updates before age, or after it. We focus on the first case, because it is where on-manifold SHAP differs from the other baselines (all baselines already consider the second case, as it satisfies the causal ordering). When serum magnesium updates before age, the expected age given serum magnesium is higher than the foreground age (the yellow line lies above the black marker). Therefore, when age updates to its foreground value, we observe a decrease in age, leading to a decrease in the output (so age appears to be protective). Serum magnesium is just one variable from which age steals credit; similar logic applies to TIBC, red blood cells, serum iron, serum protein, serum cholesterol, and diastolic BP. From both a direct and an indirect impact perspective, on-manifold perturbation can be misleading, since it is based not on causal but on observational relationships.
ASV ignores the direct impact of features. As shown in Figure 6, serum magnesium appears to be more important under independent SHAP than under ASV. From Shapley Flow (Figure (a)), this difference is explained by race, as its edge to serum magnesium has a negative impact. However, looking at ASV alone, one fails to understand that intervening on serum magnesium could have a larger impact on the output.
Shapley Flow shows both the direct and indirect impact of features. Focusing on the attribution given by Shapley Flow (Figure (a)), we not only observe direct impacts similar to independent SHAP, but can also trace those impacts back to their source nodes, similar to ASV. Furthermore, Shapley Flow provides more detail than the other approaches. For example, using Shapley Flow we gain a better understanding of the ways in which race impacts survival; the same goes for all other features. This is useful because causal links can change (or break) over time, and our method provides a way to reason through the impact of such a change.
We provide more examples and an additional dataset highlighting the utility of Shapley Flow in the Appendix.
Top features  Age  Serum Magnesium  Race 

Background sample  35.0  1.37  Black 
Foreground sample  42.0  1.63  White 
Attributions  Independent  Onmanifold  ASV 

Age  0.23  0.38  0.3 
Serum Magnesium  0.21  0.02  0.15 
Race  0.06  0.04  0.24 
Pulse pressure  0.0  0.08  0.0 
Diastolic BP  0.0  0.08  0.0 
Serum Cholesterol  0.0  0.07  0.0 
Serum Protein  0.01  0.06  0.0 
Serum Iron  0.0  0.05  0.0 
Poverty index  0.02  0.01  0.01 
Systolic BP  0.03  0.01  0.0 
5 Conclusion
We extend the classic Shapley value axioms to causal graphs, resulting in a unique edge attribution method: Shapley Flow. It unifies three previous Shapley value based feature attribution methods, and enables the joint understanding of both the direct and indirect impact of features. This more comprehensive understanding is useful when interpreting any machine learning model, both ‘black box’ methods, and ‘interpretable’ methods.
References
Explaining individual predictions when features are dependent: more accurate approximations to Shapley values. arXiv preprint arXiv:1903.10464. Cited by: §2.1, §2.2, §4.2.
Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, pp. 52138–52160. Cited by: §2.2.
How to explain individual classification decisions. The Journal of Machine Learning Research 11, pp. 1803–1831. Cited by: §1.
Layer-wise relevance propagation for neural networks with local renormalization layers. In International Conference on Artificial Neural Networks, pp. 63–71. Cited by: §1.
Random forests. Machine Learning 45 (1), pp. 5–32. Cited by: §1.
True to the model or true to the data?. arXiv preprint arXiv:2006.16234. Cited by: §2.2.
Plan and operation of the NHANES I Epidemiologic Followup Study, 1992. National Center for Health Statistics. Cited by: §4.1.
Algorithmic transparency via quantitative input influence: theory and experiments with learning systems. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 598–617. Cited by: §1, §2.2, §3.2.
How important is a neuron?. arXiv preprint arXiv:1805.12233. Cited by: §2.2.
All models are wrong but many are useful: variable importance for black-box, proprietary, or misspecified prediction models, using model class reliance. arXiv preprint arXiv:1801.01489, pp. 237–246. Cited by: §1.
Shapley-based explainability on the data manifold. arXiv preprint arXiv:2006.01272. Cited by: §2.1.
Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability. arXiv preprint arXiv:1910.06358. Cited by: §1, §2.2, §3.2, §4.2, §9.1, §10.3.
Review of causal discovery methods based on graphical models. Frontiers in Genetics 10, pp. 524. Cited by: §4.1.
Feature relevance quantification in explainable AI: a causal problem. In International Conference on Artificial Intelligence and Statistics, pp. 2907–2916. Cited by: §1, §2.2, §3.2.
On the relationship between Shapley and Owen values. Central European Journal of Operations Research 17 (4), pp. 415. Cited by: §1.
From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence 2 (1), pp. 56–67. Cited by: §4.1.
Consistent individualized feature attribution for tree ensembles. arXiv preprint arXiv:1802.03888. Cited by: §2.2.
A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pp. 4765–4774. Cited by: §1, §2.1, §3.2, §4.2.
Explaining explanations in AI. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 279–288. Cited by: §2.2.
Causality. Cambridge University Press. Cited by: Figure 1, §1.
Elements of Causal Inference. The MIT Press. Cited by: §2.1, §4.1, §8.
A value for n-person games. Contributions to the Theory of Games 2 (28), pp. 307–317. Cited by: Figure 4, §3.1.
Learning important features through propagating activation differences. arXiv preprint arXiv:1704.02685. Cited by: §1.
Not just a black box: learning important features through propagating activation differences. arXiv preprint arXiv:1605.01713. Cited by: §2.2.
Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034. Cited by: §1.
Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806. Cited by: §1.
Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems 41 (3), pp. 647–665. Cited by: §3.2.
The many Shapley values for model explanation. arXiv preprint arXiv:1908.08474. Cited by: §1, §2.2.
Axiomatic attribution for deep networks. In International Conference on Machine Learning. Cited by: §1, §2.1.
Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929. Cited by: §1.
6 Explanation boundary for on-manifold methods without a causal graph
On-manifold perturbation using conditional expectations can be unified with Shapley Flow using explanation boundaries (Figure (a)). Here we introduce an auxiliary variable that represents the imputed version of the input. Perturbing any feature affects all inputs to the model so that they respect the correlations in the data after the perturbation. When a feature has not been perturbed, the imputation treats it as missing and samples it from its conditional distribution given the non-missing predecessors. The red edges contain causal links from Figure 1, whereas the black edges are the causal structure used by the on-manifold perturbation method. The credit is equally split among the features because they are all correlated. Again, although giving the downstream features credit is not true to the underlying causal structure, it is true to the model defined by this boundary.
7 The Shapley Flow algorithm
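The algorithm's pseudocode is not preserved in this copy. As a rough illustration only, here is a minimal sketch of the DFS-style update scheme described in the proofs below: each time an edge is traversed, the change in the output node caused by the traversal (including its full downstream propagation) is credited to that edge, and credits are averaged over random DFS orderings. All names (`shapley_flow`, `children`, `funcs`) and the adjacency-dict encoding are our own illustrative choices, not the paper's.

```python
import random

def shapley_flow(children, funcs, source, baseline, foreground, output_node,
                 n_runs=50, seed=0):
    # children: adjacency dict; edges point from cause to effect (and into the output node)
    # funcs[v]: recomputes node v's value from the current values of its parents
    rng = random.Random(seed)
    credit = {}

    def traverse(u, v, values):
        before = values[output_node]
        values[v] = funcs[v](values)          # v reacts to the message sent along (u, v)
        order = list(children.get(v, []))
        rng.shuffle(order)                    # a random DFS ordering of outgoing edges
        for w in order:
            traverse(v, w, values)
        after = values[output_node]
        # credit the full downstream change caused by traversing (u, v)
        credit[(u, v)] = credit.get((u, v), 0.0) + after - before

    for _ in range(n_runs):
        values = dict(baseline)               # every node starts at its baseline value
        values[source] = foreground[source]   # the single source takes its foreground value
        top = list(children[source])
        rng.shuffle(top)
        for v in top:
            traverse(source, v, values)

    return {edge: total / n_runs for edge, total in credit.items()}
```

On a simple chain x1 -> x2 -> x3 -> f this sketch credits every edge with the full output change, consistent with the efficiency property holding on every boundary.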
8 Shapley Flow’s uniqueness proof
Without loss of generality, we can assume the graph has a single source node. We can do this because every node in a causal graph is associated with an independent noise node (Peters et al., 2017, Chapter 6). For deterministic relationships, the function for a node does not depend on its noise. Treating those noise nodes as a single shared source node would not change any boundary that already exists in the original graph. Therefore we can assume there is a single source node.
8.1 At most one solution satisfies the axioms
Assuming that a solution exists, we show that it must be unique.
Proof.
We adapt the argument from the Shapley value uniqueness proof (see, e.g., https://ocw.mit.edu/courses/economics/14126gametheoryspring2016/lecturenotes/MIT14_126S16_cooperative.pdf) by defining basis payoff functions as carrier games. Choose any boundary; we show here that any game defined on the boundary has a unique attribution. We drop the boundary subscript in the proof as there is no ambiguity. Note that since every edge appears in some boundary, if all boundary edges are uniquely attributed, all edges have unique attributions. A carrier game associated with a coalition (an ordered list of edges) is a game whose payoff is 1 if the evaluated coalition starts with that list (and 0 otherwise). By the dummy player axiom, only the last edge of the list gets credit, and all other edges in the cut set are dummy, because a coalition is constructed in order (only adding the last edge changes the payoff from 0 to 1). Note that this last edge must be an edge in the boundary to form a valid game, because boundary edges are the only edges connected to the model defined by the boundary. Therefore we give zero credit to the other edges in the cut set (because they are dummy players). By the efficiency axiom, the remaining credit goes to the last edge, distributed over the set of all possible boundary-consistent histories as defined in Section 3.3. This uniquely attributes the boundary edges for this game.
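With the dropped symbols restored as a best-effort reconstruction (we write $R$ for the ordered coalition defining the carrier game and $S$ for the coalition being evaluated; these symbols are our choice and may differ from the paper's), the carrier game's payoff can be written as:

```latex
v_R(S) =
\begin{cases}
1 & \text{if the ordered coalition } S \text{ starts with } R, \\
0 & \text{otherwise.}
\end{cases}
```

Under this payoff, appending the last edge of $R$ is the only addition that changes the payoff from $0$ to $1$, which is why every other edge in the cut set is a dummy player.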
We show that the set of carrier games associated with every coalition that ends in a boundary edge forms a basis for all payoff functions associated with the system. Recall from Section 3.2 the set of boundary-consistent coalitions. We first show that the payoff values on coalitions that do not end in a boundary edge are redundant: such a coalition has the same payoff as its longest prefix ending in a boundary edge, because only boundary edges are connected to the model defined by the boundary. Therefore it suffices to show that the carrier games associated with coalitions ending in a boundary edge are linearly independent. For a contradiction, assume some nontrivial linear combination of these carrier games is identically zero, with at least one nonzero coefficient (the definition of linear dependence). Let R be a coalition of minimal length with a nonzero coefficient. Evaluating the combination at R, every other carrier game in the combination vanishes (no other coalition with a nonzero coefficient is a prefix of R, by minimality), leaving exactly R's nonzero coefficient, a contradiction.
Therefore any payoff function has a unique expansion in these carrier games. Using the linearity axiom, the attribution of any game is the corresponding linear combination of the carrier games' attributions.
The uniqueness of the expansion coefficients and of the carrier games' attributions makes the attribution unique if a solution exists. Axioms used in the proof are italicized.
∎
8.2 Shapley Flow satisfies the axioms
Proof.
We first demonstrate how to generate all boundaries. Then we show that Shapley Flow gives boundary-consistent attributions. Following that, we look at the set of histories that can be generated by DFS in a given boundary, and show that it coincides with the set of boundary-consistent histories. Using this fact, we check the axioms one by one.

Every boundary can be "grown" one node at a time from the boundary whose data side contains only the source node: since the computational graph is a directed acyclic graph (DAG), we can obtain a topological ordering of its nodes. Starting by including the first node in the ordering (the source node), which defines an initial boundary, we grow the boundary by adding nodes to the data side (removing them from the model side) one by one following the topological ordering. This ordering ensures the corresponding explanation boundary is valid because the cut set only flows from the data side to the model side (if that were not true, then one of a node's dependency nodes would not be on the data side, which violates the topological ordering).
Now we show every boundary can be "grown" in this fashion. In other words, starting from an arbitrary boundary, we can "shrink" it one node at a time back to the initial boundary by reversing the growing procedure. First note that the data side must contain a node whose outgoing edges point only to nodes on the model side (if that were not the case, we would have a cycle in the graph, because we could always continue through edges internal to the data side and loop indefinitely). Therefore we can remove that node to arrive at a new boundary (its incoming edges now join the cut set). By the same argument, we can keep removing nodes until only the source node remains, completing the proof.
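The growing procedure above can be sketched as follows; the function names and the adjacency-dict graph encoding are our own illustrative choices:

```python
def topological_order(children, nodes):
    # Kahn's algorithm on an adjacency dict (edges point from cause to effect)
    indeg = {n: 0 for n in nodes}
    for u in children:
        for v in children[u]:
            indeg[v] += 1
    order = []
    frontier = [n for n in nodes if indeg[n] == 0]
    while frontier:
        u = frontier.pop()
        order.append(u)
        for v in children.get(u, []):
            indeg[v] -= 1
            if indeg[v] == 0:
                frontier.append(v)
    return order

def grow_boundaries(children, nodes):
    """Yield every boundary obtained by growing the data side D one node at a
    time in topological order; the cut set is every edge leaving D, which by
    construction always points from D into the model side."""
    D = set()
    for n in topological_order(children, nodes):
        D.add(n)
        cut = [(u, v) for u in D for v in children.get(u, []) if v not in D]
        yield set(D), cut
```

For a diamond graph s -> {a, b} -> m this yields four boundaries, starting from the one containing only the source and ending with the one containing every node (whose cut set is empty).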

Shapley Flow gives boundary-consistent attributions: We show that every boundary grown this way has edge attributions consistent with the previous boundary. Therefore all boundaries have consistent edge attributions, because the boundary formed by any two boundaries' common set of nodes can be grown into each of those two boundaries using the property above. Let's focus on the newly added node from one boundary to the next. Note that a property of depth-first search is that every time this node's value is updated, its outgoing edges are activated in an atomic way (no other edge activation occurs between the activations of its outgoing edges). Therefore, the changes in output due to the activation of the new edges occur together in the view of edges upstream of the node, thus not changing their attributions. Also, since the node's outgoing edges must point to the model defined by the current boundary (otherwise the ordering could not be a valid topological ordering), they have no downstream edges, concluding the proof.

The set of DFS histories equals the set of boundary-consistent histories: Since attribution is boundary consistent, we can treat the model as a black box and only look at the DFS ordering on the data side. Observe that the edge traversal ordering in DFS is a valid history because a) every edge traversal can be understood as a message received through that edge, b) when each message is received, the node's value is updated, and c) the new node value is sent out through every outgoing edge by the recursive call in DFS. Therefore the two sides of the equation at least hold the same type of object.
We first show one direction of the containment. Take a history generated by DFS; we need to find a boundary-consistent history such that a) it can be expanded into the DFS history and b) for any boundary, there is a history in that boundary that can be expanded into it. Let the candidate be any history expanded using DFS that is aligned with the given one. To show that every boundary can expand into it, we just need to show that the boundaries generated through the growing process introduced in the first bullet point can be expanded into it. The base case is the initial boundary: there must be an ordering that expands into the history because it is generated by DFS, and DFS ensures that every edge's impact on the boundary is propagated to the end of the computation before another edge is traversed. Similarly, for the inductive step, when a new node is added, we just follow the expansion of its previous boundary to reach the history.
Next we show the other containment. First observe that for a history in one boundary and a history in a larger boundary containing it, if the former cannot be expanded into the latter, then the two are inconsistent, because they already have mismatches on the histories that do not involve passing through the larger boundary. Assume for a contradiction that a boundary-consistent history cannot be produced by DFS. We shrink the boundary one node at a time, again using the procedure described in the first bullet point, and consider the resulting sequence of boundaries. Since the history is assumed to be boundary consistent, for each shrunken boundary there exists a history that it must be able to expand into. Say two successive boundaries differ in one node. Note that any update to this node crosses the smaller boundary, so its impact must reach the target before another event occurs in that boundary. Since all of the node's outgoing edges cross the smaller boundary, any ordering of messages sent through those edges is a valid DFS ordering from the larger boundary's perspective. This means that if the smaller boundary's history can be reached by DFS, so can the larger one's, violating the assumption. We run the same argument until we reach the initial boundary, which gives a contradiction because in that boundary all histories can be produced by DFS.

Efficiency: Since we attribute credit by the change in the target node's value following a history given by DFS, the total credit for a particular DFS run equals the overall change in the target, and averaging over all DFS runs preserves this total. Each update to the target node's value must flow through one of the boundary edges; therefore the sum of the boundary edges' attributions equals the target.

Linearity: For two games on the same boundary, following any history, the sum of the output differences of the two games is the output difference of the sum of the two games; therefore the attribution of the sum game equals the sum of the attributions. It is easy to see that extending from addition to any linear combination makes no difference.

Dummy player: Since Shapley Flow is boundary consistent, we can just run DFS up to the boundary (treating the model side as a black box). Every step in DFS keeps the coalition boundary consistent, so if an edge is a dummy, then every time it is traversed through DFS the output does not change, by definition, thus giving it zero credit.
∎
Therefore Shapley Flow uniquely satisfies the axioms. We note that the efficiency requirement simplifies when applied to an actual model, because all histories from DFS lead the target node to its target value. We can prove the stronger claim that in fact all nodes reach their target values when DFS finishes. To see this, we do an induction on a topological ordering of the nodes. The source node reaches its target value by definition. Assume the claim holds for the first k nodes in the ordering. For the (k+1)-th node, its parents achieve their target values by the induction hypothesis. Therefore DFS makes the parents' final values visible to this node, updating it to its target value.
9 Causal graphs
While the nutrition dataset is introduced in the main text, we describe an additional dataset to further demonstrate the usefulness of Shapley Flow. Moreover, we describe in detail how the causal relationships are estimated. The resulting causal graphs for the nutrition dataset and the income dataset are visualized in Figure 9.
9.1 The Census Income dataset
The Census Income dataset consists of samples of demographic features; the task is to predict whether one's annual income exceeds $50K. We assume a causal graph similar to that used by Frye et al. (2019) (Figure 9(a)). Attributes determined at birth (e.g., sex, native country, and race) act as source nodes. The remaining features (marital status, education, relationship, occupation, capital gain, work hours per week, capital loss, work class) have fully connected edges pointing from their causal ancestors. All features have a directed edge pointing to the model.
9.2 Causal Effect Estimation
Given the causal structure described above, we estimate the relationships among variables using XGBoost. More specifically, using a train/test split, we use XGBoost to learn the function for each node. If the node's value is categorical, we minimize cross-entropy loss; otherwise, we minimize mean squared error. Models are fitted with XGBoost trees of bounded max depth for a bounded number of boosting rounds. Since features are rarely perfectly determined by their dependency nodes, we add independent noise nodes to account for this effect. That is, each non-sink node is pointed to by a unique noise node that accounts for the residual of the prediction.
Depending on whether the variable is discrete or continuous, we handle the noise differently. For continuous variables, the noise node's value is the residual between the prediction and the actual value. For discrete variables, we assume the actual value is sampled from the categorical distribution specified by the prediction; the noise node's value is therefore any random number that could result in the actual value.
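As an illustration of the residual-noise bookkeeping for continuous variables, here is a minimal sketch in which a linear least-squares fit stands in for the paper's XGBoost regressor (the helper name `fit_node_with_noise` is ours, not the paper's):

```python
import numpy as np

def fit_node_with_noise(X_parents, y):
    """Fit a node's structural function from its parents and recover the
    per-sample noise. A linear least-squares fit stands in here for an
    XGBoost regressor; the continuous-variable convention is kept:
    noise = observation - prediction, so prediction + noise reproduces the
    observation exactly."""
    A = np.column_stack([X_parents, np.ones(len(X_parents))])  # add an intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    predict = lambda P: np.column_stack([P, np.ones(len(P))]) @ coef
    noise = y - predict(X_parents)      # value assigned to the node's noise node
    return predict, noise
```

By construction, the fitted function plus the stored noise reconstructs each observation, which is exactly what lets the noise node act as an independent source in the augmented graph.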
10 Additional Results
In this section, we first present additional sanity checks with synthetic data. Then we show additional examples from both the nutrition and income datasets to demonstrate why a complete view of boundaries is preferable to single-boundary approaches.
10.1 Additional Sanity Checks
We include further sanity check experiments in this section. The first sanity check consists of a chain of 4 variables. Each node along the chain is an identical copy of its predecessor, and the function to explain depends only on the last variable (Figure 10). The dataset is created by sampling the source variable from a standard normal distribution. We use the first sample as the baseline and explain the second sample (one can choose arbitrary samples and obtain the same insights). As shown in Figure 10, independent SHAP fails to show the indirect impact of the first three variables, ASV fails to show the direct impact of the last variable, and on-manifold SHAP fails to fully capture both the direct and indirect importance of any edge.
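The chain setup can be sketched as follows; the sample count is not preserved in this copy, so the default below is an arbitrary assumption, and the identity model is our reading of "depends only on the last variable":

```python
import numpy as np

def make_chain_data(n_samples=1000, seed=0):
    """Synthetic chain x1 -> x2 -> x3 -> x4: each node is an identical copy of
    its predecessor, x1 is standard normal, and the model depends only on the
    last node. The sample count default is our arbitrary choice."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n_samples)
    X = np.column_stack([x1, x1, x1, x1])   # x2 = x1, x3 = x2, x4 = x3
    f = lambda x4: x4                       # the function to explain (identity, assumed)
    return X, f
```

Because every column is a copy of the source, any direct-only method sees importance only in the last column, while any source-only method sees it only in the first, which is what the sanity check exercises.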
Table 2: Direct effects.
Methods  Income  Nutrition  Synthetic
Independent  0.0 (± 0.0)  0.0 (± 0.0)  0.0 (± 0.0)
On-manifold  0.4 (± 0.3)  1.3 (± 2.5)  0.8 (± 0.7)
ASV  0.4 (± 0.6)  1.5 (± 3.3)  1.2 (± 1.4)
Shapley Flow  0.0 (± 0.0)  0.0 (± 0.0)  0.0 (± 0.0)

Table 3: Indirect effects.
Methods  Income  Nutrition  Synthetic
Independent  0.1 (± 0.2)  0.8 (± 2.7)  1.1 (± 1.4)
On-manifold  0.4 (± 0.3)  0.9 (± 1.6)  1.5 (± 1.5)
ASV  0.1 (± 0.1)  0.6 (± 1.9)  1.1 (± 1.5)
Shapley Flow  0.0 (± 0.0)  0.0 (± 0.0)  0.0 (± 0.0)
The second sanity check consists of linear models as described in Section 4.3. We include the full results, with the income dataset added, in Table 2 and Table 3 for direct and indirect effects respectively. The trend for the income dataset aligns with the nutrition and synthetic datasets: only Shapley Flow makes no mistakes in estimating both direct and indirect impact. Independent SHAP only does well for direct effects. ASV only does well for indirect effects (it only reaches zero error when evaluated on source nodes).
10.2 Additional examples
Figure 11 gives an example of applying Shapley Flow and the baselines on the income dataset. Note that the attribution to capital gain drops from independent SHAP to on-manifold SHAP and ASV. From Shapley Flow, we know the decreased attribution is due to age and race. More examples are shown in Figures 12 and 13.
Background sample  Foreground sample
Age  39  35
Workclass  State-gov  Federal-gov
EducationNum  13  5
Marital Status  Never-married  Married-civ-spouse
Occupation  Adm-clerical  Farming-fishing
Relationship  Not-in-family  Husband
Race  White  Black
Sex  Male  Male
Capital Gain  2174  0
Capital Loss  0  0
Hours per week  40  40
Country  United-States  United-States
Independent  On-manifold  ASV

EducationNum  0.12  0.11  0.09 
Relationship  0.05  0.06  0.04 
Capital Gain  0.09  0.01  0.03 
Occupation  0.03  0.07  0.02 
Marital Status  0.04  0.05  0.03 
Workclass  0.02  0.03  0.02 
Race  0.01  0.03  0.01 
Age  0.01  0.01  0.02 
Capital Loss  0.0  0.03  0.0 
Country  0.0  0.03  0.0 
Sex  0.0  0.03  0.0 
Hours per week  0.0  0.0  0.0 
Background sample  Foreground sample
Age  39  30
Workclass  State-gov  State-gov
EducationNum  13  13
Marital Status  Never-married  Married-civ-spouse
Occupation  Adm-clerical  Prof-specialty
Relationship  Not-in-family  Husband
Race  White  Asian-Pac-Islander
Sex  Male  Male
Capital Gain  2174  0
Capital Loss  0  0
Hours per week  40  40
Country  United-States  India
Independent  On-manifold  ASV

Relationship  0.17  0.04  0.13 
Capital Gain  0.22  0.01  0.07 
Occupation  0.1  0.06  0.07 
Marital Status  0.08  0.06  0.07 
Country  0.04  0.07  0.07 
Age  0.0  0.02  0.13 
EducationNum  0.0  0.12  0.0 
Race  0.02  0.07  0.0 
Workclass  0.0  0.06  0.0 
Hours per week  0.0  0.03  0.0 
Sex  0.0  0.03  0.0 
Capital Loss  0.0  0.01  0.0 
Background sample  Foreground sample
Age  39  30
Workclass  State-gov  Federal-gov
EducationNum  13  10
Marital Status  Never-married  Married-civ-spouse
Occupation  Adm-clerical  Adm-clerical
Relationship  Not-in-family  Own-child
Race  White  White
Sex  Male  Male
Capital Gain  2174  0
Capital Loss  0  0
Hours per week  40  40
Country  United-States  United-States
Independent  On-manifold  ASV

Marital Status  0.03  0.08  0.03 
Capital Gain  0.06  0.02  0.02 
Workclass  0.03  0.03  0.02 
Relationship  0.01  0.11  0.01 
EducationNum  0.02  0.01  0.02 
Age  0.02  0.03  0.01 
Country  0.0  0.03  0.0 
Capital Loss  0.0  0.03  0.0 
Occupation  0.0  0.03  0.0 
Sex  0.0  0.03  0.0 
Race  0.0  0.02  0.0 
Hours per week  0.0  0.0  0.0 
10.3 A global understanding with Shapley Flow
In addition to explaining a particular example, one can explain an entire dataset with Shapley Flow. Specifically, for multiclass classification problems, we take the average of attributions for the probability predicted for the actual class, in accordance with Frye et al. (2019). A demonstration on the income dataset using randomly selected examples is included in Figure 14. As before, we use a single shared background sample for explanation. Here, we observe that although the relative importance rankings across independent SHAP, on-manifold SHAP, and ASV are similar, age and sex have opposite direct versus indirect impacts, as shown by Shapley Flow.
Independent  On-manifold  ASV

Capital Gain  0.02  0.02  0.03 
EducationNum  0.02  0.03  0.02 
Age  0.01  0.01  0.01 
Occupation  0.0  0.01  0.0 
Capital Loss  0.01  0.0  0.01 
Relationship  0.01  0.0  0.0 
Hours per week  0.0  0.01  0.0 
Sex  0.0  0.01  0.0 
Country  0.0  0.01  0.0 
Marital Status  0.0  0.0  0.0 
Race  0.0  0.01  0.0 
Workclass  0.0  0.0  0.0 