1 Introduction
Measuring importance and the attribution of various gains is a central problem in many practical aspects of machine learning such as explainability [31], feature selection [6], data valuation [17], ensemble pruning [34] and federated learning [46, 12]. For example, one might ask: What is the importance of a certain feature in the decisions of a machine learning model? How much is an individual data point worth? Which models are the most valuable in an ensemble? These questions have been addressed in different domains using specific approaches. Interestingly, there is also a general and unified approach to these questions as a solution to a transferable utility (TU) cooperative game. In contrast with other approaches, solution concepts of TU games are theoretically motivated with axiomatic properties. The best known solution of this type is the Shapley value [36] characterized by several desiderata that include fairness, symmetry, and efficiency [4].
In the TU setting, a cooperative game consists of a player set and a scalar-valued characteristic function that defines the value of coalitions (subsets of players). In such a game, the Shapley value offers a rigorous and intuitive way to distribute the collective value (e.g. the revenue, profit, or cost) of the team across individuals. To apply this idea to machine learning, we need to define two components: the player set and the characteristic function. In a machine learning setting, players may be represented by a set of input features, reinforcement learning agents, data points, models in an ensemble, or data silos. The characteristic function can then describe the goodness of fit of a model, the reward in reinforcement learning, the financial gain on instance-level predictions, or out-of-sample model performance. We provide an example about model valuation in an ensemble [34] in Figure 1.

Present work. We introduce basic definitions of cooperative games and present the Shapley value, a solution concept that can allocate gains in these games to individual players. We discuss its properties and emphasize why these are important in machine learning. We overview applications of the Shapley value in machine learning: feature selection, data valuation, explainability, reinforcement learning, and model valuation. Finally, we discuss the limitations of the Shapley value and point out future directions.
2 Background
This section introduces cooperative games and the Shapley value followed by its properties. We also provide an illustrative running example for our definitions.
2.1 Cooperative games and the Shapley value
Definition 1.
Player set and coalitions. Let $N = \{1, \dots, n\}$ be the finite set of players. We call each non-empty subset $S \subseteq N$ a coalition and $N$ itself the grand coalition.
Definition 2.
Cooperative game. A TU game is defined by the pair $(N, v)$, where $v: 2^N \to \mathbb{R}$ is a mapping called the characteristic function or the coalition function of the game, assigning a real number to each coalition and satisfying $v(\emptyset) = 0$.
Example 1.
Let us consider a 3-player cooperative game where $N = \{1, 2, 3\}$. The characteristic function $v$ defines the payoff for each coalition. Let these payoffs be given as: $v(\{1\}) = 7$, $v(\{2\}) = 11$, $v(\{3\}) = 14$, $v(\{1, 2\}) = 18$, $v(\{1, 3\}) = 21$, $v(\{2, 3\}) = 23$, and $v(\{1, 2, 3\}) = 25$.
Definition 3.
Set of feasible payoff vectors.
Let us define $I(N, v) = \{\mathbf{x} \in \mathbb{R}^{|N|} \mid \sum_{i \in N} x_i \le v(N)\}$, the set of feasible payoff vectors for the cooperative game $(N, v)$.

Definition 4.
Solution concept and solution vector. A solution concept is a mapping $\Psi$ associating a subset $\Psi(N, v) \subseteq I(N, v)$ to every TU game $(N, v)$. A solution vector $\mathbf{x}$ to the cooperative game $(N, v)$ satisfies solution concept $\Psi$ if $\mathbf{x} \in \Psi(N, v)$. Solution concept $\Psi$ is single-valued if for every $(N, v)$ the set $\Psi(N, v)$ is a singleton.
A solution concept defines an allocation principle through which rewards can be given to the individual players. The sum of these rewards cannot exceed the value of the grand coalition $v(N)$. Solution vectors are specific allocations satisfying the principles of the solution concept.
Definition 5.
Permutations of the player set. Let $\Pi(N)$ be the set of all permutations defined on $N$; a specific permutation is written as $\pi \in \Pi(N)$, and $\pi(i)$ is the position of player $i$ in permutation $\pi$.
Definition 6.
Predecessor set. Let the set of predecessors of player $i$ in permutation $\pi$ be the coalition $S_i^{\pi} = \{ j \in N \mid \pi(j) < \pi(i) \}$.
Let us imagine, for illustration, that the permutation of the players in our game is the ordering $(3, 1, 2)$, i.e. player 3 comes first, followed by players 1 and 2. Under this permutation the predecessor set of the 3rd player is $S_3^{\pi} = \emptyset$, that of the 1st player is $S_1^{\pi} = \{3\}$, and $S_2^{\pi} = \{3, 1\}$.
Definition 7.
The Shapley value of a player $i$ is the average marginal contribution of the player to the value of the predecessor set over every possible permutation of the player set:

$\phi_i = \frac{1}{|N|!} \sum_{\pi \in \Pi(N)} \left[ v(S_i^{\pi} \cup \{i\}) - v(S_i^{\pi}) \right].$

Table 1 contains manual calculations of the players' marginal contributions in each permutation and their Shapley values in Example 1.
Table 1: Marginal contributions of the players in each permutation and their Shapley values in Example 1.

| Permutation   | Player 1 | Player 2 | Player 3 |
| (1, 2, 3)     | 7        | 11       | 7        |
| (1, 3, 2)     | 7        | 4        | 14       |
| (2, 1, 3)     | 7        | 11       | 7        |
| (2, 3, 1)     | 2        | 11       | 12       |
| (3, 1, 2)     | 7        | 4        | 14       |
| (3, 2, 1)     | 2        | 9        | 14       |
| Shapley value | 16/3     | 25/3     | 34/3     |
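The calculation in Table 1 can be reproduced with a short exact computation. The following Python sketch uses the characteristic function values implied by the marginal contributions in Table 1:

```python
from itertools import permutations

# Characteristic function of the 3-player game in Example 1; the values
# are the ones implied by the marginal contributions in Table 1.
v = {frozenset(): 0, frozenset({1}): 7, frozenset({2}): 11,
     frozenset({3}): 14, frozenset({1, 2}): 18, frozenset({1, 3}): 21,
     frozenset({2, 3}): 23, frozenset({1, 2, 3}): 25}

def shapley_values(players, v):
    """Exact Shapley values: the average marginal contribution of each
    player over all |N|! permutations of the player set."""
    phi = {i: 0.0 for i in players}
    perms = list(permutations(players))
    for pi in perms:
        predecessors = frozenset()
        for i in pi:
            # marginal contribution of player i to its predecessor set
            phi[i] += (v[predecessors | {i}] - v[predecessors]) / len(perms)
            predecessors = predecessors | {i}
    return phi

phi = shapley_values([1, 2, 3], v)
# phi[1] = 16/3, phi[2] = 25/3, phi[3] = 34/3, and they sum to v(N) = 25.
```

The three values match the last row of Table 1 and sum to the value of the grand coalition, as the efficiency property in Section 2.2 requires.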
2.2 Properties of the Shapley value
We define the solution concept properties that characterize the Shapley value and emphasize their relevance and meaning in a feature selection game. In this game input features are players, coalitions are subsets of features, and the payoff is a scalar-valued goodness of fit for a machine learning model using these input features.
Definition 8.
Null player. Player $i \in N$ is called a null player if $v(S \cup \{i\}) = v(S)$ for every $S \subseteq N \setminus \{i\}$. A solution concept $\Psi$ satisfies the null player property if for every game $(N, v)$, every $\mathbf{x} \in \Psi(N, v)$, and every null player $i$ it holds that $x_i = 0$.
In the feature selection game a solution concept with the null player property assigns zero value to those features that never increase the goodness of fit when added to the feature set.
Definition 9.
Efficiency. A solution concept $\Psi$ is efficient or Pareto optimal if for every game $(N, v)$ and every solution vector $\mathbf{x} \in \Psi(N, v)$ it holds that $\sum_{i \in N} x_i = v(N)$.
Consider the goodness of fit of the model trained by using the whole set of input features. The importance measures assigned to individual features by an efficient solution concept sum to this goodness of fit. This allows for quantifying the contribution of individual features to the whole performance of the trained model.
Definition 10.
Symmetry. Two players $i$ and $j$ are symmetric if $v(S \cup \{i\}) = v(S \cup \{j\})$ for every $S \subseteq N \setminus \{i, j\}$. A solution concept $\Psi$ satisfies symmetry if for all games $(N, v)$, all $\mathbf{x} \in \Psi(N, v)$, and all symmetric players $i$ and $j$ it holds that $x_i = x_j$.
The symmetry property implies that if two features have the same marginal contribution to the goodness of fit when added to any possible coalition then the importance of the two features is the same. This property is essentially a fair treatment of the input features and results in identical features receiving the same importance score.
Definition 11.
Linearity. A single-valued solution concept $\Psi$ satisfies linearity if for any two games $(N, v)$ and $(N, w)$, and for the solution vector of the TU game $(N, v + w)$ given by $(v + w)(S) = v(S) + w(S)$, it holds that $\Psi(N, v + w) = \Psi(N, v) + \Psi(N, w)$.
Let us imagine a binary classifier and two sets of data points – on both of these datasets, we can define feature selection games with binary cross entropybased payoffs. The Shapley values of input features in the feature selection game calculated on the pooled dataset would be the same as adding together the Shapley values calculated from the two datasets separately.
These four properties together characterize the Shapley value.
Theorem 1 (Shapley, 1953).
A single-valued solution concept satisfies the null player, efficiency, symmetry, and linearity properties if and only if it is the Shapley value.
3 Approximations of the Shapley Value
Shapley value computation requires an exponential number of characteristic function evaluations, resulting in exponential time complexity. This is prohibitive in a machine learning context when each evaluation can correspond to training a machine learning model. For this reason, machine learning applications use a variety of Shapley value approximation methods, which we discuss in this section. In the following discussion $\widehat{\phi}_i$ denotes an approximated Shapley value for player $i$.
3.1 Monte Carlo Permutation Sampling
Monte Carlo permutation sampling for the general class of cooperative games was first proposed by Castro et al. (2009) to approximate the Shapley value in linear time.
As shown in Algorithm 1, the method performs a sampling-based approximation. At each iteration, a random element of the set of permutations of the player set is drawn. The marginal contributions of the players in the sampled permutation are scaled down by the number of samples (which is equivalent to taking an average) and added to the approximated Shapley values from the previous iteration. Castro et al. (2009) provide asymptotic error bounds for this approximation algorithm via the central limit theorem when the variance of the marginal contributions is known. Maleki et al. (2013) extended the analysis of this sampling approach by providing error bounds when either the variance or the range of the marginal contributions is known, via Chebyshev's and Hoeffding's inequalities. Their bounds hold for a finite number of samples, in contrast to the previous asymptotic bounds.
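The sampling scheme above can be sketched as follows. The game $v(S) = |S|^2$ is a hypothetical example chosen because, by symmetry and efficiency, every player's true Shapley value is $v(N)/|N| = 3$, which makes the estimate easy to check:

```python
import random

def monte_carlo_shapley(players, v, num_samples=2000, seed=0):
    """Approximate Shapley values by averaging the marginal contributions
    observed along uniformly sampled permutations of the player set."""
    rng = random.Random(seed)
    players = list(players)
    phi = {i: 0.0 for i in players}
    for _ in range(num_samples):
        pi = players[:]
        rng.shuffle(pi)                       # draw a random permutation
        predecessors = frozenset()
        for i in pi:
            # marginal contribution of i to its predecessor set
            phi[i] += (v(predecessors | {i}) - v(predecessors)) / num_samples
            predecessors = predecessors | {i}
    return phi

# Hypothetical symmetric game: v(S) = |S|^2, so every player's true
# Shapley value is v(N)/|N| = 9/3 = 3.
phi = monte_carlo_shapley({1, 2, 3}, lambda S: len(S) ** 2)
```

Each sampled permutation costs $|N|$ characteristic function evaluations, so the total cost is linear in the number of players for a fixed sample budget.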
3.1.1 Stratified Sampling for Variance Reduction
In addition to extending the analysis of Monte Carlo estimation, Maleki et al. (2013) demonstrate how to improve the Shapley value approximation when sampling can be stratified by dividing the permutations of the player set into homogeneous, non-overlapping subpopulations. In particular, they show that if the set of permutations can be grouped into strata with similar marginal gains for players, then the approximation will be more precise. Following this, Castro et al. (2017) explored stratified sampling approaches using strata defined by the set of all marginal contributions when the player is in a specific position within the coalition. Burgess and Chapman (2021) propose stratified sampling approaches designed to minimize the uncertainty of the estimate via a stratified empirical Bernstein bound.

3.1.2 Other Variance Reduction Techniques
Following the stratified approaches above, Illés and Kerényi (2019) propose an alternative variance reduction technique for the sample mean. Instead of generating a sequence of independent samples, they generate a sequence of ergodic but not independent samples, taking advantage of negative correlation to reduce the sample variance. Mitchell et al. (2021) show that other Monte Carlo variance reduction techniques can also be applied to this problem, such as antithetic sampling [30, 35]. A simple form of antithetic sampling uses both a randomly sampled permutation and its reverse. Finally, Touati et al. (2021) introduce a Bayesian Monte Carlo approach to Shapley value calculation, showing that Shapley value estimation can be improved by using Bayesian methods.
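The permutation-reversal trick can be sketched as below. The symmetric example game is hypothetical; for it, the paired marginal contributions cancel exactly, which illustrates the variance reduction in an extreme form:

```python
import random

def antithetic_shapley(players, v, num_pairs=500, seed=0):
    """Antithetic permutation sampling: every sampled permutation is paired
    with its reverse; the two marginal-contribution estimates are negatively
    correlated, which lowers the variance of the sample mean."""
    rng = random.Random(seed)
    players = list(players)
    phi = {i: 0.0 for i in players}
    n_samples = 2 * num_pairs
    for _ in range(num_pairs):
        pi = players[:]
        rng.shuffle(pi)
        for perm in (pi, pi[::-1]):           # the permutation and its reverse
            predecessors = frozenset()
            for i in perm:
                phi[i] += (v(predecessors | {i}) - v(predecessors)) / n_samples
                predecessors = predecessors | {i}
    return phi

# For the hypothetical game v(S) = |S|^2, a player at position k contributes
# 2k - 1 and, in the reversed permutation, 2(|N| + 1 - k) - 1; each pair
# therefore averages to exactly the true value 3.
phi = antithetic_shapley({1, 2, 3}, lambda S: len(S) ** 2)
```

In general the cancellation is only partial, but the negative correlation between a permutation and its reverse still reduces the estimator's variance at no extra sampling cost.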
3.2 Multilinear Extension
By inducing a probability distribution over the subsets $S \subseteq N \setminus \{i\}$, where $S$ is a random subset that does not include player $i$ and each other player is included in $S$ independently with probability $q$, Owen (1972) demonstrated that the sum over subsets in Definition 7 can also be represented as an integral $\phi_i = \int_0^1 e_i(q)\, dq$, where $e_i(q) = \mathbb{E}\left[ v(S \cup \{i\}) - v(S) \right]$. Sampling over $q$ therefore provides an approximation method: the multilinear extension. For example, Mitchell et al. (2021) use the trapezoid rule to sample $q$ at fixed intervals, while Okhrati and Lipani (2021) propose incorporating antithetic sampling as a variance reduction technique.

3.3 Linear Regression Approximation
In their seminal work, Lundberg and Lee [31] apply Shapley values to feature importance and explainability (SHAP values), demonstrating that Shapley values for TU games can be approximated by solving a weighted least squares optimization problem. Their main insight is the computation of Shapley values by approximately solving the following optimization problem:
$w(S) = \dfrac{|N| - 1}{\binom{|N|}{|S|}\, |S|\, (|N| - |S|)}$  (2)

$\min_{\phi_0, \phi_1, \dots, \phi_{|N|}} \sum_{S \subseteq N,\ 0 < |S| < |N|} w(S) \Big( v(S) - \phi_0 - \sum_{i \in S} \phi_i \Big)^2$  (3)

$\phi_0 = v(\emptyset), \qquad \phi_0 + \sum_{i \in N} \phi_i = v(N)$  (4)
The definition of the weights in Equation (2) and the objective function in Equation (3) imply the evaluation of $v$ for $O(2^{|N|})$ coalitions. To address this, Lundberg and Lee [31] propose approximating this problem by subsampling the coalitions. Note that $w(S)$ is higher when coalitions are large or small. Covert and Lee [8] extend the study of this method, finding that while SHAP is a consistent estimator, it is not an unbiased estimator. By proposing and analyzing a variation of this method that is unbiased, they conclude that while there is a small bias incurred by SHAP, it has a significantly lower variance than the corresponding unbiased estimator. They then propose a variance reduction method for SHAP, improving convergence speed by an order of magnitude through sampling coalitions in pairs, each coalition selected alongside its complement.
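The weighted least squares problem in Equations (2)-(4) can be sketched for a small game. Here every coalition is enumerated instead of subsampled, so the solution recovers the exact Shapley values of the 3-player game from Example 1 (players relabeled 0-2); enforcing the efficiency constraint with a large penalty weight is an implementation choice of this sketch:

```python
import numpy as np
from itertools import combinations
from math import comb

# The 3-player game from Example 1, players relabeled 0..2; the payoffs
# are the ones implied by the marginal contributions in Table 1.
n = 3
v = {frozenset(): 0.0, frozenset({0}): 7.0, frozenset({1}): 11.0,
     frozenset({2}): 14.0, frozenset({0, 1}): 18.0, frozenset({0, 2}): 21.0,
     frozenset({1, 2}): 23.0, frozenset({0, 1, 2}): 25.0}

# Build the weighted least squares problem of Equations (2)-(4). Since
# v(empty) = 0, phi_0 is fixed to zero and only phi_1..phi_n are free.
rows, targets, weights = [], [], []
for size in range(1, n):
    for S in combinations(range(n), size):
        z = np.zeros(n)
        z[list(S)] = 1.0                       # indicator vector of S
        rows.append(z)
        targets.append(v[frozenset(S)])
        weights.append((n - 1) / (comb(n, size) * size * (n - size)))

# Enforce the efficiency constraint of Equation (4) with a large penalty
# weight (an implementation choice of this sketch).
rows.append(np.ones(n))
targets.append(v[frozenset(range(n))])
weights.append(1e6)

Z, y, W = np.array(rows), np.array(targets), np.diag(weights)
phi = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ y)
# With all coalitions enumerated, phi matches the exact Shapley values
# (16/3, 25/3, 34/3) from Table 1.
```

The subsampled variant used in practice replaces the full enumeration with coalitions drawn according to the weights $w(S)$, keeping the same least squares structure.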
4 Machine Learning and the Shapley Value
Table 2: Applications of the Shapley value in machine learning with the payoff definition, the approximation technique, and its time complexity.

| Application | Reference | Payoff | Approximation | Time |
| Feature Selection | [6] | Validation loss | Exact | |
| | [40] | Mutual information | Exact | |
| | [47] | Validation loss | Monte Carlo sampling | |
| | [43] | Training loss | Monte Carlo sampling | |
| | [33] | Validation loss | Monte Carlo sampling | |
| | [19] | Validation loss | Exact | |
| Data Valuation | [22] | Validation loss | Restricted Monte Carlo sampling | |
| | [17] | Validation loss | Monte Carlo sampling | |
| | [38] | Validation loss | Exact | |
| | [10] | Validation loss | Restricted Monte Carlo sampling | |
| | [25] | Validation loss | Monte Carlo sampling | |
| | [26] | Validation loss | Monte Carlo sampling | |
| Federated Learning | [29] | Validation loss | Monte Carlo sampling | |
| Universal Explainability | [31] | Attribution | Linear regression | |
| | [41] | Interaction attribution | Integrated gradients | |
| | [42] | Interaction attribution | Integrated gradients | |
| | [14] | Attribution | Linear regression | |
| | [15] | Attribution | Linear regression | |
| | [50] | Attribution | Monte Carlo sampling | |
| | [8] | Attribution | Linear regression | |
| Explainability of Deep Learning | [5] | Attribution | Restricted Monte Carlo sampling | |
| | [2] | Neuron attribution | Voting game | |
| | [18] | Neuron attribution | Monte Carlo sampling | |
| | [51] | Interaction attribution | Linear regression | |
| Explainability of Graphical Models | [28] | Attribution | Exact | |
| | [21] | Causal attribution | Linear regression | |
| | [45] | Causal attribution | Linear regression | |
| | [39] | Causal attribution | Linear regression | |
| Explainability in Graph Machine Learning | [50] | Edge level attribution | Monte Carlo sampling | |
| | [11] | Edge level attribution | Linear regression | |
| Multiagent Reinforcement Learning | [44] | Global reward | Monte Carlo sampling | |
| | [27] | Global reward | Monte Carlo sampling | |
| Model Valuation in Ensembles | [34] | Predictive performance | Voting game | |
Our discussion of the applications of the Shapley value in the machine learning domain focuses on the formulation of the cooperative games, the definition of the player set and payoffs, the Shapley value approximation technique used, and the time complexity of the approximation. We summarize the most important application areas with this information in Table 2 and group the relevant works by the problem solved.
4.1 Feature Selection
The feature selection game treats input features of a machine learning model as players and model performance as the payoff [20, 16]. The Shapley values of features quantify how much individual features contribute to the model’s performance on a set of data points.
Definition 12.
Feature selection game. Let the player set be the set of input features $N = \{1, \dots, n\}$; for a coalition $S \subseteq N$, the train and test feature vector sets restricted to the features in $S$ are $X^{\textit{train}}_S$ and $X^{\textit{test}}_S$. Let $f_S$ be a machine learning model trained using $X^{\textit{train}}_S$ as input; then the payoff is $v(S) = g(\mathbf{y}, \widehat{\mathbf{y}}_S)$, where $g$ is a goodness of fit function, and $\mathbf{y}$ and $\widehat{\mathbf{y}}_S = f_S(X^{\textit{test}}_S)$ are the ground truth and predicted targets.
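A minimal sketch of the feature selection game, assuming the test-set $R^2$ of an ordinary least squares fit as the goodness of fit function $g$ and a synthetic dataset in which one feature is pure noise (both are illustrative choices, not a method from the cited works):

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(0)

# Synthetic data: the target depends on features 0 and 1; feature 2 is noise.
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=200)
X_train, X_test = X[:100], X[100:]
y_train, y_test = y[:100], y[100:]

def payoff(S):
    """v(S): test-set R^2 of a least squares fit on the features in S."""
    if not S:
        return 0.0
    cols = sorted(S)
    w, *_ = np.linalg.lstsq(X_train[:, cols], y_train, rcond=None)
    resid = y_test - X_test[:, cols] @ w
    return 1.0 - resid.var() / y_test.var()

# Exact Shapley values over all 3! permutations of the feature set.
phi = {i: 0.0 for i in range(3)}
perms = list(permutations(range(3)))
for pi in perms:
    S = set()
    for i in pi:
        phi[i] += (payoff(S | {i}) - payoff(S)) / len(perms)
        S.add(i)
# The informative features receive large importance scores, while the
# noise feature's score stays close to zero (cf. the null player property).
```

For realistic feature counts the exact enumeration is replaced by one of the approximation schemes from Section 3.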
Shapley values, and close relatives such as the Banzhaf index [3], have been studied as a measure of feature importance in various contexts [6, 40, 47, 43]. Using these importance estimates, features can be ranked and selected or removed accordingly. This approach has been applied to various tasks such as vocabulary selection in natural language processing [33] and feature selection in human action recognition [19].

4.2 Data valuation
In the data valuation game training set data points are players and the payoff is defined by the goodness of fit achieved by a model on the test data. Computing the Shapley value of players in a data valuation game measures how much data points contribute to the performance of the model.
Definition 13.
Data valuation game. Let the player set be the training set $D = \{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_n, y_n)\}$, where $\mathbf{x}_i$ is the input feature vector and $y_i$ is the target of the $i^{th}$ data point. Given a coalition $S \subseteq D$, let $f_S$ be a machine learning model trained on $S$. Let us denote the test set feature vectors and targets as $X^{\textit{test}}$ and $\mathbf{y}^{\textit{test}}$; given $S$, the set of predicted labels is defined as $\widehat{\mathbf{y}}^{\textit{test}}_S = f_S(X^{\textit{test}})$. Then the payoff of a model trained on the data points in $S$ is $v(S) = g(\mathbf{y}^{\textit{test}}, \widehat{\mathbf{y}}^{\textit{test}}_S)$, where $g$ is a goodness of fit metric.
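A minimal sketch of the data valuation game, assuming a 1-nearest-neighbour classifier and a tiny hypothetical training set; it also illustrates the symmetry and efficiency axioms from Section 2.2:

```python
import numpy as np
from itertools import permutations

# Tiny hypothetical training set: 1-D features with binary labels.
# Points 0 and 1 are exact duplicates.
train = [(-1.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
X_test = np.array([-1.5, -0.5, 0.5, 1.5])
y_test = np.array([0, 0, 1, 1])

def payoff(S):
    """v(S): test accuracy of a 1-nearest-neighbour classifier fit on S."""
    if not S:
        return 0.0
    pts = [train[i] for i in sorted(S)]
    preds = [min(pts, key=lambda p: abs(p[0] - x))[1] for x in X_test]
    return float(np.mean(np.array(preds) == y_test))

n = len(train)
phi = {i: 0.0 for i in range(n)}
perms = list(permutations(range(n)))
for pi in perms:
    S = set()
    for i in pi:
        phi[i] += (payoff(S | {i}) - payoff(S)) / len(perms)
        S.add(i)
# The duplicate points receive identical value (symmetry), and all values
# sum to the accuracy of the full training set (efficiency).
```

With thousands of data points the exact enumeration becomes intractable, which is exactly where the Monte Carlo approximations cited above come in.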
The Shapley value is not the only method for data valuation; earlier works used influence functions [23, 37], leave-one-out testing [7], and core sets [9]. However, these methods fall short when there are fairness requirements for the data valuation technique [22, 17, 26]. Ghorbani and Zou [17] proposed a framework for utilizing the Shapley value in a data-sharing system; Jia et al. [22] advanced this work with more efficient algorithms to approximate the Shapley value for data valuation. The distributional Shapley value has been discussed by Ghorbani et al. (2020), who argued that preserving privacy during Shapley value computation is hard. Their method calculates the Shapley value over a distribution, which solves problems such as the lack of privacy. As Kwon et al. [25] point out, the computation time of this approach can be reduced with approximation methods optimized for specific machine learning models.
4.3 Federated learning
A federated learning scenario can be seen as a cooperative game by modeling the data owners as players who cooperate to train a high-quality machine learning model [29].
Definition 14.
Federated learning game. In this game the players are a set of labeled dataset owners $N = \{(X_1, \mathbf{y}_1), \dots, (X_n, \mathbf{y}_n)\}$, where $X_i$ and $\mathbf{y}_i$ are the feature and label sets owned by the $i^{th}$ data silo. Let $(X^{\textit{test}}, \mathbf{y}^{\textit{test}})$ be a labeled test set, $S \subseteq N$ a coalition of data silos, $f_S$ a machine learning model trained on the data in $S$, and $\widehat{\mathbf{y}}^{\textit{test}}_S$ the labels predicted by $f_S$ on $X^{\textit{test}}$. The payoff of $S$ is $v(S) = g(\mathbf{y}^{\textit{test}}, \widehat{\mathbf{y}}^{\textit{test}}_S)$, where $g$ is a goodness of fit metric.
The system described by Liu et al. [29] uses Monte Carlo sampling to approximate the Shapley value of the data coming from the data silos in linear time. Given the potentially overlapping nature of the datasets, the use of configuration games could be an interesting future direction [1].
4.4 Explainable machine learning
In explainable machine learning the Shapley value is used to measure the contributions of input features to the output of a machine learning model at the instance level. Given a specific data point, the goal is to decompose the model prediction and assign Shapley values to individual features of the instance. There are universal solutions to this challenge that are model agnostic and designs customized for deep learning [5, 2], classification trees [31], and graphical models [28, 39].
4.4.1 Universal explainability
A cooperative game for universal explainability is completely model agnostic; the only requirement is that a scalarvalued output can be generated by the model such as the probability of a class label being assigned to an instance.
Definition 15.
Universal explainability game. Let us denote the machine learning model of interest by $f$ and let the player set be the feature values of a single data instance: $N = \{x_1, \dots, x_n\}$. The payoff of a coalition $S \subseteq N$ in this game is the scalar-valued prediction $v(S) = f(\mathbf{x}_S)$ calculated from the subset of feature values in $S$.
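A sketch of this game for a hypothetical linear model, with missing feature values imputed by the mean of a background sample; in this special case the Shapley attributions have a closed form that the permutation computation reproduces:

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(0)

# Hypothetical model to explain: a fixed linear model over 3 features.
w = np.array([2.0, -1.0, 0.5])
model = lambda z: float(z @ w)

# Background sample used to impute "missing" feature values by their mean.
background = rng.normal(size=(500, 3))
mu = background.mean(axis=0)
x = np.array([1.0, 2.0, -1.0])       # the instance being explained

def payoff(S):
    """v(S): model output with the features outside S imputed by the mean."""
    z = mu.copy()
    for i in S:
        z[i] = x[i]
    return model(z)

phi = {i: 0.0 for i in range(3)}
perms = list(permutations(range(3)))
for pi in perms:
    S = set()
    for i in pi:
        phi[i] += (payoff(S | {i}) - payoff(S)) / len(perms)
        S.add(i)
# For a linear model with mean imputation, the attribution of feature i
# is exactly w[i] * (x[i] - mu[i]), and the attributions sum to
# f(x) - f(mu).
```

For nonlinear models no such closed form exists, and the approximation techniques of Section 3 are used instead.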
Calculating the Shapley value in a game like this offers a complete decomposition of the prediction because the efficiency axiom holds. The Shapley values of feature values are explanatory attributions to the input features, and missing input feature values are imputed with a reference value such as the mean computed from multiple instances [31, 8]. The pioneering Shapley value-based universal explanation method SHAP [31] proposes a linear time approximation of the Shapley values, which we discussed in Section 3. This approximation has shortcomings and implicit assumptions about the features, which are addressed by newer Shapley value-based explanation techniques. For example, in [14] the input features are not necessarily independent, [15] restricts the permutations based on known causal relationships, and the technique proposed in [8] improves the convergence guarantees of the approximation. Several methods generalize SHAP beyond feature values to give attributions to first-order feature interactions [42, 41]. However, this requires that the player set is redefined to include feature interaction values.

4.4.2 Deep learning
In neuron explainability games neurons are players and attributions to the neurons are payoffs. The primary goal of Shapley valuebased explanations in deep learning is to solve these games and compute attributions to individual neurons and filters [18, 2].
Definition 16.
Neuron explainability game. Let us consider
the encoder layer of a neural network and
x the input feature vector to the encoder. In the neuron explainability game the player set is  each player corresponds to the output of a neuron in the final layer of the encoder. The payoff of coalition is defined as the predicted output where is the head layer of the neural network.In practical terms, the payoffs are the output of the neural network obtained by masking out certain neurons. Using the Shapley values obtained in these games the value of individual neurons can be quantified. At the same time, some deep learning specific Shapley valuebased explanation techniques have designs and goals that are aligned with the games described in universal explainability. These methods exploit the structure of the input data [5] or the nature of feature interactions [51] to provide efficient computations of attributions.
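A sketch of the neuron explainability game for a tiny, randomly initialised network (a hypothetical architecture, not one from the cited works); masking is implemented by zeroing activations. The head here is linear, which makes each neuron's Shapley value easy to verify:

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(0)

# Tiny hypothetical network: a ReLU encoder with 4 hidden neurons and a
# linear head producing a scalar output.
W_enc = rng.normal(size=(4, 3))
w_head = rng.normal(size=4)
x = rng.normal(size=3)
z = np.maximum(W_enc @ x, 0.0)       # encoder activations: the players

def payoff(S):
    """v(S): head output with the neurons outside S masked to zero."""
    mask = np.zeros(4)
    if S:
        mask[list(S)] = 1.0
    return float(w_head @ (z * mask))

phi = {i: 0.0 for i in range(4)}
perms = list(permutations(range(4)))
for pi in perms:
    S = set()
    for i in pi:
        phi[i] += (payoff(S | {i}) - payoff(S)) / len(perms)
        S.add(i)
# With a linear head, each neuron's Shapley value reduces to its
# head-weighted activation w_head[i] * z[i]; a nonlinear head would make
# the permutation averaging essential.
```

Real networks have far too many neurons for exact enumeration, which is why the cited methods rely on Monte Carlo sampling or voting game formulations.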
4.4.3 Graphical models
Compared to universal explanations, the graphical model-specific techniques restrict the admissible set of player set permutations considered in the attribution process. These restrictions are defined based on known causal relations, and permutations are generated by various search strategies on the graph describing the probabilistic model [21, 28, 39]. The methods are differentiated from each other by how the restrictions are defined and how the admissible permutations are generated.
4.4.4 Relational machine learning
In the relational machine learning domain the Shapley value is used to create edge importance attributions for instance-level explanations [11, 50]. Essentially, the Shapley value in these games measures the average marginal change in the outcome variable as one adds a specific edge to the edge set, over all possible permutations of the edge set. It is worth noting that the proposed edge explanation and attribution techniques could be generalized to provide node attributions.
Definition 17.
Relational explainability game. Let us define a graph $G = (V, E)$, where $V$ and $E$ are the vertex and edge sets. Given the relational machine learning model $f$, the node feature matrix $\mathbf{X}$, and a node $u \in V$, the payoff of coalition $S \subseteq E$ in the graph machine learning explanation game is defined as the node level prediction $v(S) = f(G_S, \mathbf{X})_u$, where $G_S = (V, S)$ retains only the edges in $S$.
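A sketch of the relational explainability game, assuming a toy mean-over-neighbours model as $f$ and taking the node's own feature as the empty-coalition baseline (both are illustrative choices of this sketch, not the cited methods):

```python
import numpy as np
from itertools import permutations

# Toy graph: 4 nodes with scalar features; the players are the edges
# incident to the explained node 0.
features = np.array([1.0, 4.0, -2.0, 8.0])
edges = [(0, 1), (0, 2), (0, 3)]
target = 0

def payoff(S):
    """v(S): prediction for the target node, here the mean feature of the
    neighbours reachable through the edges in S (own feature if isolated)."""
    neigh = [b for (a, b) in S if a == target]
    if not neigh:
        return float(features[target])
    return float(np.mean(features[neigh]))

phi = {e: 0.0 for e in edges}
perms = list(permutations(edges))
for pi in perms:
    S = []
    for e in pi:
        phi[e] += (payoff(S + [e]) - payoff(S)) / len(perms)
        S.append(e)
# The edge to the high-feature node 3 receives the largest attribution,
# while the edge to the low-feature node 2 receives a negative one.
```

The same loop applied to a trained graph neural network, with edge subsets masked in the adjacency structure, yields the edge attributions computed by the cited explanation methods.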
4.5 Multiagent reinforcement learning
Global reward multiagent reinforcement learning problems can be modeled as TU games [44, 27] by defining the player set as the set of agents and the payoff of coalitions as a global reward. The Shapley value allows an axiomatic decomposition of the global reward achieved by the agents in these games and the fair attribution of credit assignments to each of the participating agents.
4.6 Model valuation in ensembles
The Shapley value can be used to assess the contributions of machine learning models to a composite model in ensemble games. In these games, players are models in an ensemble and payoffs are decided by whether the predictions made by the models are correct.
Definition 18.
Ensemble game. Let us consider a single target-feature instance denoted by $(y, \mathbf{x})$. The player set in ensemble games is defined by a set of machine learning models $N = \{f_1, \dots, f_n\}$ that operate on the feature set. The predicted target output by the ensemble formed by coalition $S \subseteq N$ is defined as $\widehat{y}_S = h(\{f_i(\mathbf{x}) \mid i \in S\})$, where $h$ is a prediction aggregation function. The payoff of $S$ is $v(S) = g(y, \widehat{y}_S)$, where $g$ is a goodness of fit metric.

The ensemble games described by [34] are formulated as a special subclass of voting games. This allows the use of precise game-specific approximation techniques [13], and because of this the Shapley value estimates are obtained in quadratic time and have a tight approximation error. The games themselves are model agnostic with respect to the player set; ensembles can be formed by heterogeneous types of machine learning models that operate on the same inputs.
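A minimal sketch of the ensemble game, with majority voting as the aggregation function $h$ and a 0/1 correctness payoff for a single hypothetical instance:

```python
from itertools import permutations

# Hypothetical ensemble for one instance with true label 1: each model is
# represented only by its prediction for this instance.
true_label = 1
model_preds = [1, 1, 0]              # models 0 and 1 are correct, model 2 is not

def payoff(S):
    """v(S): 1 if a strict majority of the models in S predicts correctly."""
    if not S:
        return 0.0
    votes = [model_preds[i] for i in S]
    return 1.0 if 2 * votes.count(true_label) > len(votes) else 0.0

n = len(model_preds)
phi = {i: 0.0 for i in range(n)}
perms = list(permutations(range(n)))
for pi in perms:
    S = set()
    for i in pi:
        phi[i] += (payoff(S | {i}) - payoff(S)) / len(perms)
        S.add(i)
# The two correct models share credit equally (symmetry, phi = 2/3 each),
# while the incorrect model is penalised with phi = -1/3.
```

The voting game structure exploited by [34] replaces this brute-force enumeration with a quadratic-time approximation over many instances.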
5 Discussion
The Shapley value has a wide-reaching impact in machine learning, but it has limitations, and certain extensions of the Shapley value could have important applications in machine learning.
5.1 Limitations
5.1.1 Computation time
Computing the Shapley value for each player naively in a TU game takes factorial time. In some machine learning application areas such as multiagent reinforcement learning and federated learning where the number of players is small, this is not an issue. However, in large scale data valuation [25, 26], explainability [31], and feature selection [33] settings the exact calculation of the Shapley value is not tractable. In Sections 3 and 4 we discussed approximation techniques proposed to make Shapley value computation possible. In some cases, asymptotic properties of these Shapley value approximation techniques are not well understood – see for example [5].
5.1.2 Interpretability
By definition, the Shapley values are the average marginal contributions of players to the payoff of the grand coalition, computed over all permutations [36]. Theoretical interpretations like this one are not intuitive and not useful for non-experts in game theory. This means that translating the meaning of Shapley values obtained in many application areas into actions is troublesome [24]. For example, in a data valuation scenario: is a data point whose Shapley value is twice as large as another's twice as valuable? Answering a question like this requires a definition of the cooperative game that is itself interpretable.
5.1.3 Axioms do not hold under approximations
As we discussed most applications of the Shapley value in machine learning use approximations. The fact that under these approximations the desired axiomatic properties of the Shapley value do not hold is often overlooked [42]. This is problematic because most works argue for the use of Shapley value based on these axioms. In our view, this is the greatest unresolved issue in the applications of the Shapley value.
5.2 Future Research Directions
5.2.1 Hierarchy of the coalition structure
The Shapley value has a constrained version called the Owen value [32], in which only permutations satisfying conditions defined by a coalition structure (a partition of the player set) are considered. The calculation of the Owen value is identical to that of the Shapley value, with the exception that only those permutations are taken into account in which the players in any of the subsets of the coalition structure follow each other. In several real-world data and feature valuation scenarios, even more complex hierarchies of the coalition structure could be useful. Having a nested hierarchy imposes restrictions on the admissible permutations of the players and changes the player valuation. Games with such nested hierarchies are called level structure games in game theory. [48] presents the Winter value, a solution concept for level structure games; such games are yet to receive attention in the machine learning literature.
5.2.2 Overlapping coalition structure
Traditionally, it is assumed that players in a coalition structure are allocated to disjoint partitions of the grand coalition. Allowing players to belong to overlapping coalitions in configuration games [1] could have several applications in machine learning. For example, in a data sharing and feature selection scenario, multiple data owners might have access to the same features; a feature can then belong to overlapping coalitions.
5.2.3 Solution concepts beyond the Shapley value
The Shapley value is a specific solution concept of cooperative game theory with intuitive axiomatic properties (Section 2). At the same time, it has limitations with respect to computational constraints and interpretability (Sections 3 and 5). Cooperative game theory offers other solution concepts, such as the core, nucleolus, stable set, and kernel, with their own axiomatizations. For example, the core has been used for model explainability and feature selection [49]. Research into the potential applications of these solution concepts is lacking.
6 Conclusion
In this survey we discussed the Shapley value, examined its axiomatic characterization, and reviewed the most frequently used Shapley value approximation approaches. We defined and reviewed its uses in machine learning, highlighted issues with the Shapley value, and pointed out potential new application and research areas in machine learning.
References
 [1] (2006) Configuration Values: Extensions of the Coalitional Owen Value. Games and Economic Behavior 57 (1), pp. 1–17. Cited by: §4.3, §5.2.2.
 [2] (2019) Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Value Approximation. In International Conference on Machine Learning, pp. 272–281. Cited by: §4.4.2, §4.4, Table 2.
 [3] (1964) Weighted Voting Doesn’t Work: A Mathematical Analysis. Rutgers L. Rev. 19, pp. 317. Cited by: §4.1.

 [4] (2011) Computational Aspects of Cooperative Game Theory. Synthesis Lectures on Artificial Intelligence and Machine Learning 5 (6), pp. 1–168. Cited by: §1.
 [5] (2018) L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data. In International Conference on Learning Representations, Cited by: §4.4.2, §4.4, Table 2, §5.1.1.
 [6] (2007) Feature Selection via Coalitional Game Theory. Neural Computation 19 (7), pp. 1939–1961. Cited by: §1, §4.1, Table 2.
 [7] (1977) Detection of influential observation in linear regression. Technometrics 19 (1), pp. 15–18. Cited by: §4.2.
 [8] (2021) Improving KernelSHAP: Practical Shapley Value Estimation Using Linear Regression. In International Conference on Artificial Intelligence and Statistics, pp. 3457–3465. Cited by: §4.4.1, Table 2.
 [9] (2009) Sampling Algorithms and Coresets for $\ell_p$ Regression. SIAM Journal on Computing 38 (5), pp. 2060–2078. Cited by: §4.2.
 [10] (2021) Explanations for Data Repair Through Shapley Values. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 362–371. Cited by: Table 2.
 [11] (2021) GraphSVX: Shapley Value Explanations for Graph Neural Networks. In Machine Learning and Knowledge Discovery in Databases, pp. 302–318. Cited by: §4.4.4, Table 2.
 [12] (2021) Improving Fairness for Data Valuation in Federated Learning. arXiv preprint arXiv:2109.09046. Cited by: §1.
 [13] (2008) A Linear Approximation Method for the Shapley Value. Artificial Intelligence 172 (14), pp. 1673–1699. Cited by: §4.6.
 [14] (2020) Shapley Explainability on the Data Manifold. In International Conference on Learning Representations, Cited by: §4.4.1, Table 2.
 [15] (2020) Asymmetric Shapley Values: Incorporating Causal Knowledge Into ModelAgnostic Explainability. Advances in Neural Information Processing Systems 33. Cited by: §4.4.1, Table 2.
 [16] (2021) Shapley Values for Feature Selection: the Good, the Bad, and the Axioms. arXiv preprint arXiv:2102.10936. Cited by: §4.1.
 [17] (2019) Data Shapley: Equitable Valuation of Data for Machine Learning. In International Conference on Machine Learning, pp. 2242–2251. Cited by: §1, §4.2, Table 2.
 [18] (2020) Neuron Shapley: Discovering the Responsible Neurons. In Advances in Neural Information Processing Systems, pp. 5922–5932. Cited by: §4.4.2, Table 2.
 [19] (2021) CGA: a new feature selection model for visual human action recognition. Neural Computing and Applications 33 (10), pp. 5267–5286. Cited by: §4.1, Table 2.
 [20] (2003) An Introduction to Variable and Feature Selection. Journal of machine learning research 3 (Mar), pp. 1157–1182. Cited by: §4.1.
 [21] (2020) Causal Shapley Values: Exploiting Causal Knowledge to Explain Individual Predictions of Complex Models. Advances in Neural Information Processing Systems 33. Cited by: §4.4.3, Table 2.
 [22] (2019) Towards Efficient Data Valuation Based on the Shapley Value. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 1167–1176. Cited by: §4.2, Table 2.
 [23] (2017) Understanding blackbox predictions via influence functions. In International Conference on Machine Learning, pp. 1885–1894. Cited by: §4.2.
 [24] (2020) Problems with ShapleyValueBased Explanations as Feature Importance Measures. In International Conference on Machine Learning, pp. 5491–5500. Cited by: §5.1.2.
 [25] (2021) Efficient Computation and Analysis of Distributional Shapley Values. In International Conference on Artificial Intelligence and Statistics, pp. 793–801. Cited by: Table 2, §5.1.1.
 [26] (2021) Beta Shapley: a Unified and Noisereduced Data Valuation Framework for Machine Learning. arXiv preprint arXiv:2110.14049. Cited by: §4.2, Table 2, §5.1.1.
 [27] (2021) Shapley Counterfactual Credits for MultiAgent Reinforcement Learning. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 934–942. Cited by: §4.5, Table 2.
 [28] (2020) Shapley Values and MetaExplanations for Probabilistic Graphical Model Inference. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 945–954. Cited by: §4.4.3, §4.4, Table 2.
 [29] (2021) GTGShapley: Efficient and Accurate Participant Contribution Evaluation in Federated Learning. arXiv preprint arXiv:2109.02053. Cited by: §4.3, Table 2.
 [30] (2019) Antithetic and Monte Carlo Kernel Estimators for Partial Rankings. Statistics and Computing 29 (5), pp. 1127–1147. Cited by: §3.1.2.
 [31] (2017) A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 4768–4777. Cited by: §1, §4.4.1, §4.4, Table 2, §5.1.1.
 [32] (1977) Values of Games with a Priori Unions. In Mathematical Economics and Game Theory, pp. 76–88. Cited by: §5.2.1.
 [33] (2021) GameTheoretic Vocabulary Selection via the Shapley Value and Banzhaf Index. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics, pp. 2789–2798. Cited by: §4.1, Table 2, §5.1.1.
 [34] (2021) The Shapley Value of Classifiers in Ensemble Games. In Proceedings of the 30th ACM International Conference on Information and Knowledge Management, pp. 1558–1567. Cited by: §1, §1, §4.6, Table 2.
 [35] (2016) Simulation and the Monte Carlo Method. Vol. 10. Cited by: §3.1.2.
 [36] (1953) A Value for NPerson Games. Contributions to the Theory of Games, pp. 307–317. Cited by: §1, §5.1.2, Definition 7.

[37]
(2018)
Finding influential training samples for gradient boosted decision trees
. In International Conference on Machine Learning, pp. 4577–4585. Cited by: §4.2.  [38] (2021) Online ClassIncremental Continual Learning with Adversarial Shapley Value. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, pp. 9630–9638. Cited by: Table 2.
 [39] (2021) Flowbased Attribution in Graphical Models: A Recursive Shapley Approach. In Proceedings of the 38th International Conference on Machine Learning, Vol. 139, pp. 9733–9743. Cited by: §4.4.3, §4.4, Table 2.
 [40] (2012) Feature Evaluation and Selection with Cooperative Game Theory. Pattern recognition 45 (8), pp. 2992–3002. Cited by: §4.1, Table 2.
 [41] (2020) The Shapley Taylor Interaction Index. In International Conference on Machine Learning, pp. 9259–9268. Cited by: §4.4.1, Table 2.
 [42] (2020) The Many Shapley Values for Model Explanation. In International Conference on Machine Learning, pp. 9269–9278. Cited by: §4.4.1, Table 2, §5.1.3.
 [43] (2020) Interpretable Feature Subset Selection: A Shapley Value Based Approach. In IEEE International Conference on Big Data, pp. 5463–5472. Cited by: §4.1, Table 2.
 [44] (2021) SHAQ: Incorporating Shapley Value Theory into QLearning for MultiAgent Reinforcement Learning. arXiv preprint arXiv:2105.15013. Cited by: §4.5, Table 2.
 [45] (2021) Shapley Flow: A GraphBased Approach to Interpreting Model Predictions. In International Conference on Artificial Intelligence and Statistics, pp. 721–729. Cited by: Table 2.
 [46] (2020) A Principled Approach to Data Valuation for Federated Learning. In Federated Learning, pp. 153–167. Cited by: §1.
 [47] (2020) Efficient Nonparametric Statistical Inference on Population Feature Importance Using Shapley Values. In International Conference on Machine Learning, pp. 10282–10291. Cited by: §4.1, Table 2.
 [48] (1989) A Value for Cooperative Games with Levels Structure of Cooperation. International Journal of Game Theory 18 (2), pp. 227–40. Cited by: §5.2.1.
 [49] (202105) If You Like Shapley Then You Will Love the Core. Proceedings of the AAAI Conference on Artificial Intelligence 35 (6), pp. 5751–5759. Cited by: §5.2.3.
 [50] (2021) On Explainability of Graph Neural Networks via Subgraph Explorations. In Proceedings of the 38th International Conference on Machine Learning, pp. 12241–12252. Cited by: §4.4.4, Table 2.
 [51] (2021) Interpreting Multivariate Shapley Interactions in DNNs. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, pp. 10877–10886. Cited by: §4.4.2, Table 2.