The Shapley Value in Machine Learning

Over the last few years, the Shapley value, a solution concept from cooperative game theory, has found numerous applications in machine learning. In this paper, we first discuss fundamental concepts of cooperative game theory and axiomatic properties of the Shapley value. Then we give an overview of the most important applications of the Shapley value in machine learning: feature selection, explainability, multi-agent reinforcement learning, ensemble pruning, and data valuation. We examine the most crucial limitations of the Shapley value and point out directions for future research.


1 Introduction

Measuring importance and the attribution of various gains is a central problem in many practical aspects of machine learning such as explainability [31], feature selection [6], data valuation [17], ensemble pruning [34] and federated learning [46, 12]. For example, one might ask: What is the importance of a certain feature in the decisions of a machine learning model? How much is an individual data point worth? Which models are the most valuable in an ensemble? These questions have been addressed in different domains using specific approaches. Interestingly, there is also a general and unified approach to these questions as a solution to a transferable utility (TU) cooperative game. In contrast with other approaches, solution concepts of TU games are theoretically motivated with axiomatic properties. The best known solution of this type is the Shapley value [36] characterized by several desiderata that include fairness, symmetry, and efficiency [4].

In the TU setting, a cooperative game consists of a player set and a scalar-valued characteristic function that defines the value of coalitions (subsets of players). In such a game, the Shapley value offers a rigorous and intuitive way to distribute the collective value (e.g. the revenue, profit, or cost) of the team across individuals. To apply this idea to machine learning, we need to define two components: the player set and the characteristic function. In a machine learning setting, players may be represented by a set of input features, reinforcement learning agents, data points, models in an ensemble, or data silos. The characteristic function can then describe the goodness of fit for a model, reward in reinforcement learning, financial gain on instance level predictions, or out-of-sample model performance. We provide an example about model valuation in an ensemble [34] in Figure 1.

Figure 1: The Shapley value can be used to solve cooperative games. An ensemble game is a machine learning application for it – models in an ensemble are players (red, blue, and green robots) and the financial gain of the predictions is the payoff (coins) for each possible coalition (rectangles). The Shapley value can distribute the gain of the grand coalition (bottom right corner) among the models.

Present work. We introduce basic definitions of cooperative games and present the Shapley value, a solution concept that can allocate gains in these games to individual players. We discuss its properties and emphasize why these are important in machine learning. We overview applications of the Shapley value in machine learning: feature selection, data valuation, explainability, reinforcement learning, and model valuation. Finally, we discuss the limitations of the Shapley value and point out future directions.

2 Background

This section introduces cooperative games and the Shapley value followed by its properties. We also provide an illustrative running example for our definitions.

2.1 Cooperative games and the Shapley value

Definition 1.

Player set and coalitions. Let $N = \{1, \ldots, n\}$ be the finite set of players. We call each non-empty subset $S \subseteq N$ a coalition and $N$ itself the grand coalition.

Definition 2.

Cooperative game. A TU game is defined by the pair $(N, v)$ where $v: 2^N \to \mathbb{R}$ is a mapping called the characteristic function or the coalition function of the game, assigning a real number to each coalition and satisfying $v(\emptyset) = 0$.

Example 1.

Let us consider a 3-player cooperative game where $N = \{1, 2, 3\}$. The characteristic function $v$ defines the payoff for each coalition. Let these payoffs be given as: $v(\emptyset) = 0$, $v(\{1\}) = 7$, $v(\{2\}) = 11$, $v(\{3\}) = 14$, $v(\{1, 2\}) = 18$, $v(\{1, 3\}) = 21$, $v(\{2, 3\}) = 23$, and $v(\{1, 2, 3\}) = 25$.

Definition 3.

Set of feasible payoff vectors.

Let us define $I(N, v) = \left\{ \Phi \in \mathbb{R}^{|N|} \mid \sum_{i \in N} \Phi_i \le v(N) \right\}$, the set of feasible payoff vectors for the cooperative game $(N, v)$.

Definition 4.

Solution concept and solution vector. A solution concept is a mapping $\Psi$ associating a subset $\Psi(N, v) \subseteq I(N, v)$ to every TU game $(N, v)$. A solution vector $\Phi$ to the cooperative game $(N, v)$ satisfies solution concept $\Psi$ if $\Phi \in \Psi(N, v)$. A solution concept $\Psi$ is single-valued if for every $(N, v)$ the set $\Psi(N, v)$ is a singleton.

A solution concept defines an allocation principle through which rewards can be given to the individual players. The sum of these rewards cannot exceed the value of the grand coalition, $v(N)$. Solution vectors are specific allocations satisfying the principles of the solution concept.

Definition 5.

Permutations of the player set. Let $\Pi_N$ be the set of all permutations defined on $N$; a specific permutation is written as $\pi \in \Pi_N$ and $\pi(i)$ is the position of player $i$ in permutation $\pi$.

Definition 6.

Predecessor set. Let the set of predecessors of player $i$ in permutation $\pi$ be the coalition $S_i^{\pi} = \{ j \in N \mid \pi(j) < \pi(i) \}$.

Let us imagine that the permutation of the players in our illustrative game is $\pi = (2, 3, 1)$, i.e. player 2 comes first, followed by player 3 and then player 1. Under this permutation the predecessor set of player 2 is $S_2^{\pi} = \emptyset$, that of player 3 is $S_3^{\pi} = \{2\}$, and $S_1^{\pi} = \{2, 3\}$.

Definition 7.

Shapley value. The Shapley value [36] is a single-valued solution concept for cooperative games. The $i^{th}$ component of the single solution vector satisfying this solution concept for any cooperative game $(N, v)$ is given by Equation 1.

$$\Phi_i = \frac{1}{|N|!} \sum_{\pi \in \Pi_N} \left[ v\left(S_i^{\pi} \cup \{i\}\right) - v\left(S_i^{\pi}\right) \right] \qquad (1)$$

The Shapley value of a player is the average marginal contribution of the player to the value of the predecessor set over every possible permutation of the player set. Table 1 contains manual calculations of the players’ marginal contributions to each permutation and their Shapley values in Example 1.

                      Marginal contribution
Permutation       Player 1      Player 2      Player 3
(1, 2, 3)             7            11             7
(1, 3, 2)             7             4            14
(2, 1, 3)             7            11             7
(2, 3, 1)             2            11            12
(3, 1, 2)             7             4            14
(3, 2, 1)             2             9            14
Shapley value    16/3 ≈ 5.33   25/3 ≈ 8.33   34/3 ≈ 11.33
Table 1: The permutations of the player set, the marginal contributions of the players in each permutation, and the resulting Shapley values.
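
To make the calculation above concrete, the following minimal Python sketch enumerates all permutations and averages the marginal contributions. The coalition payoffs are those implied by the marginal contributions in Table 1 and are only used here for illustration.

```python
from itertools import permutations

# Coalition payoffs consistent with the marginal contributions in Table 1.
v = {
    frozenset(): 0,
    frozenset({1}): 7, frozenset({2}): 11, frozenset({3}): 14,
    frozenset({1, 2}): 18, frozenset({1, 3}): 21, frozenset({2, 3}): 23,
    frozenset({1, 2, 3}): 25,
}
players = [1, 2, 3]

def exact_shapley(players, v):
    """Average each player's marginal contribution over all |N|! permutations."""
    shapley = {i: 0.0 for i in players}
    all_perms = list(permutations(players))
    for perm in all_perms:
        predecessors = frozenset()
        for i in perm:
            shapley[i] += (v[predecessors | {i}] - v[predecessors]) / len(all_perms)
            predecessors = predecessors | {i}
    return shapley

print(exact_shapley(players, v))
# {1: 5.333..., 2: 8.333..., 3: 11.333...} -- matching the last row of Table 1
```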

2.2 Properties of the Shapley value

We define the solution concept properties that characterize the Shapley value and emphasize their relevance and meaning in a feature selection game. In this game input features are players, coalitions are subsets of features and the payoff is a scalar valued goodness of fit for a machine learning model using these input features.

Definition 8.

Null player. Player $i \in N$ is called a null player if $v(S \cup \{i\}) = v(S)$ for every $S \subseteq N \setminus \{i\}$. A solution concept $\Psi$ satisfies the null player property if for every game $(N, v)$, every $\Phi \in \Psi(N, v)$, and every null player $i$ it holds that $\Phi_i = 0$.

In the feature selection game a solution concept with the null player property assigns zero value to those features that never increase the goodness of fit when added to the feature set.

Definition 9.

Efficiency. A solution concept $\Psi$ is efficient or Pareto optimal if for every game $(N, v)$ and every solution vector $\Phi \in \Psi(N, v)$ it holds that $\sum_{i \in N} \Phi_i = v(N)$.

Consider the goodness of fit of the model trained by using the whole set of input features. The importance measures assigned to individual features by an efficient solution concept sum to this goodness of fit. This allows for quantifying the contribution of individual features to the whole performance of the trained model.

Definition 10.

Symmetry. Two players $i$ and $j$ are symmetric if $v(S \cup \{i\}) = v(S \cup \{j\})$ for every $S \subseteq N \setminus \{i, j\}$. A solution concept $\Psi$ satisfies symmetry if for all $(N, v)$, all $\Phi \in \Psi(N, v)$, and all symmetric players $i$ and $j$ it holds that $\Phi_i = \Phi_j$.

The symmetry property implies that if two features have the same marginal contribution to the goodness of fit when added to any possible coalition then the importance of the two features is the same. This property is essentially a fair treatment of the input features and results in identical features receiving the same importance score.

Definition 11.

Linearity. A single-valued solution concept $\Psi$ satisfies linearity if for any two games $(N, v)$ and $(N, w)$, and for the solution vector of the TU game $(N, v + w)$ defined by $(v + w)(S) = v(S) + w(S)$, it holds that $\Psi(N, v + w) = \Psi(N, v) + \Psi(N, w)$.

Let us imagine a binary classifier and two sets of data points – on both of these datasets, we can define feature selection games with binary cross entropy-based payoffs. The Shapley values of input features in the feature selection game calculated on the pooled dataset would be the same as adding together the Shapley values calculated from the two datasets separately.

These four properties together characterize the Shapley value.

Theorem 1 (Shapley, 1953).

A single-valued solution concept satisfies the null player, efficiency, symmetry, and linearity properties if and only if it is the Shapley value.

3 Approximations of the Shapley Value

Shapley value computation requires an exponential number of characteristic function evaluations, resulting in exponential time complexity. This is prohibitive in a machine learning context when each evaluation can correspond to training a machine learning model. For this reason, machine learning applications use a variety of Shapley value approximation methods, which we discuss in this section. In the following discussion, $\widehat{\Phi}_i$ denotes the approximated Shapley value of player $i \in N$.

3.1 Monte Carlo Permutation Sampling

Monte Carlo permutation sampling for the general class of cooperative games was first proposed by Castro et al. (2009) to approximate the Shapley value in linear time.

Data: $(N, v)$ - Cooperative TU game.
      $k$ - Number of sampled permutations.
Result: $\widehat{\Phi}$ - Approximated Shapley value of each player.
Initialize $\widehat{\Phi}_i \leftarrow 0$ for every $i \in N$
for $t = 1, \ldots, k$ do
       Draw a permutation $\pi$ uniformly at random from $\Pi_N$
       for $i \in N$ do
              $\widehat{\Phi}_i \leftarrow \widehat{\Phi}_i + \left[ v(S_i^{\pi} \cup \{i\}) - v(S_i^{\pi}) \right] / k$
       end for
end for
Algorithm 1: Monte Carlo permutation sampling approximation of the Shapley value.

As shown in Algorithm 1, the method performs a sampling-based approximation. At each iteration, a random element from the permutations of the player set is drawn. The marginal contributions of the players in the sampled permutation are scaled down by the number of samples (which is equivalent to taking an average) and added to the approximated Shapley values from the previous iteration. Castro et al. (2009) provide asymptotic error bounds for this approximation algorithm via the central limit theorem when the variance of the marginal contributions is known. Maleki et al. (2013) extended the analysis of this sampling approach by providing error bounds, via Chebyshev's and Hoeffding's inequalities, when either the variance or the range of the marginal contributions is known. Their bounds hold for a finite number of samples, in contrast to the previous asymptotic bounds.
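
A minimal Python sketch of Algorithm 1 follows. It assumes the characteristic function is available as a callable `v` that maps a frozenset coalition to its payoff, which is an illustrative interface rather than anything prescribed by the cited works.

```python
import random

def monte_carlo_shapley(players, v, num_samples, seed=0):
    """Algorithm 1: average marginal contributions over randomly sampled permutations."""
    rng = random.Random(seed)
    shapley = {i: 0.0 for i in players}
    for _ in range(num_samples):
        perm = list(players)
        rng.shuffle(perm)                          # draw a permutation uniformly at random
        predecessors = frozenset()
        for i in perm:
            marginal = v(predecessors | {i}) - v(predecessors)
            shapley[i] += marginal / num_samples   # running average over the samples
            predecessors = predecessors | {i}
    return shapley
```

Using the characteristic function of Example 1 (wrapped as a callable, e.g. `lambda S: v[S]`), the estimates approach the exact values 16/3, 25/3, and 34/3 as the number of sampled permutations grows.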

3.1.1 Stratified Sampling for Variance Reduction

In addition to extending the analysis of Monte Carlo estimation, Maleki et al. (2013) demonstrate how to improve the Shapley value approximation when sampling can be stratified by dividing the permutations of the player set into homogeneous, non-overlapping sub-populations. In particular, they show that if the set of permutations can be grouped into strata with similar marginal gains for players, then the approximation will be more precise. Following this, Castro et al. (2017) explored stratified sampling approaches using strata defined by the set of all marginal contributions when the player is in a specific position within the coalition. Burgess and Chapman (2021) propose stratified sampling approaches designed to minimize the uncertainty of the estimate via a stratified empirical Bernstein bound.

3.1.2 Other Variance Reduction Techniques

Following the stratified approaches of Maleki et al. (2013), Castro et al. (2017), and Burgess and Chapman (2021), Illés and Kerényi (2019) propose an alternative variance reduction technique for the sample mean. Instead of generating a random sequence of samples, they generate a sequence of ergodic but not independent samples, taking advantage of negative correlation to reduce the sample variance. Mitchell et al. (2021) show that other Monte Carlo variance reduction techniques can also be applied to this problem, such as antithetic sampling [30, 35]. A simple form of antithetic sampling uses both a randomly sampled permutation and its reverse. Finally, Touati et al. (2021) introduce a Bayesian Monte Carlo approach to Shapley value calculation, showing that the estimation can be improved by using Bayesian methods.
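
As an illustration of the simple antithetic scheme mentioned above, the sketch below evaluates every sampled permutation together with its reverse. It is a hypothetical minimal variant of the previous sampler, not the exact estimator of the cited works.

```python
import random

def antithetic_shapley(players, v, num_pairs, seed=0):
    """Antithetic permutation sampling: use each random permutation and its reverse."""
    rng = random.Random(seed)
    shapley = {i: 0.0 for i in players}
    total_perms = 2 * num_pairs
    for _ in range(num_pairs):
        perm = list(players)
        rng.shuffle(perm)
        for ordering in (perm, perm[::-1]):        # the permutation and its reverse
            predecessors = frozenset()
            for i in ordering:
                marginal = v(predecessors | {i}) - v(predecessors)
                shapley[i] += marginal / total_perms
                predecessors = predecessors | {i}
    return shapley
```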

3.2 Multilinear Extension

By inducing a probability distribution over the subsets $E_q \subseteq N \setminus \{i\}$, where $E_q$ is a random subset that does not include player $i$ and each player is included in the subset independently with probability $q$, Owen (1972) demonstrated that the sum over subsets in Definition 7 can also be represented as an integral $\Phi_i = \int_0^1 e_i(q)\, dq$, where $e_i(q) = \mathbb{E}\left[ v(E_q \cup \{i\}) - v(E_q) \right]$. Sampling over $q$ therefore provides an approximation method – the multilinear extension. For example, Mitchell et al. (2021) use the trapezoid rule to sample $q$ at fixed intervals, while Okhrati and Lipani (2021) propose incorporating antithetic sampling as a variance reduction technique.
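
The sketch below illustrates the multilinear extension idea under the same callable-`v` assumption: the inner expectation $e_i(q)$ is estimated by sampling coalitions at a fixed grid of inclusion probabilities, and the outer integral is evaluated with the trapezoid rule. The grid size and sample counts are illustrative choices, not values taken from the cited papers.

```python
import random

def multilinear_shapley(players, v, q_steps=11, subsets_per_q=64, seed=0):
    """Estimate Shapley values via the multilinear extension and the trapezoid rule."""
    rng = random.Random(seed)
    qs = [k / (q_steps - 1) for k in range(q_steps)]
    shapley = {}
    for i in players:
        others = [j for j in players if j != i]
        e = []                                      # estimates of e_i(q) on the grid
        for q in qs:
            total = 0.0
            for _ in range(subsets_per_q):
                subset = frozenset(j for j in others if rng.random() < q)
                total += v(subset | {i}) - v(subset)
            e.append(total / subsets_per_q)
        h = 1.0 / (q_steps - 1)                     # trapezoid rule over [0, 1]
        shapley[i] = h * (0.5 * e[0] + sum(e[1:-1]) + 0.5 * e[-1])
    return shapley
```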

3.3 Linear Regression Approximation

In their seminal work, Lundberg and Lee [31] apply Shapley values to feature importance and explainability (SHAP values), demonstrating that Shapley values for TU games can be approximated by solving a weighted least squares optimization problem. Their main insight is the computation of Shapley values by approximately solving the following optimization problem:

$$\mu(S) = \frac{|N| - 1}{\binom{|N|}{|S|} \, |S| \, (|N| - |S|)} \qquad (2)$$
$$\min_{\Phi_0, \Phi_1, \ldots, \Phi_{|N|}} \;\; \sum_{S \subseteq N,\; 0 < |S| < |N|} \mu(S) \left( v(S) - \Phi_0 - \sum_{i \in S} \Phi_i \right)^2 \qquad (3)$$
$$\text{subject to} \quad \Phi_0 = v(\emptyset), \qquad \Phi_0 + \sum_{i \in N} \Phi_i = v(N) \qquad (4)$$

The definition of the weights in Equation (2) and the objective function in Equation (3) imply the evaluation of $v(S)$ for $\mathcal{O}(2^{|N|})$ coalitions. To address this, Lundberg and Lee [31] propose approximating the problem by subsampling the coalitions. Note that $\mu(S)$ is higher when coalitions are small or large. Covert and Lee [8] extend the study of this method, finding that while SHAP is a consistent estimator, it is not an unbiased estimator. By proposing and analyzing a variation of this method that is unbiased, they conclude that while SHAP incurs a small bias, it has a significantly lower variance than the corresponding unbiased estimator. They then propose a variance reduction method for SHAP, improving convergence speed by an order of magnitude by sampling coalitions in pairs, with each sampled coalition paired with its complement.
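
A minimal sketch of the weighted least squares formulation follows. For readability it enumerates every coalition (practical SHAP implementations subsample them instead) and approximately enforces the intercept and efficiency constraints by giving the empty and grand coalitions a very large weight; this interface and the `big_weight` trick are assumptions made for illustration.

```python
import itertools
import math
import numpy as np

def weighted_least_squares_shapley(players, v, big_weight=1e6):
    """Estimate Shapley values by solving the Shapley-kernel weighted regression."""
    n = len(players)
    index = {p: k for k, p in enumerate(players)}
    rows, targets, weights = [], [], []
    for size in range(n + 1):
        for coalition in itertools.combinations(players, size):
            z = np.zeros(n + 1)
            z[0] = 1.0                              # intercept phi_0
            for p in coalition:
                z[1 + index[p]] = 1.0
            if 0 < size < n:
                w = (n - 1) / (math.comb(n, size) * size * (n - size))
            else:
                w = big_weight                      # approximately pin down phi_0 and efficiency
            rows.append(z)
            targets.append(v(frozenset(coalition)))
            weights.append(w)
    sw = np.sqrt(np.array(weights))
    Z = np.array(rows) * sw[:, None]
    y = np.array(targets) * sw
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return {p: coef[1 + index[p]] for p in players}
```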

4 Machine Learning and the Shapley Value

Application Reference Payoff Approximation
Feature Selection [6] Validation loss Exact
[40] Mutual information Exact
[47] Validation loss Monte Carlo sampling
[43] Training loss Monte Carlo sampling
[33] Validation loss Monte Carlo sampling
[19] Validation loss Exact
Data Valuation [22] Validation loss Restricted Monte Carlo sampling
[17] Validation loss Monte Carlo sampling
[38] Validation loss Exact
[10] Validation loss Restricted Monte Carlo sampling
[25] Validation loss Monte Carlo sampling
[26] Validation loss Monte Carlo sampling
Federated Learning [29] Validation loss Monte Carlo sampling
Universal Explainability [31] Attribution Linear regression
[41] Interaction attribution Integrated gradients
[42] Interaction attribution Integrated gradients
[14] Attribution Linear regression
[15] Attribution Linear regression
[50] Attribution Monte Carlo sampling
[8] Attribution Linear regression
Explainability of Deep Learning [5] Attribution Restricted Monte Carlo sampling
[2] Neuron attribution Voting game
[18] Neuron attribution Monte Carlo sampling
[51] Interaction attribution Linear regression
Explainability of Graphical Models [28] Attribution Exact
[21] Causal attribution Linear regression
[45] Causal attribution Linear regression
[39] Causal attribution Linear regression
Explainability in Graph Machine Learning [50] Edge level attribution Monte Carlo sampling
[11] Edge level attribution Linear regression
Multi-agent Reinforcement Learning [44] Global reward Monte Carlo sampling
[27] Global reward Monte Carlo sampling
Model Valuation in Ensembles [34] Predictive performance Voting game
Table 2: An application area, payoff definition, and Shapley value approximation technique based comparison of research works. Specific applications of the Shapley value are grouped together and ordered chronologically.

Our discussion about applications of the Shapley value in the machine learning domain focuses on the formulation of the cooperative games, the definition of the player set and payoffs, the Shapley value approximation technique used, and, where relevant, the time complexity of the approximation. We summarize the most important application areas in Table 2 and group the relevant works by the problem solved.

4.1 Feature Selection

The feature selection game treats input features of a machine learning model as players and model performance as the payoff [20, 16]. The Shapley values of features quantify how much individual features contribute to the model’s performance on a set of data points.

Definition 12.

Feature selection game. Let the player set be the set of input features $N = \{1, \ldots, n\}$; for a coalition $S \subseteq N$ the train and test feature vector sets restricted to $S$ are $X^{train}_S$ and $X^{test}_S$. Let $f_S$ be a machine learning model trained using $X^{train}_S$ as input; then the payoff is $v(S) = g(y, \widehat{y}_S)$ where $g$ is a goodness of fit function and $y$ and $\widehat{y}_S = f_S(X^{test}_S)$ are the ground truth and predicted targets.

Shapley values, and close relatives such as the Banzhaf index [3], have been studied as a measure of feature importance in various contexts [6, 40, 47, 43]. Using these importance estimates, features can be ranked and then selected or removed accordingly. This approach has been applied to various tasks such as vocabulary selection in natural language processing [33] and feature selection in human action recognition [19].
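
As one possible way to instantiate the feature selection game, the sketch below defines the characteristic function as the validation accuracy of a model retrained on each feature subset. The choice of scikit-learn's LogisticRegression, accuracy as the goodness of fit metric, and NumPy-array inputs are illustrative assumptions, not what any particular cited work uses.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def feature_selection_payoff(X_train, y_train, X_val, y_val):
    """Build v(S) = validation accuracy of a model trained on the feature subset S."""
    def v(coalition):
        features = sorted(coalition)
        if not features:                      # payoff of the empty coalition is zero
            return 0.0
        model = LogisticRegression(max_iter=1000)
        model.fit(X_train[:, features], y_train)
        return accuracy_score(y_val, model.predict(X_val[:, features]))
    return v

# players = list(range(X_train.shape[1])); this payoff can then be plugged into
# any of the samplers from Section 3, e.g. monte_carlo_shapley(players, v, 200).
```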

4.2 Data valuation

In the data valuation game training set data points are players and the payoff is defined by the goodness of fit achieved by a model on the test data. Computing the Shapley value of players in a data valuation game measures how much data points contribute to the performance of the model.

Definition 13.

Data valuation game. Let the player set be the training set $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ where $x_i$ is the input feature vector and $y_i$ is the target. Given the coalition $S \subseteq D$, let $f_S$ be a machine learning model trained on $S$. Let us denote the test set feature vectors and targets as $X^{test}$ and $y^{test}$; given $S$, the set of predicted labels is defined as $\widehat{y}_S = f_S(X^{test})$. Then the payoff of a model trained on the data points in $S$ is $v(S) = g(y^{test}, \widehat{y}_S)$ where $g$ is a goodness of fit metric.

The Shapley value is not the only method for data valuation – earlier works used influence functions [23, 37], leave-one-out testing [7], and core sets [9]. However, these methods fall short when fairness requirements are placed on the data valuation technique [22, 17, 26]. Ghorbani and Zou [17] proposed a framework for applying the Shapley value in a data-sharing system; Jia et al. [22] advanced this work with more efficient algorithms to approximate the Shapley value for data valuation. The distributional Shapley value was introduced by Ghorbani et al. (2020), who argued that preserving privacy is hard during Shapley value computation; their method calculates the Shapley value over a data distribution, which addresses problems such as the lack of privacy. As Kwon et al. [25] point out, its computation time can be reduced with approximation methods optimized for specific machine learning models.
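
Analogously, a data valuation game can be instantiated by retraining a model on each subset of training points. The helper below is a hypothetical sketch: the base model, the accuracy metric, the NumPy-array inputs, and the handling of degenerate subsets are illustrative choices only.

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import accuracy_score

def data_valuation_payoff(X_train, y_train, X_test, y_test, base_model):
    """Build v(S) = test accuracy of a model trained only on the data points in S."""
    def v(coalition):
        idx = sorted(coalition)
        if len(np.unique(y_train[idx])) < 2:  # cannot fit a classifier on fewer than 2 classes
            return 0.0
        model = clone(base_model).fit(X_train[idx], y_train[idx])
        return accuracy_score(y_test, model.predict(X_test))
    return v
```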

4.3 Federated learning

A federated learning scenario can be seen as a cooperative game by modeling the data owners as players who cooperate to train a high-quality machine learning model [29].

Definition 14.

Federated learning game. In this game the players are a set of labeled dataset owners (data silos) $N = \{1, \ldots, n\}$, where $X_i$ and $y_i$ are the feature and label sets owned by the $i^{th}$ silo. Let $(X^{test}, y^{test})$ be a labeled test set, $S \subseteq N$ a coalition of data silos, $f_S$ a machine learning model trained on $\bigcup_{i \in S} (X_i, y_i)$, and $\widehat{y}_S$ the labels predicted by $f_S$ on $X^{test}$. The payoff of $S$ is $v(S) = g(y^{test}, \widehat{y}_S)$ where $g$ is a goodness of fit metric.

The system described by Liu et al. [29] uses Monte Carlo sampling to approximate the Shapley value of data coming from the data silos in linear time. Given the potentially overlapping nature of the datasets, the use of configuration games could be an interesting future direction [1].

4.4 Explainable machine learning

In explainable machine learning the Shapley value is used to measure the contributions of input features to the output of a machine learning model at the instance level. Given a specific data point, the goal is to decompose the model prediction and assign Shapley values to individual features of the instance. There are universal solutions to this challenge that are model agnostic and designs customized for deep learning [5, 2], classification trees [31], and graphical models [28, 39].

4.4.1 Universal explainability

A cooperative game for universal explainability is completely model agnostic; the only requirement is that a scalar-valued output can be generated by the model such as the probability of a class label being assigned to an instance.

Definition 15.

Universal explainability game. Let us denote the machine learning model of interest by $f$ and let the player set be the feature values of a single data instance: $N = \{x_1, \ldots, x_n\}$. The payoff of a coalition $S \subseteq N$ in this game is the scalar valued prediction $f(S)$ calculated from the subset of feature values in $S$.

Calculating the Shapley value in a game like this offers a complete decomposition of the prediction because the efficiency axiom holds. The Shapley values of feature values are explanatory attributions to the input features, and missing input feature values are imputed with a reference value such as the mean computed from multiple instances [31, 8]. The pioneering Shapley value-based universal explanation method SHAP [31] proposes a linear time approximation of the Shapley values which we discussed in Section 3. This approximation has shortcomings and implicit assumptions about the features which are addressed by newer Shapley value-based explanation techniques. For example, in [14] the input features are not necessarily independent, [15] restricts the permutations based on known causal relationships, and in [8] the proposed technique improves the convergence guarantees of the approximation. Several methods generalize SHAP beyond feature values to give attributions to first-order feature interactions [42, 41]. However, this requires that the player set is redefined to include feature interaction values.
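
The sketch below shows one common way to turn a trained model into such a game for a single instance: features outside the coalition are imputed with their background means, and the masked instance is passed through the model. Mean imputation and the `predict_fn` interface are assumptions made for this example; the cited methods differ exactly in how this imputation step is handled.

```python
import numpy as np

def explanation_payoff(predict_fn, x, background_X):
    """Build v(S) = model output for x with features outside S set to background means."""
    reference = background_X.mean(axis=0)       # reference values for the missing features
    def v(coalition):
        masked = reference.copy()
        for i in coalition:
            masked[i] = x[i]
        return float(predict_fn(masked.reshape(1, -1))[0])
    return v

# Example: predict_fn = lambda X: model.predict_proba(X)[:, 1]; the Shapley values of
# this game are per-feature attributions for the single prediction on x.
```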

4.4.2 Deep learning

In neuron explainability games neurons are players and attributions to the neurons are payoffs. The primary goal of Shapley value-based explanations in deep learning is to solve these games and compute attributions to individual neurons and filters [18, 2].

Definition 16.

Neuron explainability game. Let us consider

the encoder layer of a neural network and

x the input feature vector to the encoder. In the neuron explainability game the player set is - each player corresponds to the output of a neuron in the final layer of the encoder. The payoff of coalition is defined as the predicted output where is the head layer of the neural network.

In practical terms, the payoffs are the output of the neural network obtained by masking out certain neurons. Using the Shapley values obtained in these games the value of individual neurons can be quantified. At the same time, some deep learning specific Shapley value-based explanation techniques have designs and goals that are aligned with the games described in universal explainability. These methods exploit the structure of the input data [5] or the nature of feature interactions [51] to provide efficient computations of attributions.
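
A minimal sketch of this masking idea follows; it assumes the network is split into an `encoder` and a `head` callable (the encoder returning a NumPy activation vector and the head a scalar prediction) and that masked neurons are replaced by a constant baseline, which is only one of several possible masking choices.

```python
import numpy as np

def neuron_payoff(encoder, head, x, baseline=0.0):
    """Build v(S) = network output when encoder neurons outside S are set to a baseline."""
    activations = encoder(x)                    # outputs of the final encoder layer
    def v(coalition):
        masked = np.full_like(activations, baseline)
        for i in coalition:
            masked[i] = activations[i]
        return float(head(masked))
    return v
```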

4.4.3 Graphical models

Compared to universal explanations, the graphical model-specific techniques restrict the admissible set of player set permutations considered in the attribution process. These restrictions are defined based on known causal relations, and permutations are generated by various search strategies on the graph describing the probabilistic model [21, 28, 39]. Methods are differentiated from each other by how these restrictions are defined and how the admissible permutations are generated.

4.4.4 Relational machine learning

In the relational machine learning domain the Shapley value is used to create edge importance attributions for instance-level explanations [11, 50]. Essentially, the Shapley value in these games measures the average marginal change in the outcome variable as a specific edge is added to the edge set, over all possible permutations of the edge set. It is worth noting that the proposed edge explanation and attribution techniques could be generalized to provide node attributions.

Definition 17.

Relational explainability game. Let us define a graph $G = (V, E)$ where $V$ and $E$ are the vertex and edge sets. Given the relational machine learning model $f$, the node feature matrix X, and a node $u \in V$, the payoff of a coalition of edges $S \subseteq E$ in the graph machine learning explanation game is defined as the node level prediction $f(\mathbf{X}, S)_u$.

4.5 Multi-agent reinforcement learning

Global reward multi-agent reinforcement learning problems can be modeled as TU games [44, 27] by defining the player set as the set of agents and the payoff of coalitions as a global reward. The Shapley value allows an axiomatic decomposition of the global reward achieved by the agents in these games and the fair attribution of credit assignments to each of the participating agents.

4.6 Model valuation in ensembles

The Shapley value can be used to assess the contributions of machine learning models to a composite model in ensemble games. In these games, players are models in an ensemble and payoffs are decided by whether the predictions made by the models are correct.

Definition 18.

Ensemble game. Let us consider a single target - feature instance denoted by $(y, \mathbf{x})$. The player set in ensemble games is defined by a set of machine learning models $N = \{f_1, \ldots, f_n\}$ that operate on the feature set. The predicted target output by the ensemble $S \subseteq N$ is defined as $\widehat{y}_S = h(\{f_i(\mathbf{x}) \mid i \in S\})$ where $h$ is a prediction aggregation function. The payoff of $S$ is $v(S) = g(y, \widehat{y}_S)$ where $g$ is a goodness of fit metric.

The ensemble games described by [34] are formulated as a special subclass of voting games. This allows the use of precise, game-specific approximation techniques [13]; because of this, the Shapley value estimates are obtained in quadratic time and have a tight approximation error. The games themselves are model agnostic with respect to the player set – ensembles can be formed by heterogeneous types of machine learning models that operate on the same inputs.
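
For intuition, the following sketch builds the payoff of Definition 18 for a single instance, assuming scikit-learn-style classifiers, majority voting as the aggregation function, and a correct/incorrect indicator as the goodness of fit; these are illustrative choices, and the exact voting-game formulation and quadratic-time estimator of [34, 13] are not reproduced here.

```python
from collections import Counter

def ensemble_payoff(models, x, y_true):
    """Build v(S) = 1 if the majority vote of the models in S is correct, else 0."""
    def v(coalition):
        if not coalition:
            return 0.0
        votes = [models[i].predict(x.reshape(1, -1))[0] for i in sorted(coalition)]
        majority = Counter(votes).most_common(1)[0][0]
        return float(majority == y_true)
    return v
```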

5 Discussion

The Shapley value has had a wide-reaching impact in machine learning, but it has limitations, and certain extensions of it could have important applications in the field.

5.1 Limitations

5.1.1 Computation time

Computing the Shapley value for each player naively in a TU game takes factorial time. In some machine learning application areas such as multi-agent reinforcement learning and federated learning where the number of players is small, this is not an issue. However, in large scale data valuation [25, 26], explainability [31], and feature selection [33] settings the exact calculation of the Shapley value is not tractable. In Sections 3 and 4 we discussed approximation techniques proposed to make Shapley value computation possible. In some cases, asymptotic properties of these Shapley value approximation techniques are not well understood – see for example [5].

5.1.2 Interpretability

By definition, the Shapley values are the players' average marginal contributions, computed over all permutations of the player set [36]. Theoretical interpretations like this are not intuitive and not useful for non-experts in game theory. This means that translating the meaning of Shapley values obtained in many application areas into actions is troublesome [24]. For example, in a data valuation scenario: is a data point whose Shapley value is twice as large as another's twice as valuable? Answering a question like this requires a definition of the cooperative game that is itself interpretable.

5.1.3 Axioms do not hold under approximations

As we discussed most applications of the Shapley value in machine learning use approximations. The fact that under these approximations the desired axiomatic properties of the Shapley value do not hold is often overlooked [42]. This is problematic because most works argue for the use of Shapley value based on these axioms. In our view, this is the greatest unresolved issue in the applications of the Shapley value.

5.2 Future Research Directions

5.2.1 Hierarchy of the coalition structure

The Shapley value has a constrained version called the Owen value [32], in which only permutations satisfying conditions defined by a coalition structure - a partition of the player set - are considered. The calculation of the Owen value is identical to that of the Shapley value, with the exception that only those permutations are taken into account in which the players in any of the subsets of the coalition structure follow each other. In several real-world data and feature valuation scenarios, even more complex hierarchies of the coalition structure could be useful. Having a nested hierarchy imposes restrictions on the admissible permutations of the players and changes the player valuation. Games with such nested hierarchies are called level structure games in game theory. [48] presents the Winter value, a solution concept for level structure games; such games are yet to receive attention in the machine learning literature.

5.2.2 Overlapping coalition structure

Traditionally, it is assumed that players in a coalition structure are allocated to disjoint partitions of the grand coalition. Allowing players to belong to overlapping coalitions in configuration games [1] could have several applications in machine learning. For example, in a data-sharing feature selection scenario multiple data owners might have access to the same features - a feature can then belong to overlapping coalitions.

5.2.3 Solution concepts beyond the Shapley value

The Shapley value is a specific solution concept of cooperative game theory with intuitive axiomatic properties (Section 2). At the same time it has limitations with respect to computation constraints and interpretability (Sections 3 and 5). Cooperative game theory offers other solution concepts such as the core, nucleolus, stable set, and kernel with their own axiomatizations. For example, the core has been used for model explainability and feature selection [49]. Research into the potential applications of these solution concepts is lacking.

6 Conclusion

In this survey, we discussed the Shapley value, examined its axiomatic characterization, and reviewed the most frequently used Shapley value approximation approaches. We defined and surveyed its uses in machine learning, highlighted issues with the Shapley value, and pointed out potential new application and research areas in machine learning.

References

  • [1] J. Albizuri, J. Aurrecoechea, and J. M. Zarzuelo (2006) Configuration Values: Extensions of the Coalitional Owen Value. Games and Economic Behavior 57 (1), pp. 1–17. Cited by: §4.3, §5.2.2.
  • [2] M. Ancona, C. Oztireli, and M. Gross (2019) Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Value Approximation. In International Conference on Machine Learning, pp. 272–281. Cited by: §4.4.2, §4.4, Table 2.
  • [3] J. F. Banzhaf III (1964) Weighted Voting Doesn’t Work: A Mathematical Analysis. Rutgers L. Rev. 19, pp. 317. Cited by: §4.1.
  • [4] G. Chalkiadakis, E. Elkind, and M. Wooldridge (2011) Computational Aspects of Cooperative Game Theory. Synthesis Lectures on Artificial Intelligence and Machine Learning 5 (6), pp. 1–168. Cited by: §1.
  • [5] J. Chen, L. Song, M. Wainwright, and M. Jordan (2018) L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data. In International Conference on Learning Representations, Cited by: §4.4.2, §4.4, Table 2, §5.1.1.
  • [6] S. Cohen, G. Dror, and E. Ruppin (2007) Feature Selection via Coalitional Game Theory. Neural Computation 19 (7), pp. 1939–1961. Cited by: §1, §4.1, Table 2.
  • [7] R. D. Cook (1977) Detection of influential observation in linear regression. Technometrics 19 (1), pp. 15–18. Cited by: §4.2.
  • [8] I. Covert and S. Lee (2021) Improving KernelSHAP: Practical Shapley Value Estimation Using Linear Regression. In International Conference on Artificial Intelligence and Statistics, pp. 3457–3465. Cited by: §4.4.1, Table 2.
  • [9] A. Dasgupta, P. Drineas, B. Harb, R. Kumar, and M. W. Mahoney (2009) Sampling Algorithms and Coresets for ℓp Regression. SIAM Journal on Computing 38 (5), pp. 2060–2078. Cited by: §4.2.
  • [10] D. Deutch, N. Frost, A. Gilad, and O. Sheffer (2021) Explanations for Data Repair Through Shapley Values. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 362–371. Cited by: Table 2.
  • [11] A. Duval and F. D. Malliaros (2021) GraphSVX: shapley value explanations for graph neural networks. In Machine Learning and Knowledge Discovery in Databases., pp. 302–318. Cited by: §4.4.4, Table 2.
  • [12] Z. Fan, H. Fang, Z. Zhou, J. Pei, et al. (2021) Improving Fairness for Data Valuation in Federated Learning. arXiv preprint arXiv:2109.09046. Cited by: §1.
  • [13] S. S. Fatima, M. Wooldridge, and N. R. Jennings (2008) A Linear Approximation Method for the Shapley Value. Artificial Intelligence 172 (14), pp. 1673–1699. Cited by: §4.6.
  • [14] C. Frye, D. de Mijolla, T. Begley, et al. (2020) Shapley Explainability on the Data Manifold. In International Conference on Learning Representations, Cited by: §4.4.1, Table 2.
  • [15] C. Frye, C. Rowat, and I. Feige (2020) Asymmetric Shapley Values: Incorporating Causal Knowledge Into Model-Agnostic Explainability. Advances in Neural Information Processing Systems 33. Cited by: §4.4.1, Table 2.
  • [16] D. Fryer, I. Strümke, and H. Nguyen (2021) Shapley Values for Feature Selection: the Good, the Bad, and the Axioms. arXiv preprint arXiv:2102.10936. Cited by: §4.1.
  • [17] A. Ghorbani and J. Zou (2019) Data Shapley: Equitable Valuation of Data for Machine Learning. In International Conference on Machine Learning, pp. 2242–2251. Cited by: §1, §4.2, Table 2.
  • [18] A. Ghorbani and J. Zou (2020) Neuron Shapley: Discovering the Responsible Neurons. In Advances in Neural Information Processing Systems, pp. 5922–5932. Cited by: §4.4.2, Table 2.
  • [19] R. Guha, A. H. Khan, P. K. Singh, R. Sarkar, and D. Bhattacharjee (2021) CGA: a new feature selection model for visual human action recognition. Neural Computing and Applications 33 (10), pp. 5267–5286. Cited by: §4.1, Table 2.
  • [20] I. Guyon and A. Elisseeff (2003) An Introduction to Variable and Feature Selection. Journal of machine learning research 3 (Mar), pp. 1157–1182. Cited by: §4.1.
  • [21] T. Heskes, E. Sijben, I. G. Bucur, and T. Claassen (2020) Causal Shapley Values: Exploiting Causal Knowledge to Explain Individual Predictions of Complex Models. Advances in Neural Information Processing Systems 33. Cited by: §4.4.3, Table 2.
  • [22] R. Jia, D. Dao, B. Wang, Hubis, et al. (2019) Towards Efficient Data Valuation Based on the Shapley Value. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 1167–1176. Cited by: §4.2, Table 2.
  • [23] P. W. Koh and P. Liang (2017) Understanding black-box predictions via influence functions. In International Conference on Machine Learning, pp. 1885–1894. Cited by: §4.2.
  • [24] E. Kumar, S. Venkatasubramanian, C. Scheidegger, and S. Friedler (2020) Problems with Shapley-Value-Based Explanations as Feature Importance Measures. In International Conference on Machine Learning, pp. 5491–5500. Cited by: §5.1.2.
  • [25] Y. Kwon, M. A. Rivas, and J. Zou (2021) Efficient Computation and Analysis of Distributional Shapley Values. In International Conference on Artificial Intelligence and Statistics, pp. 793–801. Cited by: Table 2, §5.1.1.
  • [26] Y. Kwon and J. Zou (2021) Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning. arXiv preprint arXiv:2110.14049. Cited by: §4.2, Table 2, §5.1.1.
  • [27] J. Li, K. Kuang, B. Wang, et al. (2021) Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 934–942. Cited by: §4.5, Table 2.
  • [28] Y. Liu, C. Chen, Y. Liu, X. Zhang, and S. Xie (2020) Shapley Values and Meta-Explanations for Probabilistic Graphical Model Inference. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 945–954. Cited by: §4.4.3, §4.4, Table 2.
  • [29] Z. Liu, Y. Chen, H. Yu, Y. Liu, and L. Cui (2021) GTG-Shapley: Efficient and Accurate Participant Contribution Evaluation in Federated Learning. arXiv preprint arXiv:2109.02053. Cited by: §4.3, Table 2.
  • [30] M. Lomeli, M. Rowland, A. Gretton, and Z. Ghahramani (2019) Antithetic and Monte Carlo Kernel Estimators for Partial Rankings. Statistics and Computing 29 (5), pp. 1127–1147. Cited by: §3.1.2.
  • [31] S. M. Lundberg and S. Lee (2017) A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 4768–4777. Cited by: §1, §4.4.1, §4.4, Table 2, §5.1.1.
  • [32] G. Owen (1977) Values of Games with a Priori Unions. In Mathematical Economics and Game Theory, pp. 76–88. Cited by: §5.2.1.
  • [33] R. Patel, M. Garnelo, I. Gemp, C. Dyer, and Y. Bachrach (2021) Game-Theoretic Vocabulary Selection via the Shapley Value and Banzhaf Index. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics, pp. 2789–2798. Cited by: §4.1, Table 2, §5.1.1.
  • [34] B. Rozemberczki and R. Sarkar (2021) The Shapley Value of Classifiers in Ensemble Games. In Proceedings of the 30th ACM International Conference on Information and Knowledge Management, pp. 1558–1567. Cited by: §1, §1, §4.6, Table 2.
  • [35] R. Y. Rubinstein and D. P. Kroese (2016) Simulation and the Monte Carlo Method. Vol. 10. Cited by: §3.1.2.
  • [36] L. Shapley (1953) A Value for N-Person Games. Contributions to the Theory of Games, pp. 307–317. Cited by: §1, §5.1.2, Definition 7.
  • [37] B. Sharchilev, Y. Ustinovskiy, P. Serdyukov, and M. Rijke (2018) Finding Influential Training Samples for Gradient Boosted Decision Trees. In International Conference on Machine Learning, pp. 4577–4585. Cited by: §4.2.
  • [38] D. Shim, Z. Mai, J. Jeong, S. Sanner, et al. (2021) Online Class-Incremental Continual Learning with Adversarial Shapley Value. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, pp. 9630–9638. Cited by: Table 2.
  • [39] R. Singal, G. Michailidis, and H. Ng (2021) Flow-based Attribution in Graphical Models: A Recursive Shapley Approach. In Proceedings of the 38th International Conference on Machine Learning, Vol. 139, pp. 9733–9743. Cited by: §4.4.3, §4.4, Table 2.
  • [40] X. Sun, Y. Liu, J. Li, J. Zhu, et al. (2012) Feature Evaluation and Selection with Cooperative Game Theory. Pattern recognition 45 (8), pp. 2992–3002. Cited by: §4.1, Table 2.
  • [41] M. Sundararajan, K. Dhamdhere, and A. Agarwal (2020) The Shapley Taylor Interaction Index. In International Conference on Machine Learning, pp. 9259–9268. Cited by: §4.4.1, Table 2.
  • [42] M. Sundararajan and A. Najmi (2020) The Many Shapley Values for Model Explanation. In International Conference on Machine Learning, pp. 9269–9278. Cited by: §4.4.1, Table 2, §5.1.3.
  • [43] S. Tripathi, N. Hemachandra, and P. Trivedi (2020) Interpretable Feature Subset Selection: A Shapley Value Based Approach. In IEEE International Conference on Big Data, pp. 5463–5472. Cited by: §4.1, Table 2.
  • [44] J. Wang, J. Wang, Y. Zhang, Y. Gu, and T. Kim (2021) SHAQ: Incorporating Shapley Value Theory into Q-Learning for Multi-Agent Reinforcement Learning. arXiv preprint arXiv:2105.15013. Cited by: §4.5, Table 2.
  • [45] J. Wang, J. Wiens, and S. Lundberg (2021) Shapley Flow: A Graph-Based Approach to Interpreting Model Predictions. In International Conference on Artificial Intelligence and Statistics, pp. 721–729. Cited by: Table 2.
  • [46] T. Wang, J. Rausch, C. Zhang, R. Jia, and D. Song (2020) A Principled Approach to Data Valuation for Federated Learning. In Federated Learning, pp. 153–167. Cited by: §1.
  • [47] B. Williamson and J. Feng (2020) Efficient Nonparametric Statistical Inference on Population Feature Importance Using Shapley Values. In International Conference on Machine Learning, pp. 10282–10291. Cited by: §4.1, Table 2.
  • [48] E. Winter (1989) A Value for Cooperative Games with Levels Structure of Cooperation. International Journal of Game Theory 18 (2), pp. 227–40. Cited by: §5.2.1.
  • [49] T. Yan and A. D. Procaccia (2021-05) If You Like Shapley Then You Will Love the Core. Proceedings of the AAAI Conference on Artificial Intelligence 35 (6), pp. 5751–5759. Cited by: §5.2.3.
  • [50] H. Yuan, H. Yu, J. Wang, K. Li, and S. Ji (2021) On Explainability of Graph Neural Networks via Subgraph Explorations. In Proceedings of the 38th International Conference on Machine Learning, pp. 12241–12252. Cited by: §4.4.4, Table 2.
  • [51] H. Zhang, Y. Xie, L. Zheng, D. Zhang, and Q. Zhang (2021) Interpreting Multivariate Shapley Interactions in DNNs. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, pp. 10877–10886. Cited by: §4.4.2, Table 2.