
Neural Payoff Machines: Predicting Fair and Stable Payoff Allocations Among Team Members

by Daphne Cornelisse, et al.

In many multi-agent settings, participants can form teams to achieve collective outcomes that may far surpass their individual capabilities. Measuring the relative contributions of agents and allocating them shares of the reward that promote long-lasting cooperation are difficult tasks. Cooperative game theory offers solution concepts identifying distribution schemes, such as the Shapley value, that fairly reflect the contribution of individuals to the performance of the team or the Core, which reduces the incentive of agents to abandon their team. Applications of such methods include identifying influential features and sharing the costs of joint ventures or team formation. Unfortunately, using these solutions requires tackling a computational barrier as they are hard to compute, even in restricted settings. In this work, we show how cooperative game-theoretic solutions can be distilled into a learned model by training neural networks to propose fair and stable payoff allocations. We show that our approach creates models that can generalize to games far from the training distribution and can predict solutions for more players than observed during training. An important application of our framework is Explainable AI: our approach can be used to speed-up Shapley value computations on many instances.




1 Introduction

The ability of individuals to form teams and collaborate is crucial to their performance in many environments. The success of humans as a species hinges on our capability to cooperate at scale Henrich (2015). Similarly, cooperation between learning agents is necessary to achieve high performance in many environments Lowe et al. (2017); Baker et al. (2019); Vinyals et al. (2019); Jaderberg et al. (2019) and is a fundamental problem in artificial intelligence Stone and Veloso (2000); Dafoe et al. (2021). Individual agents are often not incentivized by the joint reward achieved by a team but rather by their share of the spoils. Hence, teams are only likely to be formed when the overall gains obtained by the team are appropriately distributed between its members. However, understanding how collective outcomes arise from subsets of locally interacting parts, or measuring the impact of individuals on the team’s performance, remain open problems.

Direct applications exist in multiple domains. One example is identifying the most influential features that drive a model to make a certain prediction Datta et al. (2016); Lundberg and Lee (2017); Bachrach et al. (2012); Sundararajan et al. (2017); Lorraine et al. (2021b, a); one of the cornerstones of explainable AI Arrieta et al. (2020); Metz et al. (2021). Another example is sharing the costs of data acquisition or a joint venture in a fair way between participants Balkanski et al. (2017); Agarwal et al. (2019), or sharing gains between cooperating agents Gately (1974); Sim et al. (2020). In many legislative bodies individual participants have different weights, and passing a decision requires support from a set of participants holding the majority of the weight; different states in the US electoral college have different numbers of electors, and different countries in the EU Council of Ministers vary in their voting weight. Here, we would like to quantify the true political power held by each participant, or allocate a common budget between them Mann and Shapley (1962); Bilbao et al. (2002).

Cooperative game theory can provide strong theoretical foundations underpinning such applications. The field provides solution concepts that measure the relative impact of individuals on team performance, or the individual rewards agents are entitled to. Power indices such as the Shapley value Shapley (1953) or Banzhaf index Banzhaf III (1964) attempt to divide the joint reward in a way that is fair, and have recently been used to compute feature importance Lundberg and Lee (2017). In contrast, other solutions such as the Core Gillies (1953) attempt to offer a stable allocation of payoffs, where individual agents are incentivized to continue working with their team, rather than attempting to break away in favor of working with other agents. Despite their theoretical appeal, these solution concepts are difficult to apply in practice due to computational constraints. Computing them is typically a hard problem, even in restricted environments Elkind et al. (2007); Chalkiadakis et al. (2011); Deng and Papadimitriou (1994).

Our contribution:

We construct models that predict fair or stable payoff allocations among team members, combining solution concepts from cooperative game theory with the predictive power of neural networks. These neural “payoff machines” take in a representation of the performance or reward achievable by different subsets of agents, and output a suggested payoff vector allocating the total reward between the agents. By training the neural networks based on different cooperative solution concepts, the model can be tuned to aim for a fair distribution of the payoffs (the Shapley value or Banzhaf index) or to minimize the incentives of agents to abandon the team (the Least-Core Gillies (1953); Maschler et al. (1979); Deng and Papadimitriou (1994)). Figure 1 depicts the two well-studied classes of games on which we evaluate our approach: weighted voting games Mann and Shapley (1962); Bilbao et al. (2002); Chalkiadakis et al. (2011) and feature importance games in explainable AI Datta et al. (2016); Lundberg and Lee (2017).

Weighted voting games (WVGs) are arguably the most well-studied class of cooperative games. Each agent is endowed with a weight and a team achieves its goal if the sum of the weights of the team members exceeds a certain threshold (quota). We train the neural networks by generating large sets of such games and computing their respective game theoretic solutions. Our empirical evaluation shows that the predictions for the various solutions (the Shapley value, Banzhaf index and Least-Core) accurately reflect the true game theoretic solutions on previously unobserved games. Furthermore, the resulting model can generalize even to games that are very far from the training distribution or with more players than the games in the training set.

Feature importance games are a model for quantifying the relative influence of features on the outcome of a machine learning model Datta et al. (2016); Lundberg and Lee (2017). Solving these games for the Shapley value (or another game theoretic measure) provides a way to reverse-engineer the key factors that drove a model to reach a specific decision. This approach is model-agnostic, and can thus be applied to make any “black-box” model more interpretable Arrieta et al. (2020). One drawback of this approach is the computational complexity of calculating the Shapley value, making such analysis slow even when using approximation algorithms. Our approach provides a way to significantly speed up Explainable AI analyses, particularly for datasets with a large number of instances.

Figure 1: Evaluation domains for our approach. Left: Weighted voting games (WVGs) model decision making bodies such as the US Electoral College Mann and Shapley (1962); Bilbao et al. (2002). Our approach predicts solution concepts which reflect the true political influence of players, or stable payoff allocations. Right: Applying the Shapley value in Feature Importance Games enables quantifying the relative impact of features on the decisions of a model. In this example, a model predicts the price of a house based on several features. Our approach predicts the Shapley values of features, which are a commonly used metric for their impact. This speeds up Explainable AI analysis, as each new instance can be analyzed quickly without a full computation of Shapley values.

2 Preliminaries

We provide a brief overview of cooperative game theory (examined in detail in various books Osborne and Rubinstein (1994); Chalkiadakis et al. (2011)) and discuss how solution concepts in cooperative game theory have been applied in Explainable AI Datta et al. (2016); Lundberg and Lee (2017); Lundberg et al.; Das and Rad (2020); Yan and Procaccia (2021).

2.1 Cooperative Game Theory

A (transferable utility) cooperative game consists of a set $N = \{1, \ldots, n\}$ of agents, or players, and a characteristic function $v : 2^N \to \mathbb{R}$ which maps each team of players, or coalition $C \subseteq N$, to a real number. This number indicates the joint reward the players obtain as a team. Games where $v(C) \in \{0, 1\}$ (binary range) are simple games.

Weighted voting games (WVGs) are a restricted class of simple cooperative games Chalkiadakis et al. (2011), where each agent $i$ has a weight $w_i$ and a team of agents wins if the sum of the weights of its participants exceeds a quota $q$. Formally, a WVG is defined as the triple $(N, \mathbf{w}, q)$ with weights $\mathbf{w} = (w_1, \ldots, w_n)$ and quota (threshold) $q$, where for any coalition $C$ we have $v(C) = 1$ if $\sum_{i \in C} w_i \geq q$ and $v(C) = 0$ otherwise. If $v(C) = 0$ we say $C$ is a losing coalition, and if $v(C) = 1$ we say it is a winning coalition. WVGs have been thoroughly investigated as a model of voting bodies, such as the US Electoral College or the EU Council of Ministers Mann and Shapley (1962); Bilbao et al. (2002).
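As a concrete illustration, the WVG characteristic function above can be written in a few lines; the weights and quota below are made-up values for a three-player game:

```python
from itertools import combinations

def wvg_value(coalition, weights, quota):
    """Characteristic function of a weighted voting game:
    a coalition wins (value 1) iff its total weight meets the quota."""
    return 1 if sum(weights[i] for i in coalition) >= quota else 0

# Hypothetical 3-player game: weights [4, 3, 2], quota 5.
weights, quota = [4, 3, 2], 5

# Enumerate the winning coalitions: here, any pair of players (and the
# grand coalition) wins, while no single player does.
winning = [C for r in range(1, 4) for C in combinations(range(3), r)
           if wvg_value(C, weights, quota) == 1]
```
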

The characteristic function defines the joint value of a coalition, but it does not specify how the value should be distributed among the agents. Solution concepts attempt to determine an allocation $p = (p_1, \ldots, p_n)$ of the utility $v(N)$ achieved by the grand coalition of all the agents; an allocation is called an imputation if $\sum_{i \in N} p_i = v(N)$ and for any player $i$ we have $p_i \geq v(\{i\})$. (Footnote 1: In a WVG, the value of a coalition is bounded by 1, so if the grand coalition indeed has a value of $v(N) = 1$ then a solution would be a payoff vector $p$ where $\sum_{i \in N} p_i = 1$.) This allocation is meant to achieve some desiderata, such as fairly reflecting the contributions of individual agents, or achieving stability in the sense that no subset of agents is incentivized to abandon the team and form a new team. We describe three prominent solution concepts that we use in our analysis.

The Core.

Rational players may abandon the grand coalition of all the agents if they can increase their individual utility by doing so. The Core is defined as the set of all payoff vectors where no subset of agents can generate more utility, as measured by the characteristic function, than the total payoff they are currently awarded by the payoff vector. As such, the Core is viewed as the set of stable payoff allocations. Formally, the Core Gillies (1953) is defined as the set of all imputations $p$ such that $\sum_{i \in N} p_i = v(N)$ and $\sum_{i \in C} p_i \geq v(C)$ for any coalition $C \subseteq N$.

The $\epsilon$-Core and Least-Core

Maschler et al. (1979); Deng and Papadimitriou (1994). Some games have empty cores, meaning that no payoff allocation achieves full stability (i.e. for any imputation $p$ there exists at least one coalition $C$ such that $\sum_{i \in C} p_i < v(C)$). In such cases, researchers have proposed minimizing the instability. A relaxation of the Core is the $\epsilon$-core, consisting of imputations where for any coalition $C$ we have $\sum_{i \in C} p_i \geq v(C) - \epsilon$. Given an imputation $p$, the difference $e(C) = v(C) - \sum_{i \in C} p_i$ is called the excess of the coalition $C$, and represents the total improvement in utility the members of $C$ can achieve by abandoning the grand coalition and working on their own. For an imputation in the $\epsilon$-core, no agent subset can achieve an addition of more than $\epsilon$ in utility over the current total payoff offered to the team (i.e. no coalition has an excess of more than $\epsilon$). The minimal $\epsilon$ for which the $\epsilon$-core is non-empty is called the Least-Core Value (LCV). The Least-Core minimizes the incentive of any agent subset to abandon the grand coalition, and the LCV thus represents the degree of instability (excess) under the imputation that best minimizes this instability. We find the set of payoffs associated with the LCV through linear programming (full details in the Appendix).
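A minimal sketch of this linear program using scipy: minimize $\epsilon$ subject to the $\epsilon$-core constraints and the imputation constraint. The three-player majority game below is illustrative, and enumerating all coalitions restricts this sketch to small games:

```python
from itertools import combinations

import numpy as np
from scipy.optimize import linprog

def least_core(n, v):
    """Least-Core of an n-player game with characteristic function v
    (a callable on frozensets), via one linear program:
        min e  s.t.  sum_{i in C} p_i >= v(C) - e  for all C,
                     sum_i p_i = v(N),  p_i >= 0.
    Decision variables are [p_1, ..., p_n, e]."""
    coalitions = [frozenset(C) for r in range(1, n)
                  for C in combinations(range(n), r)]
    c = np.zeros(n + 1)
    c[-1] = 1.0                                # minimize e
    A_ub, b_ub = [], []
    for C in coalitions:                       # -sum_{i in C} p_i - e <= -v(C)
        row = np.zeros(n + 1)
        for i in C:
            row[i] = -1.0
        row[-1] = -1.0
        A_ub.append(row)
        b_ub.append(-v(C))
    A_eq = np.ones((1, n + 1))
    A_eq[0, -1] = 0.0                          # sum_i p_i = v(N)
    b_eq = [v(frozenset(range(n)))]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n + [(None, None)])
    return res.x[:n], res.x[-1]                # payoffs, Least-Core Value

# Hypothetical 3-player majority game: any coalition of 2 players wins.
v = lambda C: 1 if len(C) >= 2 else 0
payoffs, lcv = least_core(3, v)                # symmetric payoffs, LCV = 1/3
```
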


We now discuss two power indices, payoff distributions reflecting the true influence a player has on the performance of the team, that fairly allocate the total gains of the team among its agents.

The Shapley Value

Shapley (1953) measures the average marginal contribution of each player across all permutations of the players. The Shapley value is the unique solution concept that fulfills several natural fairness axioms Dubey (1975), and has thus found many applications, from estimating feature importance Datta et al. (2016); Lundberg and Lee (2017) to pruning neural networks Wang (2021); Frankle and Carbin (2018); Ghorbani and Zou (2020). Formally, we denote a permutation of the players by $\pi$, where $\pi$ is a bijective mapping of $N$ to itself, and the set of all such permutations by $\Pi$. By $S_\pi(i)$ we denote all players appearing before player $i$ in the permutation $\pi$. The Shapley value of player $i$ is defined as:

$$\phi_i(v) = \frac{1}{n!} \sum_{\pi \in \Pi} \left[ v\big(S_\pi(i) \cup \{i\}\big) - v\big(S_\pi(i)\big) \right]$$
Intuitively, one can consider starting with an empty coalition and adding the players’ weights in the order of the permutation; the first player whose addition meets or exceeds the quota is considered the pivotal player in the permutation. The Shapley value then measures the proportion of permutations where a player is the pivotal player.
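The permutation definition above translates directly into code. The sketch below enumerates all $n!$ orderings, so it only runs for small games; the weights and quota are illustrative:

```python
from itertools import permutations
from math import factorial

def shapley_values(n, v):
    """Exact Shapley values by enumerating all n! player orderings:
    each player's value is its average marginal contribution."""
    phi = [0.0] * n
    for order in permutations(range(n)):
        seen = set()
        for i in order:
            phi[i] += v(seen | {i}) - v(seen)   # marginal contribution
            seen.add(i)
    return [x / factorial(n) for x in phi]

# Hypothetical WVG: weights [4, 3, 2], quota 5.  Every pair of players
# wins and no singleton does, so the game is symmetric and each player
# should receive 1/3.
weights, quota = [4, 3, 2], 5
v = lambda C: 1 if sum(weights[i] for i in C) >= quota else 0
phi = shapley_values(3, v)
```
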

The Banzhaf index Banzhaf III (1964) is another method for distributing payoffs according to a player's ability to change the outcome of the game, but it reflects slightly different fairness axioms Straffin Jr. (1988). The Banzhaf index of a player $i$ is defined as the average marginal contribution of the player across all subsets not containing that player:

$$\beta_i(v) = \frac{1}{2^{n-1}} \sum_{C \subseteq N \setminus \{i\}} \left[ v\big(C \cup \{i\}\big) - v(C) \right]$$

In practice, we first compute the set of winning coalitions and count, for each player $i$, the number of coalitions in which it is critical or pivotal, that is, coalitions $C$ with $v(C) = 1$ and $v(C \setminus \{i\}) = 0$.
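A direct implementation of this definition, again only feasible for small games since it enumerates all $2^{n-1}$ coalitions per player; the example game is made up:

```python
from itertools import combinations

def banzhaf_indices(n, v):
    """Banzhaf values: a player's marginal contribution averaged over
    all 2^(n-1) coalitions that exclude it."""
    beta = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        swings = 0
        for r in range(len(others) + 1):
            for C in combinations(others, r):
                swings += v(set(C) | {i}) - v(set(C))  # 1 iff i is critical
        beta.append(swings / 2 ** (n - 1))
    return beta

# Hypothetical WVG: weights [3, 2, 1], quota 4.  Player 0 is critical in
# 3 of its 4 coalitions; players 1 and 2 in 1 of 4 each.
weights, quota = [3, 2, 1], 4
v = lambda C: 1 if sum(weights[i] for i in C) >= quota else 0
beta = banzhaf_indices(3, v)
```
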

2.2 Speeding Up Explainable AI In Large Datasets

An important goal of our work is to speed up Shapley/Core computations across many data instances. In machine learning, we train a model to learn a mapping from some set of input features to an outcome. For example, we can train a model to predict the price of a house based on a number of features, such as the number of rooms, the year it was built, and so on (Figure 1). In such settings, it is desirable, but challenging, to explain the model outputs in terms of the input features. Explainable AI addresses this issue, and recent years have seen several applications of game theoretic metrics for measuring feature importance in the machine learning community Sundararajan et al. (2017); Arrieta et al. (2020).

The fastest method to approximate Shapley values (also used in the SHAP package) is a Monte-Carlo approach Lundberg and Lee (2017). A number of other methods exist whose runtime and accuracy depend on the number of samples used, usually on the order of several thousands Leech (2002a); Datta et al. (2016); Maleki et al. (2013); Bachrach et al. (2010). In a model setting, the characteristic function takes the value of the trained model output for a given instance when only the features in the coalition are present. The Shapley value of a feature in a data instance is then the effect that feature has on the model outcome. Sampling-based methods compute these contributions with respect to a base value, which is the average model output across all instances. Sampling-based methods are not ideal for large datasets because they require a large number of re-evaluation samples per computation. We show how our approach can be employed to speed up Shapley or Core computations for many instances by training models to learn representations of feature attribution schemes.
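For reference, a minimal permutation-sampling Monte-Carlo Shapley estimator of the kind described above, demonstrated on a small WVG rather than a trained model; the game, sample count, and seed are illustrative:

```python
import random

def monte_carlo_shapley(n, v, num_samples=2000, seed=0):
    """Permutation-sampling Shapley estimate: average marginal
    contributions over randomly drawn player orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n
    players = list(range(n))
    for _ in range(num_samples):
        rng.shuffle(players)
        seen = set()
        for i in players:
            phi[i] += v(seen | {i}) - v(seen)
            seen.add(i)
    return [x / num_samples for x in phi]

# Hypothetical WVG: weights [4, 3, 2], quota 5; the exact Shapley value
# is (1/3, 1/3, 1/3), so the estimates should land near 1/3.
weights, quota = [4, 3, 2], 5
v = lambda C: 1 if sum(weights[i] for i in C) >= quota else 0
est = monte_carlo_shapley(3, v)
```

Each sampled permutation contributes marginal contributions that telescope to $v(N) - v(\emptyset)$, so the estimates always sum to the grand coalition's value.
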

3 Methods

Our approach uses machine learning to create game-theoretic estimators. We generate synthetic datasets of games to train our models, spending compute up-front to allow for instant solutions afterwards. Our first domain concerns weighted voting games (WVGs) as they provide a generic framework for studying cooperation in multi-agent settings.

Afterwards, we apply our approach to the Shapley-based feature importance setting. The main idea is that we can speed-up the computations of the relative impact of features on the predictions of a machine learning model. We do this by training models to approximate Shapley values of features, and examine their performance on previously unobserved instances.

3.1 Weighted Voting Games

We consider two types of models: fixed-size models that always predict a fixed number of outputs for games with a fixed number of players, and variable-size

ones that predict a variable number of outcomes and can handle variable numbers of players. We now describe our data generation process, models, training procedure, and evaluation metrics used in weighted voting games.

3.1.1 Data and Models: Fixed-Size

For each $n$-player game size, we generate a set of independent and identically distributed games. The training dataset consists of feature vectors paired with ground-truth solution vectors, whose length is the number of outputs of interest. The features are obtained in two steps. First, we sample a weight vector $\mathbf{w}$ with entries drawn from a fixed interval starting at 1, and a quota $q$. To create games where players are dependent on each other to achieve the task at hand, the quota distribution is parameterized such that the average drawn quota is half of the sum of the players' weights. We then normalize the weights by the quota to get the feature vector $\mathbf{x} = \mathbf{w}/q$. In other words, the features are the weights expressed as proportions of the quota. Thus, a value $x_i \geq 1$ indicates that player $i$ is a winning coalition by itself, and otherwise the player needs others to meet the quota (Figure E.2).
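A sketch of this data-generation step. The exact sampling ranges are not specified here, so the uniform weight interval and the Gaussian quota distribution below are assumptions; only the quota's mean (half the total weight) and the normalization follow the text:

```python
import numpy as np

def generate_wvg_dataset(num_games, n, rng=None):
    """Sample WVGs and return quota-normalized features: sample weights,
    draw a quota whose mean is half the total weight, then normalize
    weights by the quota so that x_i >= 1 means player i wins alone."""
    rng = np.random.default_rng(rng)
    W = rng.uniform(1.0, 10.0, size=(num_games, n))   # assumed weight range
    total = W.sum(axis=1)
    # Quota centered at half the total weight (spread is an assumption),
    # kept strictly positive.
    q = np.clip(rng.normal(0.5 * total, 0.1 * total), 1.0, None)
    X = W / q[:, None]                                # normalized features
    return X, W, q

X, W, q = generate_wvg_dataset(1000, 5, rng=0)
```
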

We train models to predict the three solutions: the Least-Core, the Shapley values, and the Banzhaf indices. For the Shapley values and the Banzhaf indices, the model predicts the payoff allocation $p \in \mathbb{R}^n$, so it has $n$ outputs. For the Least-Core solution, we train the model to not only predict the payoff allocation $p$, but also the Least-Core Value (so in this case the model has $n+1$ outputs). For each $n$-player game size, we produce a model $f_\theta$, where $\theta$ are the model parameters. We use deep feedforward networks for $f_\theta$. Appendix D.1 contains the full details of the experimental setup.

3.1.2 Data and Models: Variable-size

For the variable-size case, we consider a maximal number of possible players $n_{\max}$ and pad the inputs with zeros for games with fewer than $n_{\max}$ players. Hence, we generate a single dataset. Similarly to the fixed-size dataset, the feature matrix contains the normalized weights, and the corresponding solutions have $n_{\max}$ or $n_{\max}+1$ entries, with the ground-truth output vector again padded with zeros when there are fewer than $n_{\max}$ players. Hence, we allow for the prediction of payoffs for up to $n_{\max}$ players, and we shuffle the data so that players are located at random positions. The Least-Core Value (LCV) is not shuffled but stored as the last element of each row (Figure E.1).

We produce a single model $f_\theta$ that can be used for different numbers of players, where $\theta$ are the model parameters. The model learns to allocate the joint payoff among at most $n_{\max}$ players. At prediction time we pad the input with zeros when there are fewer than $n_{\max}$ players, and we redistribute the payoffs allocated to non-player entries among the real players according to their original share of the joint payoff.
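The padding and redistribution steps can be sketched as follows; the model output below is a made-up vector standing in for a network prediction on a 3-player game padded to 5 slots:

```python
import numpy as np

def pad_and_redistribute(x, raw_output, n_max):
    """Variable-size inference sketch: pad a game with fewer than n_max
    players with zeros, then redistribute any payoff mass the model
    assigned to padded (non-player) entries among the real players,
    proportionally to their predicted shares."""
    n = len(x)
    padded = np.zeros(n_max)
    padded[:n] = x                       # zero-padded model input
    p = np.asarray(raw_output, dtype=float)
    real, spill = p[:n], p[n:].sum()     # payoff leaked to non-players
    if real.sum() > 0:
        real = real + spill * real / real.sum()
    return padded, real

# Hypothetical model output: 0.1 of the payoff mass leaked to the two
# padded slots and is returned to the real players proportionally.
padded_input, payoffs = pad_and_redistribute(
    [0.6, 0.3, 0.1], [0.5, 0.3, 0.1, 0.05, 0.05], 5)
```
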

3.1.3 Training and Evaluation

During training we minimize the Mean Square Error (MSE) between the true and predicted solutions. For the variable-size models we also include the padded locations so that the model learns not to allocate value to non-player entries.

Evaluation metrics.

Given a predicted payoff vector $\hat{p}$, we consider multiple metrics for assessing the models' performance. First, we quantify the models' predictive performance via the Mean Absolute Error (MAE), defined for each game as $\frac{1}{n} \sum_{i=1}^{n} |p_i - \hat{p}_i|$, where $n$ is the number of players. For the Least-Core there is another natural game theoretic metric. The goal of the Least-Core is to minimize the incentive of any subset to abandon the current team and form its own sub-team. Given a suggested payoff vector $\hat{p}$, the maximal excess over all possible coalitions, $\max_{C} \left[ v(C) - \sum_{i \in C} \hat{p}_i \right]$, measures this incentive, and serves as a good measure for the quality of the model.
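The two metrics above (per-game MAE and maximal excess) can be computed directly; enumerating all coalitions limits the excess computation to small games, and the majority game below is illustrative:

```python
from itertools import combinations

import numpy as np

def mae(p_true, p_pred):
    """Per-game Mean Absolute Error between payoff vectors."""
    return float(np.mean(np.abs(np.asarray(p_true) - np.asarray(p_pred))))

def max_excess(p, n, v):
    """Maximal excess max_C [v(C) - sum_{i in C} p_i]: the strongest
    incentive of any coalition to abandon the grand coalition under p."""
    return max(v(set(C)) - sum(p[i] for i in C)
               for r in range(1, n + 1)
               for C in combinations(range(n), r))

# Hypothetical 3-player majority game (any 2 players win); under the
# symmetric payoff vector every pair has excess 1 - 2/3 = 1/3.
v = lambda C: 1 if len(C) >= 2 else 0
p = [1/3, 1/3, 1/3]
```
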

Test data.

We sample weights with varying parameters to assess our models' ability to generalize to previously unobserved instances and to games far outside of the training distribution (full details in Table 2, Figure D.1).

3.2 Feature Importance Games

We perform an experiment to show that neural networks can provide a faster alternative for measuring feature importance at scale. We select a dataset (details in Appendix D), train a model, and use it to construct a dataset mapping features to Shapley values using the SHAP KernelExplainer Lundberg and Lee (2017). We then partition this dataset into a train and a test set, and incrementally change the proportions between the two. For each increment, we train a model for 100 epochs and test it on the remainder of the unseen instances.

4 Results

We present experimental results that allow us to assess how well neural models are able to learn a representation of the various solution concepts. We first describe the predictive performance of neural networks in the WVG setting, and consider properties of these solutions that make them hard to learn. We then consider the explainable AI domain, and study the performance and sample complexity of Shapley feature importance prediction.

4.1 Weighted Voting Games: Evaluation

For our WVG analysis we train a selection of fixed-size neural networks, one for each number of players, each on a large set of games. (A direct computation of the Shapley value requires enumerating a large list of permutations, which becomes computationally very costly when there are many players. Hence, for games with many players we use Monte-Carlo approximations of the Shapley value to obtain the ground-truth solution; see full details in Appendix D.1.) We also train a single variable-size model on one dataset containing an equal number of games for each number of players. The data is padded with zeros to allow for payoff allocation up to the maximal number of players (see details in Section 3.1).

4.1.1 Predictive performance

Table 1 shows that the ability to handle games with variable numbers of players comes at the cost of lower accuracy. However, even for variable-size models, and even under a significant distribution shift, the errors in predicting all solution concepts are low. We further note that the errors in predicting Least-Core based payoffs are generally larger than those for the Shapley and Banzhaf power indices. One possible reason is that for these solutions, there are many cases where a small perturbation in the weights or quota results in a large perturbation of the Least-Core solution Elkind and Pasechnik (2009); Zuckerman et al. (2012).

Dataset                              Least core payoffs    Least core excess    Shapley values        Banzhaf indices
                                     (Mean MAE)            (MAE)                (Mean MAE)            (Mean MAE)
                                     Fixed     Variable    Fixed    Variable    Fixed     Variable    Fixed     Variable
In-sample                            0.030     0.043       0.015    0.034       0.019     0.022       0.018     0.028
Out-of-sample                        0.030     0.044       0.015    0.027       0.018     0.036       0.018     0.056
Slightly out-of-distribution         0.029     0.028       0.015    0.050       0.018     0.019       0.017     0.018
Moderately out-of-distribution       0.030     0.035       0.014    0.029       0.018     0.026       0.018     0.032
Significantly out-of-distribution    0.031     0.045       0.014    0.036       0.018     0.029       0.018     0.039

Table 1: Comparison of predictive performance across test datasets and solution concepts.

We now highlight several conclusions from our analysis of the results.

Fixed-size models display stable performance across solution concepts.

Our first observation is that the fixed-size neural networks are adept at estimating solutions across the three considered concepts. The average error per player is small for all three solution concepts (Least Core, Shapley values, and Banzhaf indices); Table 5 provides a complete summary of our models' in-sample performance.

Fixed-size models are robust to shifts in the weight distribution.

As shown in Figure 2, the predictive performance (Mean MAE) of the fixed-size models is consistent across test datasets. To account for the natural decrease in the MAE as the number of players $n$ increases, we also display the average payoff per player ($1/n$). As expected, the error scales approximately with the average payoff per player.

(a) Least Core
(b) Shapley value
(c) Banzhaf index
Figure 2: Performance of fixed-size models. We evaluate the performance on five test sets. Figures show the MAE and 95% confidence intervals for each solution concept. The black dashed line indicates the average payoff per player as a function of the number of players.

Variable-size models are robust to shifts in the weight distribution.

Figure 3 shows that the variable-size models are also able to generalize outside the training distribution. Across test sets, we observe a stable performance that decays with the number of players, as is expected. For the Least Core, we see that there is an abrupt decrease in performance for the excess beyond 8 players.

(a) Least Core
(b) Shapley value
(c) Banzhaf index
Figure 3: Performance of variable-size models. We evaluate the performance on five test sets with 1000 samples per $n$-player game. Figures show the MAE and 95% confidence intervals for each solution concept. The black dashed line indicates the average payoff per player as a function of the number of players.
Variable-size models generalize to a larger number of players.

Our main objective is to investigate how to leverage machine learning to perform scalable value estimation in large multi-agent settings. To test this, we train the variable-size models on games of up to ten players, and evaluate them on games with more players (full details in Section 3.1.2). Figure 3 demonstrates that there is no significant decrease in performance for games with more than ten players across solution concepts: variable-size models are able to extrapolate beyond the number of players seen during training. This suggests that the small-player games contain valuable information from which the solutions of larger games can be inferred.

Fixed-size models outperform variable-size models.

Table 1 contains the Mean MAEs across solution concepts and test sets. Corresponding error distributions are shown in Figure 4. From these, we conclude that in almost all settings the fixed-size networks outperform the variable-size networks. Variable-size networks tend to have larger errors for all predicted variables and display more variance in their predictions.

(a) Least Core
(b) Shapley value
(c) Banzhaf index
Figure 4: Comparing the overall performance of fixed-size vs. variable-size networks.

4.1.2 Discontinuities in the Solution Space

The function mapping from game parameters to solutions contains discontinuities. Discontinuous jumps emerge from the players’ interdependence and the effect of the quota and are difficult for a model to learn. We analyze two examples that demonstrate how our models respond to such transitions.

Solution concepts are step-wise constant functions which are difficult to capture.

Consider taking a WVG and changing the weight of a player. This only changes the game theoretic solutions when it restructures the set of winning coalitions. Thus, the function outputs the same value until the weight reaches a certain threshold where the structure of winning coalitions changes, at which point the solution can change drastically. Learning these kinds of functions is difficult, as the error around the discontinuity point is often large.

Our analysis examines a game with a fixed weight vector and considers an array of quotas that increases in small increments. We solve the game for each combination of the fixed weights and the changing quota to obtain a matrix of solutions. Figure 5 shows the fixed-size model predictions for two selected WVGs. The models capture the overall effect of the changes to the quota, but do poorly close to the discontinuity point (where the ground-truth solution incurs a large change).
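A quota sweep of this kind can be sketched directly; the weights and quota grid below are illustrative, and the jumps in the exact Shapley values mark the discontinuity points the models struggle with:

```python
from itertools import permutations
from math import factorial

def shapley_wvg(weights, quota):
    """Exact Shapley values of a WVG by counting, for each player, the
    fraction of orderings in which it is pivotal."""
    n = len(weights)
    phi = [0] * n
    for order in permutations(range(n)):
        total = 0
        for i in order:
            total += weights[i]
            if total >= quota:       # first player meeting the quota is pivotal
                phi[i] += 1
                break
    return [x / factorial(n) for x in phi]

# Hypothetical sweep: fixed weights [4, 3, 2], quota increased in steps.
# The solution is constant between thresholds, then jumps when the set of
# winning coalitions changes (e.g. when the quota passes 5 and {1, 2} loses).
weights = [4, 3, 2]
sweep = {q: shapley_wvg(weights, q) for q in [4.5, 5.0, 5.5, 6.0, 6.5, 7.0]}
```
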

Figure 5: Capturing step-wise jumps. Actual individual payoffs (green dots) together with the model predictions (blue triangles) as the value of the quota increases in small increments (full description in Appendix F).

4.2 Speeding Up Feature Importance Calculations By Predicting Shapley Values

We examine the sample efficiency and performance of neural networks trained to predict Shapley values on sets of features. We consider two well-known datasets: the classical UCI Bank Marketing dataset Moro et al. (2014) (17 features, 11,162 observations) and the Melbourne Housing dataset Pino (2018) (21 features, 34,857 observations); full details in Appendix D.2. Overall, we find that neural networks achieve high performance on the test set with very few training samples. Figure 6 displays the models' performance (in RMSE) as a function of samples available for training. We observe that the error decays approximately exponentially with the proportion of data used for training on both the Banking and Melbourne datasets. A model trained on just 1 percent of the Melbourne dataset has an RMSE of 19.72 on the resultant test set. Increasing the number of training samples to 3 percent results in an RMSE of less than 0.07, a 99.6 percent decrease in error. We observe a similar trend for the Banking dataset: the RMSE decreases by 93.8 percent (from 4.10 to 0.25) when the available training data increases from 0.5 to 4 percent. These results highlight the strength of this approach: the computational cost of training the Shapley prediction network is very small compared to the speedup obtained on the vast majority of the data (and any subsequent instances analyzed later). See Appendix I for a more extensive analysis.

Figure 6: Root Mean Squared Error as a function of training fraction. The RMSE between the actual and predicted Shapley values scales approximately as a power law with the training fraction. Left: UCI Banking dataset Right: Melbourne Housing dataset.

5 Discussion and Conclusion

We considered “neural payoff machines”, models that take in a representation of a cooperative game and solve it to produce a distribution of the joint gains of a team amongst its members. Our analysis focused on two concise representations of a characteristic function: Weighted voting games, and feature importance games for explainable AI. Our analysis shows that neural models capture cooperative game solutions well, and can generalize well outside the training distribution, even extrapolating to more players than in the training set.

We saw that the Least core is a harder solution concept to learn than the Shapley value and Banzhaf index. Potentially, the fact that the Least core is complex and can only be computed by solving multiple Linear Programs makes it a hard concept to grasp from examples.

The methods we discussed can drive better analysis of decision making bodies and better approximation of feature importance, but such methods can also be applied to other classes of cooperative games, so long as one can generate data consisting of game samples and their solutions. For instance, the same approach can be used for other cooperative representations, such as graph-based games Deng and Papadimitriou (1994); Aziz et al. (2009); Bachrach et al. (2008) or set-based representations Ohta et al. (2008). The direct applications that we have considered in this paper are also intrinsically valuable on their own, as they enable speeding up explainable AI analysis and political influence estimation. Some questions remain open for future work. Can similar techniques be used for non-cooperative games or for other types of cooperative games? Can this approach be applied to other solutions such as the Kernel or Nucleolus Chalkiadakis et al. (2011)? Finally, are there better neural network designs, such as graph neural networks or transformers, that can exploit invariances or equivariances in the game's representation or be better equipped to deal with sharp discontinuities in the solutions?


  • A. Agarwal, M. Dahleh, and T. Sarkar (2019) A marketplace for data: an algorithmic solution. In Proceedings of the 2019 ACM Conference on Economics and Computation, pp. 701–726. Cited by: §1.
  • A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-López, D. Molina, R. Benjamins, et al. (2020) Explainable artificial intelligence (xai): concepts, taxonomies, opportunities and challenges toward responsible ai. Information fusion 58, pp. 82–115. Cited by: §1, §1, §2.2.
  • H. Aziz, O. Lachish, M. Paterson, and R. Savani (2009) Power indices in spanning connectivity games. In International Conference on Algorithmic Applications in Management, pp. 55–67. Cited by: §5.
  • Y. Bachrach, T. Graepel, G. Kasneci, M. Kosinski, and J. Van Gael (2012) Crowd iq: aggregating opinions to boost performance. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems-Volume 1, pp. 535–542. Cited by: §1.
  • Y. Bachrach, E. Markakis, E. Resnick, A. D. Procaccia, J. S. Rosenschein, and A. Saberi (2010) Approximating power indices: theoretical and empirical analysis. Autonomous Agents and Multi-Agent Systems 20 (2), pp. 105–122. Cited by: §2.2.
  • Y. Bachrach, J. S. Rosenschein, and E. Porat (2008) Power and stability in connectivity games. In Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems-Volume 2, pp. 999–1006. Cited by: §5.
  • B. Baker, I. Kanitscheider, T. Markov, Y. Wu, G. Powell, B. McGrew, and I. Mordatch (2019) Emergent tool use from multi-agent autocurricula. arXiv preprint arXiv:1909.07528. Cited by: §1.
  • E. Balkanski, U. Syed, and S. Vassilvitskii (2017) Statistical cost sharing. Advances in Neural Information Processing Systems 30. Cited by: §1.
  • J. F. Banzhaf III (1964) Weighted voting doesn’t work: a mathematical analysis. Rutgers L. Rev. 19, pp. 317. Cited by: §1, §2.1.
  • J. M. Bilbao, J. R. Fernandez, N. Jiménez, and J. J. Lopez (2002) Voting power in the european union enlargement. European Journal of Operational Research 143 (1), pp. 181–196. Cited by: Figure 1, §1, §1, §2.1.
  • F. Bistaffa, A. Farinelli, G. Chalkiadakis, and S. D. Ramchurn (2017) A cooperative game-theoretic approach to the social ridesharing problem. Artificial Intelligence 246, pp. 86–117. Cited by: Appendix M.
  • G. Chalkiadakis, E. Elkind, and M. Wooldridge (2011) Computational aspects of cooperative game theory. Synthesis Lectures on Artificial Intelligence and Machine Learning 5 (6), pp. 1–168. Cited by: §1, §1, §2.1, §2, §5.
  • A. Dafoe, Y. Bachrach, G. Hadfield, E. Horvitz, K. Larson, and T. Graepel (2021) Cooperative ai: machines must learn to find common ground. Nature Publishing Group. Cited by: §1.
  • A. Das and P. Rad (2020) Opportunities and challenges in explainable artificial intelligence (xai): a survey. arXiv preprint arXiv:2006.11371. Cited by: §2.
  • A. Datta, S. Sen, and Y. Zick (2016) Algorithmic transparency via quantitative input influence: theory and experiments with learning systems. In 2016 IEEE symposium on security and privacy (SP), pp. 598–617. Cited by: §1, §1, §1, §2.1, §2.2, §2.
  • J. Deegan and E. W. Packel (1978) A new index of power for simple n-person games. International Journal of Game Theory 7 (2), pp. 113–123. Cited by: Appendix C.
  • X. Deng and C. H. Papadimitriou (1994) On the complexity of cooperative solution concepts. Mathematics of operations research 19 (2), pp. 257–266. Cited by: §1, §1, §2.1, §5.
  • A. Dinar, A. Ratner, and D. Yaron (1992) Evaluating cooperative game theory in water resources. Theory and decision 32 (1), pp. 1–20. Cited by: Appendix M.
  • P. Dubey (1975) On the uniqueness of the shapley value. International Journal of Game Theory 4 (3), pp. 131–139. Cited by: §2.1.
  • E. Elkind, L. A. Goldberg, P. W. Goldberg, and M. Wooldridge (2007) Computational complexity of weighted threshold games. In AAAI, pp. 718–723. Cited by: §1.
  • E. Elkind and D. Pasechnik (2009) Computing the nucleolus of weighted voting games. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 327–335. Cited by: §4.1.1.
  • M. G. Fiestras-Janeiro, I. García-Jurado, A. Meca, and M. A. Mosquera (2011) Cooperative game theory and inventory management. European Journal of Operational Research 210 (3), pp. 459–466. Cited by: Appendix M.
  • J. Frankle and M. Carbin (2018) The lottery ticket hypothesis: finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635. Cited by: §2.1.
  • D. Gately (1974) Sharing the gains from regional cooperation: a game theoretic application to planning investment in electric power. International Economic Review, pp. 195–208. Cited by: §1.
  • A. Ghorbani and J. Y. Zou (2020) Neuron shapley: discovering the responsible neurons. Advances in Neural Information Processing Systems 33, pp. 5922–5932. Cited by: §2.1.
  • D. B. Gillies (1953) Some theorems on n-person games. Princeton University. Cited by: §1, §1, §2.1.
  • J. Henrich (2015) The secret of our success. In The Secret of Our Success, Cited by: §1.
  • M. Jaderberg, W. M. Czarnecki, I. Dunning, L. Marris, G. Lever, A. G. Castaneda, C. Beattie, N. C. Rabinowitz, A. S. Morcos, A. Ruderman, et al. (2019) Human-level performance in 3d multiplayer games with population-based reinforcement learning. Science 364 (6443), pp. 859–865. Cited by: §1.
  • D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §D.1, §D.1.
  • D. Leech (2002a) Computation of power indices. Cited by: §2.2.
  • D. Leech (2002b) Designing the voting system for the council of the european union. Public Choice 113 (3), pp. 437–464. Cited by: Table 7, Appendix L.
  • J. Lorraine, J. Parker-Holder, P. Vicol, A. Pacchiano, L. Metz, T. Kachman, and J. Foerster (2021a) Using bifurcations for diversity in differentiable games. Cited by: §1.
  • J. Lorraine, P. Vicol, J. Parker-Holder, T. Kachman, L. Metz, and J. Foerster (2021b) Lyapunov exponents for diversity in differentiable games. arXiv preprint arXiv:2112.14570. Cited by: §1.
  • R. Lowe, Y. I. Wu, A. Tamar, J. Harb, O. Pieter Abbeel, and I. Mordatch (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in neural information processing systems 30. Cited by: §1.
  • S. M. Lundberg, G. Erion, H. Chen, A. DeGrave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal, and S. Lee (2020) From local explanations to global understanding with explainable ai for trees. Nature machine intelligence 2 (1), pp. 56–67. Cited by: §2.
  • S. M. Lundberg and S. Lee (2017) A unified approach to interpreting model predictions. Advances in neural information processing systems 30. Cited by: item 3, §D.2, §1, §1, §1, §1, §2.1, §2.2, §2, §3.2.
  • S. Maleki, L. Tran-Thanh, G. Hines, T. Rahwan, and A. Rogers (2013) Bounding the estimation error of sampling-based shapley value approximation. arXiv preprint arXiv:1306.4265. Cited by: §2.2.
  • I. Mann and L. S. Shapley (1962) Values of large games, VI: evaluating the electoral college exactly. Technical report, RAND Corporation, Santa Monica, CA. Cited by: Figure 1, §1, §1, §2.1.
  • M. Maschler, B. Peleg, and L. S. Shapley (1979) Geometric properties of the kernel, nucleolus, and related solution concepts. Mathematics of operations research 4 (4), pp. 303–338. Cited by: §1, §2.1.
  • L. Metz, C. D. Freeman, S. S. Schoenholz, and T. Kachman (2021) Gradients are not all you need. arXiv preprint arXiv:2111.05803. Cited by: §1.
  • F. Mirzaei-Nodoushan, O. Bozorg-Haddad, and H. A. Loáiciga (2022) Evaluation of cooperative and non-cooperative game theoretic approaches for water allocation of transboundary rivers. Scientific Reports 12 (1), pp. 1–11. Cited by: Appendix M.
  • S. Moro, P. Cortez, and P. Rita (2014) A data-driven approach to predict the success of bank telemarketing. Decision Support Systems 62, pp. 22–31. Cited by: Table 4, §4.2.
  • V. Nair and G. E. Hinton (2010) Rectified linear units improve restricted boltzmann machines. In Icml, Cited by: §D.1, §D.1.
  • N. Ohta, V. Conitzer, Y. Satoh, A. Iwasaki, and M. Yokoo (2008) Anonymity-proof shapley value: extending shapley value for coalitional games in open environments. In Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems-Volume 2, pp. 927–934. Cited by: §5.
  • M. J. Osborne and A. Rubinstein (1994) A course in game theory. MIT press. Cited by: §2.
  • R. Patel, M. Garnelo, I. Gemp, C. Dyer, and Y. Bachrach (2021) Game-theoretic vocabulary selection via the shapley value and banzhaf index. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 2789–2798. Cited by: Appendix I.
  • T. Pino (2018) Melbourne housing market data. Kaggle. Cited by: Table 4, §4.2.
  • L. S. Shapley (1953) A value for n-person games. In H. W. Kuhn and A. W. Tucker (Eds.), Contributions to the Theory of Games II. Princeton University Press. Cited by: §1, §2.1.
  • R. H. L. Sim, Y. Zhang, M. C. Chan, and B. K. H. Low (2020) Collaborative machine learning with incentive-aware model rewards. In International Conference on Machine Learning, pp. 8927–8936. Cited by: §1.
  • P. Stone and M. Veloso (2000) Multiagent systems: a survey from a machine learning perspective. Autonomous Robots 8 (3), pp. 345–383. Cited by: §1.
  • P. Straffin Jr (1988) The Shapley-Shubik and Banzhaf power indices as probabilities. In The Shapley Value: Essays in Honor of Lloyd S. Shapley, pp. 71–81. Cited by: §2.1.
  • M. Sundararajan, A. Taly, and Q. Yan (2017) Axiomatic attribution for deep networks. In International conference on machine learning, pp. 3319–3328. Cited by: §1, §2.2.
  • M. Vázquez-Brage, A. van den Nouweland, and I. García-Jurado (1997) Owen's coalitional value and aircraft landing fees. Mathematical Social Sciences 34 (3), pp. 273–286. Cited by: Appendix M.
  • O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev, et al. (2019) Grandmaster level in starcraft ii using multi-agent reinforcement learning. Nature 575 (7782), pp. 350–354. Cited by: §1.
  • S. Wang (2021) Efficient deep learning. Nature Computational Science 1 (3), pp. 181–182. Cited by: §2.1.
  • T. Yan and A. D. Procaccia (2021) If you like shapley then you’ll love the core. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, pp. 5751–5759. Cited by: §2.
  • H. P. Young (1994) Cost allocation. Handbook of game theory with economic applications 2, pp. 1193–1235. Cited by: Appendix M.
  • M. Zuckerman, P. Faliszewski, Y. Bachrach, and E. Elkind (2012) Manipulating the quota in weighted voting games. Artificial Intelligence 180, pp. 1–19. Cited by: §4.1.1.


Appendix A Notation

The following notation is used throughout this paper:

  • the number of players in the game

  • the number of games

  • the maximum sequence length in variable feedforward networks

  • the quota: the threshold determining when a coalition can complete the goal or task

  • the weight of a player

  • the normalized weight of a player (the weight divided by the quota)

  • the grand coalition

  • a coalition (subset of players)

  • the set of minimal winning coalitions: the coalitions in which every player is pivotal

  • feature matrix

  • solution matrix

  • the allocated reward or payoff to a player

  • the least core value (LCV), i.e. the joint payoff that must be sacrificed so that a feasible payoff vector exists in the least core

Appendix B Motivating The Use Of The Shapley Value To Analyze Influence In Weighted Voting Games

We provide an intuitive example motivating the use of Shapley values for measuring the true influence of participants, in weighted voting games or political power in decision making bodies.

Consider a parliament with 100 seats, and imagine that following the elections we have two large parties with 49 seats each and one tiny party that has only managed to win 2 seats. Note that having a majority means controlling more than 50 of the seats, so neither of the two big parties has a majority on its own (and of course neither does the small party).

Hence, to achieve a required majority and form a government or pass a bill, the parties have to work in teams. Clearly, a coalition consisting of the two big parties has a majority, as together they have 49+49=98 seats out of the total of 100. However, a coalition consisting of any of the two big parties and the one small party also has 49+2=51 seats, which is also a majority.

Hence, in the setting discussed above, any two parties form a winning coalition that has a majority and is able to form a government or pass a bill. This can be modelled as a weighted voting game where the weights are 49, 49 and 2, and the quota is 51.

Intuitively, as any two parties or more are a winning team and as any single party is a losing team, we might say that the weights are irrelevant. Although the big parties each have almost 25 times the seats as the small party, they all are symmetric in their opportunities to form a winning coalition, and one could claim they actually have equal political power. Clearly, the small party is in a very strong negotiation position, and could demand for instance to control a significant proportion of the joint budget.

In the above weighted voting game, the Shapley values of all agents are equal (each getting a value of 1/3), as the Shapley value considers the marginal contributions of the parties over all permutations of players. Due to the symmetry between the parties, they have identical marginal contributions.

This example illustrates some of the principles behind the fairness axioms that the Shapley value exhibits, and its importance for measuring influence in decision-making bodies.
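To make this concrete, the Shapley values of the parliament example can be computed directly by enumerating permutations. The sketch below is illustrative (the helper name `shapley_wvg` is ours, not from the paper's codebase) and recovers the symmetric 1/3 split.

```python
from itertools import permutations

def shapley_wvg(weights, quota):
    """Exact Shapley values for a weighted voting game, averaging each
    player's marginal contribution over every ordering of the players."""
    n = len(weights)
    values = [0.0] * n
    perms = list(permutations(range(n)))
    for perm in perms:
        total = 0
        for player in perm:
            before = 1 if total >= quota else 0  # coalition winning before player joins?
            total += weights[player]
            after = 1 if total >= quota else 0   # ... and after?
            values[player] += after - before     # 1 exactly when the player is pivotal
    return [v / len(perms) for v in values]

print(shapley_wvg([49, 49, 2], 51))  # -> [0.333..., 0.333..., 0.333...]
```

In every ordering of the three parties, exactly the second party to join is pivotal (no single party reaches 51 seats, while any pair does), and each party is second in a third of the orderings.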

Appendix C Algorithm For Generating Least Core Solutions

The least core solutions can be generated by considering all possible coalitions. A simple improvement, however, is to use the minimal set of winning coalitions Deegan and Packel [1978] instead. In this section, we describe the idea behind this approach and the algorithm used in this paper.

Let N denote the set of all players, let q denote the quota, and let w_i denote the weight of player i, for all i in N. The characteristic function is given by v(C) = 1 if the total weight of C is at least q, and v(C) = 0 otherwise, for coalitions C ⊆ N.

The naive approach to finding least core solutions is to consider all winning coalitions, defined as the set of coalitions C with v(C) = 1, and to find a least core solution by solving the LP problem in Figure C.1(a).

(a) Naive approach
(b) Minimal approach
Figure C.1: Algorithms for finding least core solutions.

In the LP problem of Figure C.1(a), all winning coalitions are considered. Within a non-minimal winning coalition, however, there exists a winning sub-coalition that could branch out and form a smaller coalition; this yields a higher payoff for the players in the sub-coalition and is hence a Pareto improvement for these "decision-makers". Therefore, we can instead consider only the subset of minimal winning coalitions, in which every player is pivotal, and solve the more efficient LP problem of Figure C.1(b).
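As an illustration of the naive approach, the LP can be handed to an off-the-shelf solver. The sketch below is our own (the function name and the use of `scipy.optimize.linprog` are assumptions, not the paper's implementation): it enumerates all winning coalitions and minimizes the least core value e subject to the constraint that every winning coalition receives at least 1 − e.

```python
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

def least_core_wvg(weights, quota):
    """Minimize e subject to sum_{i in C} p_i >= 1 - e for every winning
    coalition C, with payoffs summing to 1 and p_i >= 0 (naive enumeration)."""
    n = len(weights)
    winning = [C for r in range(1, n + 1) for C in combinations(range(n), r)
               if sum(weights[i] for i in C) >= quota]
    # Decision variables x = (p_1, ..., p_n, e); objective: minimize e.
    c = np.zeros(n + 1)
    c[-1] = 1.0
    # Each winning coalition C gives: -(sum_{i in C} p_i) - e <= -1.
    A_ub = np.zeros((len(winning), n + 1))
    b_ub = -np.ones(len(winning))
    for row, C in enumerate(winning):
        for i in C:
            A_ub[row, i] = -1.0
        A_ub[row, -1] = -1.0
    # Efficiency: the payoffs sum to the grand coalition's value of 1.
    A_eq = np.ones((1, n + 1))
    A_eq[0, -1] = 0.0
    b_eq = [1.0]
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

payoffs, e = least_core_wvg([49, 49, 2], 51)  # e = 1/3 for the parliament example
```

Restricting the enumeration to minimal winning coalitions, as described above, shrinks the constraint set without changing the optimum.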

Appendix D Experimental details

In this section, we elaborate on the experimental procedures, model architectures and hyperparameters used in our experiments.

Code and notebook.

Following the guidelines, we provide Python code to reproduce our results, a pedagogical notebook (CooperativeGameTheoryPrimer.ipynb) that introduces the most important game-theoretic formalisms, and the necessary data, all in the accompanying zip file.

D.1 Weighted Voting Games

Optimization procedure: fixed-size feedforward networks.

For each fixed game size, we perform independent runs with a maximum number of epochs and an early-stopping criterion that terminates a run if, after a baseline of 500 epochs, the validation loss does not improve for 75 consecutive epochs. For each run, 70 percent of the data is allocated for training and 30 percent for validation. After training, we select the model with the best validation loss; the selected models are stored and used for evaluation. In all experiments we use the Adam optimizer Kingma and Ba [2014], ReLU activation functions Nair and Hinton [2010] for the hidden layers, a Softmax for the payoff output layer, and a Sigmoid for the epsilon output layer.

Optimization procedure: variable-size feedforward network.

We perform independent runs on the variable-size dataset with a maximum number of epochs and the same early-stopping criterion: a run terminates if, after a baseline of 500 epochs, the validation loss does not improve for 75 consecutive epochs. For each run, 70 percent of the data is allocated for training and 30 percent for validation. After training, we select the model with the best validation loss, which is stored and used for evaluation. In all experiments we use the Adam optimizer Kingma and Ba [2014], ReLU activation functions Nair and Hinton [2010] for the hidden layers, a Softmax for the payoff output layer, and a Sigmoid for the epsilon output layer.


All results are generated with multi-layer perceptrons (feedforward networks) of varying hidden-layer count and width. Further hyperparameters are the dropout rate, the weight decay in the Adam optimizer, and the learning rate. Our standard model architectures are depicted in Figure E.1 (right).
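A minimal sketch of such a network's forward pass, with a shared ReLU trunk, a Softmax head for the payoff vector, and a Sigmoid head for epsilon. The layer sizes and random parameters here are illustrative assumptions, not the trained models from the figures.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(normalized_weights, params):
    """Forward pass of a fixed-size payoff network: a ReLU hidden layer
    feeding a Softmax head (payoffs) and a Sigmoid head (epsilon)."""
    W1, b1, W_pay, b_pay, w_eps, b_eps = params
    h = np.maximum(0.0, normalized_weights @ W1 + b1)   # ReLU hidden layer
    logits = h @ W_pay + b_pay
    payoffs = np.exp(logits - logits.max())             # numerically stable softmax
    payoffs /= payoffs.sum()
    epsilon = 1.0 / (1.0 + np.exp(-(h @ w_eps + b_eps)))  # sigmoid scalar head
    return payoffs, float(epsilon)

n_players, hidden = 5, 16   # illustrative sizes
params = (rng.normal(size=(n_players, hidden)), np.zeros(hidden),
          rng.normal(size=(hidden, n_players)), np.zeros(n_players),
          rng.normal(size=hidden), 0.0)
payoffs, eps = forward(rng.uniform(size=n_players), params)
```

The Softmax guarantees the predicted payoffs form a valid distribution over players, while the Sigmoid keeps the predicted epsilon in (0, 1).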

Approximated ground-truth Shapley values.

To obtain ground-truth solutions for Shapley values above nine players, we perform Monte-Carlo sampling over a subset of all possible permutations. For each resample we randomly sample permutations and average across resamples to obtain approximations of the ground truth. To validate that the approximated labels are representative, we compute the mean MAE between the true and approximated Shapley values for games of up to players, with 1000 games per game size. Across all game sizes, the mean MAE between the true and approximated Shapley values remained small. Figure D.2 shows the distributions per player together with the true label (red dot) for 8- and 9-player games.
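The Monte-Carlo procedure can be sketched as follows; the helper name and the number of sampled permutations are illustrative assumptions.

```python
import random

def mc_shapley_wvg(weights, quota, num_perms=2000, seed=0):
    """Monte-Carlo Shapley estimate for a weighted voting game: average each
    player's marginal contribution over randomly sampled permutations."""
    rng = random.Random(seed)
    n = len(weights)
    players = list(range(n))
    values = [0.0] * n
    for _ in range(num_perms):
        rng.shuffle(players)
        total = 0
        for player in players:
            before = 1 if total >= quota else 0
            total += weights[player]
            after = 1 if total >= quota else 0
            values[player] += after - before
    return [v / num_perms for v in values]

estimates = mc_shapley_wvg([49, 49, 2], 51)  # each estimate close to 1/3
```

Because exactly one player is pivotal in every sampled permutation of this game, the estimates always sum to one; only their split across players carries sampling noise.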

Parameterizations for test datasets.

Figure D.1 shows the five standardized test distributions that are generated with the parameters in Table 2.

Figure D.1: Test data distributions in perspective. To assess how robust the models are to shifts in the weight distribution, we re-parameterize the weight distribution in five different ways.
Test set name                      α     β     location
In-sample                          1     1     1
Out-of-sample                      1     1     2.5N
Slightly out-of-distribution       8     12    2
Moderately out-of-distribution     7     1.5   1.5N
Considerably out-of-distribution   12    8     3N
Table 2: Test distributions are generated by the above parameterizations of the beta distribution.

Figure D.2: Approximating ground-truth Shapley values. Ten randomly sampled games, showing the distribution of approximated values for each player in the game; the red dot indicates the true Shapley value (i.e., obtained by computing the marginal value for each permutation).

D.2 Explainable AI

We provide a step-wise explanation of how we performed the experiment in our explainable AI section, that is, training a neural network to predict the Shapley values for a dataset of features:

  1. Select a dataset. The first step is to select a dataset of choice. In our example, we end up with a feature matrix, where each row is an observation and each column a feature, and a target vector (binary or continuous depending on the task).

  2. Feature to target. Next, we estimate a model that predicts one output for each set of features.

  3. Compute Shapley values through sampling. We use the trained model from step 2 to obtain Shapley values with the SHAP package's KernelExplainer Lundberg and Lee [2017], which is based on LIME. This yields a matrix of Shapley values, one for each feature in every data instance.

  4. Feature to Shapley. Finally, we create an array of linearly spaced train-test splits. Each element indicates the ratio of samples used for training the neural network, with the remaining samples used for evaluation. For each train-test partition, we train a feedforward neural network by minimizing the MSE between the predicted and actual Shapley values (full details in Table 3) for 100 epochs, and evaluate on the remaining (unseen) samples.
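The four steps above can be sketched end-to-end. In this illustrative version, synthetic arrays stand in for the real feature matrix and for the Shapley values a SHAP explainer would produce, and the network size follows Table 3; everything else (names, data) is an assumption for the sketch.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins: X plays the role of the preprocessed feature matrix, and S the
# per-instance Shapley values that would normally come from KernelExplainer.
X = rng.normal(size=(500, 8))
S = X * rng.normal(size=8) + 0.05 * rng.normal(size=(500, 8))

rmses = []
for train_frac in np.linspace(0.1, 0.9, 5):        # linearly spaced splits
    X_tr, X_te, S_tr, S_te = train_test_split(X, S, train_size=train_frac,
                                              random_state=0)
    net = MLPRegressor(hidden_layer_sizes=(128,), max_iter=100, random_state=0)
    net.fit(X_tr, S_tr)                             # minimize MSE to Shapley labels
    rmse = np.sqrt(np.mean((net.predict(X_te) - S_te) ** 2))
    rmses.append(rmse)                              # evaluated on the held-out rest
```

Plotting `rmses` against the training fraction reproduces, in miniature, the sample-complexity curve of Figure 6.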

Datasets and preprocessing.

Table 4 summarizes the considered datasets. The UCI Banking dataset contains 15 features and a binary target (whether the client subscribed to a term deposit), whereas the Melbourne housing dataset contains 21 features and a regression target (the price of a house). We exclude the "duration" feature in the Banking dataset to prevent data leakage, as recommended in its documentation, and we exclude the "Rooms" feature from the housing dataset, as it overlaps substantially with the "Bedroom2" feature. The numerical values in both datasets are z-scored, and categorical features are encoded as numbers. Missing values are encoded as zeros, so that the complete dataset can be used for our experiment.

Feature to target.

We partition each dataset into train and test splits and train a Random Forest classification model on the UCI Banking set, obtaining a final test accuracy of 72%. Moreover, we train a decision tree regression model to predict the price given the set of house features, obtaining a final MSE of 1.21 on the test dataset.

Compute Shapley values.

We use the trained models to obtain Shapley values for each instance in the feature dataset using the SHAP kernel explainer object Lundberg and Lee [2017].

Feature to Shapley: procedure.

Finally, we can train a neural network on the feature-to-Shapley datasets. To investigate the sample complexity required to learn a representative mapping from features to Shapley values, we conduct the following experiment. First, we create an array of linearly spaced training splits. Each element indicates the ratio of samples used for training the neural network, leaving the remaining samples for evaluation. For each train/test partition, we train a feedforward neural network by minimizing the MSE between the predicted and actual Shapley values (architecture and hyperparameters specified in Table 3) for 100 epochs, and evaluate on the remaining samples. Figure 6 depicts the resulting RMSE for both datasets as a function of the training fraction.

Number of layers       3
Hidden size            128
Adam learning rate     1e-4
Adam weight decay      1e-5
Dropout rate           0.1
Table 3: Hyperparameters used for XAI experiments.
Dataset                           Features (pre)  Features (post)  Observations (pre)  Observations (post)
UCI Banking Moro et al. [2014]    15              14               11162               11162
Melbourne Housing Pino [2018]     20              20               34857               34857
Table 4: Number of features and observations before and after preprocessing for each considered dataset.

Appendix E Additional Methods

Figure E.1: Neural architectures for the variable-size game predictions. We add zero-padding to allow one model to predict solutions for games of variable sizes: we choose a maximum length and zero-pad games with fewer players. Full details on our data in Section 3.1.2.
Figure E.2: Procedurally generating and solving one weighted voting game. We sample the weights and quota to create a weighted voting game. The model inputs are divided by the quota and the payoffs are computed by solving the game, provided a solution concept. One game instance consists of a set of normalized weights and the allocated payoffs .
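The input preparation of Figures E.1 and E.2 (normalizing weights by the quota, then zero-padding to a fixed maximum length) can be sketched in a few lines; the helper name is ours.

```python
import numpy as np

def pad_game(weights, quota, max_players):
    """Divide the weights by the quota (as in Figure E.2) and zero-pad the
    result to a fixed length, so one network can take games of any size
    up to max_players."""
    w = np.asarray(weights, dtype=float) / quota
    padded = np.zeros(max_players)
    padded[: len(w)] = w
    return padded

x = pad_game([49, 49, 2], 51, max_players=8)  # three real entries, five zeros
```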

Appendix F Additional Results

In-sample predictive performance.

For completeness, we provide the MAEs for each solution concept on the in-sample test dataset, for games of 4 to 15 players, in Table 5.

Least core payoffs Least core excess Shapley values Banzhaf indices
Metric Mean MAE MAE Mean MAE Mean MAE
N Fixed Variable Fixed Variable Fixed Variable Fixed Variable
4 0.087 0.140 0.034 0.058 0.069 0.072 0.067 0.097
5 0.077 0.096 0.031 0.045 0.040 0.047 0.035 0.060
6 0.058 0.069 0.028 0.033 0.026 0.031 0.021 0.037
7 0.040 0.051 0.021 0.027 0.022 0.023 0.014 0.024
8 0.027 0.036 0.015 0.021 0.016 0.018 0.015 0.017
9 0.019 0.031 0.012 0.019 0.010 0.015 0.012 0.014
10 0.014 0.023 0.007 0.017 0.009 0.013 0.006 0.013
11 0.010 0.021 0.006 0.020 0.006 0.012 0.009 0.013
12 0.009 0.018 0.005 0.023 0.005 0.011 0.004 0.013
13 0.006 0.016 0.003 0.027 0.009 0.010 0.004 0.014
14 0.005 0.014 0.003 0.032 0.004 0.009 0.009 0.014
15 0.005 0.013 0.003 0.036 0.004 0.009 0.012 0.014
Avg 0.019 0.015 0.029 0.051 0.018 0.022 0.017 0.027
Table 5: Predictive performance across solution concepts on in-sample test dataset.
Capturing step-wise jumps: full description.

We provide a more detailed description of Figure 5. In the figure on the left, each player is a winning coalition by itself at the lowest quota. Up to a certain quota, player 3 is more critical than players 1 and 2 and receives a larger share of the joint payoffs. All players obtain equal payoffs once the only winning coalition is the grand coalition. On the right side: each player is a winning coalition by itself at the lowest quota, which results in an equal payoff for each player. Increasing the quota continuously alters the set of winning coalitions, and thereby the relative importance of each player. At intermediate quotas, players 1, 2 and 3 become less critical and obtain a smaller payoff each compared to players 4 and 5. Finally, at the highest quota, the only winning coalition that can be formed is the grand coalition: each player is required and the joint payoff is divided equally.

Appendix G Hardware details

We trained our neural networks on an internal compute cluster with 10 Quadro RTX 6000 GPUs and two Intel(R) Xeon(R) Gold 5218 CPUs, each of which has 16 physical cores (32 logical processors).

Appendix H Asset licensing

Two main assets were used in this work:

  • The UCI Banking dataset

  • The Melbourne Housing dataset

The Melbourne Housing dataset was released under the license CC BY-NC-SA 4.0. The UCI Banking dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Appendix I Evaluating the Speedup Achieved by Our Approach, Depending on the Size of the Dataset

Many domains involve datasets with a large number of features, sometimes with thousands or millions of data instances. A prominent example is language modelling, where large bodies of text are generated, translated or otherwise processed Patel et al. [2021]. In such cases, computing the Shapley value for each data instance using sampling-based methods takes an extremely long time.

Our approach targets speeding up feature-importance computations over many instances. We perform a simple experiment to demonstrate the effectiveness of a model-based approach for explaining model decisions. We take the Melbourne Housing dataset and obtain 9000 instances by 13 features after preprocessing (encoding the categorical features and standardization). First, we generate a ground-truth dataset by setting the number of samples in the SHAP package (KernelExplainer) to 5000. The number of samples in the KernelExplainer determines the number of times the model is re-evaluated for every prediction; a higher number of samples yields more accurate, lower-variance Shapley values but requires a longer compute time. Computing the Shapley values on our 9000 samples takes 4.13 hours. Figure I.1 (left) shows that the inference time scales approximately linearly with the number of re-evaluation samples.

To contrast the computation times, we compute Shapley values on 10% of the data (29 minutes), train the neural network (30 seconds), and predict Shapley values on the remaining 90% of the data (less than a second). Altogether, this procedure takes roughly 12% of the time it took to compute Shapley values for the entire dataset. This speeds up the whole procedure by 8x while keeping reasonable prediction quality on the remaining unseen data (Figure I.1, right).

Next, we use SHAP again to compute Shapley values for the remaining 90% of the data, this time with the default number of model re-evaluation samples, yielding slightly less accurate Shapley values. We quantify the trade-off in prediction quality by comparing the model-predicted Shapley values to the Shapley values obtained with the default number of samples.

Concretely, we measure the error of our model predictions and of the SHAP package against the ground-truth dataset. We note that the speedup does come at slightly reduced empirical performance: the model's error on the test set is slightly higher than that obtained with SHAP. However, we expect that hyperparameter tuning and better preprocessing will improve the quality of the model predictions.

Altogether, it is easy to see how this gain in computation time translates to larger datasets. Consider a similar dataset with 100,000 or a million entries. Computing Shapley values on the entire dataset with SHAP would take roughly 40 or 400 hours, while the time for our procedure remains roughly the same: building the training set and training the neural network takes only a fraction of the time required by the sampling-based approach, so the analysis using our method would take a few hours.
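The arithmetic behind this extrapolation, using the measured numbers from the 9000-instance experiment above:

```python
# Measured numbers from the 9000-instance Melbourne Housing experiment.
full_shap_hours = 4.13          # SHAP on all 9000 instances
shap_on_10pct_hours = 29 / 60   # SHAP labels for the 10% training split
train_hours = 30 / 3600         # fitting the payoff network
predict_hours = 1 / 3600        # inference on the remaining 90%

hybrid_hours = shap_on_10pct_hours + train_hours + predict_hours
speedup = full_shap_hours / hybrid_hours
print(round(hybrid_hours / full_shap_hours, 2), round(speedup, 1))  # -> 0.12 8.4
```

Since the training and prediction terms are near-constant, the speedup grows with dataset size: for a million-entry dataset the SHAP cost scales up by two orders of magnitude while the hybrid cost is dominated by labelling the training fraction.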

Figure I.1: Speeding up Shapley value computations. Left: the inference time, here the time it takes to compute the Shapley values of 9000 instances, scales approximately linearly with the number of samples. Right: the speedup factor provided by our model-based approach as a function of the fraction of data used for training.

Appendix J Heuristics and Simpler Models

We now consider a common heuristic for payoff allocation in weighted voting games, the weight-proportional payoff allocation, and use it as a benchmark against which to evaluate our models' performance. We find that our fixed-size models outperform this heuristic across solution concepts and test sets.

WeightProportional baseline.

The WeightProportional (WP) heuristic divides the payoffs in direct proportion to the players' weights; that is, the payoff of player i is

    p_i = w_i / sum_j w_j

for any game. WP provides a meaningful benchmark as it is the most naive method for allocating payoffs. The resulting payoffs (and thus errors) do not take the quota into account, even though the quota may have a major impact on the correct solutions via the emergent set of winning coalitions. Any improvement on this benchmark therefore suggests a better understanding of the collaborative structures in the game at hand.
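A sketch of the WP baseline, applied to the 49/49/2 parliament example from Appendix B (the helper name is ours):

```python
def weight_proportional(weights):
    """WeightProportional baseline: payoff share proportional to the raw
    weight, ignoring the quota entirely."""
    total = sum(weights)
    return [w / total for w in weights]

# WP gives the small party only 2% of the payoff, even though the
# game-theoretic solutions treat all three symmetric parties equally.
print(weight_proportional([49, 49, 2]))  # -> [0.49, 0.49, 0.02]
```

The gap between 0.02 and the Shapley value of 1/3 for the small party is exactly the kind of quota-dependent structure the heuristic cannot capture.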

We also compare the predictive performance of our models against multinomial regression models. These models have a single layer, with one input and one output per player, and a softmax nonlinearity at the end. Across all game sizes, our models outperform the linear models by 45% to 95%. Results are shown in Table 6.

N Linear Core Neural net Core Linear Shapley Neural net Shapley Linear Banzhaf Neural net Banzhaf
4 0.212 0.089 0.137 0.07 0.116 0.069
5 0.147 0.078 0.098 0.041 0.083 0.035
6 0.124 0.059 0.080 0.026 0.071 0.021
7 0.108 0.041 0.070 0.022 0.063 0.014
8 0.102 0.028 0.065 0.015 0.059 0.014
9 0.089 0.020 0.056 0.009 0.051 0.012
10 0.078 0.014 0.050 0.008 0.045 0.006
11 0.072 0.010 0.045 0.006 0.042 0.009
12 0.065 0.009 0.041 0.005 0.041 0.004
13 0.062 0.006 0.036 0.009 0.035 0.004
14 0.056 0.005 0.036 0.004 0.034 0.009
15 0.057 0.005 0.032 0.004 0.032 0.0012
16 0.051 0.003 0.029 0.004 0.030 0.003
17 0.051 0.003 0.028 0.004 0.029 0.006
18 0.047 0.005 0.028 0.007 0.027 0.011
19 0.045 0.002 0.026 0.007 0.025 0.003
20 0.044 0.003 0.025 0.006 0.024 0.006
Table 6: Comparing neural network performance to multinomial logistic regression models.

Appendix K Hard cases

Across solution concepts (Shapley, Banzhaf, and Least core), discontinuous jumps in the solution space yield large errors (see Section 4.1.2 and Figures 5, L.4). In this section, we highlight failure cases for the Least core. In contrast to the power indices, a solution in the least core is obtained by solving a Linear Program under several hard constraints. Thus, a solution to a weighted voting game is only feasible when, for each winning coalition $C$, we have that

$$\sum_{i \in C} p_i \geq v(C) - e, \tag{K.1}$$

where $p_i$ is the payoff of player $i$ in coalition $C$ and $e$ is the least core value (LCV). In words, each winning coalition must obtain a joint reward that is at least as good as what it would have obtained on its own. If this criterion is not met, the solution is infeasible: some players will have an incentive to deviate from the grand coalition.
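The feasibility check of equation K.1 can be implemented directly by enumerating winning coalitions (a brute-force sketch, using the convention that every winning coalition has value 1):

```python
from itertools import combinations

def is_feasible(weights, quota, payoffs, e, tol=1e-6):
    """Check the least-core feasibility condition (eq. K.1): every winning
    coalition C must receive a joint payoff of at least v(C) - e = 1 - e."""
    n = len(weights)
    for size in range(1, n + 1):
        for C in combinations(range(n), size):
            if sum(weights[i] for i in C) >= quota:  # winning coalition
                if sum(payoffs[i] for i in C) < 1 - e - tol:
                    return False
    return True

# Here only the grand coalition wins (quota equals the total weight), so
# any allocation summing to 1 is feasible with e = 0.
print(is_feasible([12, 13, 27, 7], 59, [0.25, 0.25, 0.25, 0.25], 0.0))  # -> True
```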

The majority of the model predictions fail to meet the hard constraints. Figure K.1 depicts the percentage of feasible solutions over game sizes. For games with few players, between 5.5% and 50% of the model predictions are feasible. The larger the number of players, the fewer model predictions satisfy the hard constraints. From eight-player games onward, the share of feasible predictions ranges between 0.2% and 10%. A preliminary analysis shows no relationship between infeasible solutions and larger errors in either the payoffs or the LCV.

The fact that the model does not predict many feasible solutions in these large games is unsurprising. The set of winning coalitions grows approximately exponentially in the number of players. As such, the more players in a game, the more hard constraints need to be satisfied, and all of them must hold for a solution to be feasible. We expect that including a penalty term for infeasible predictions during training will improve this metric, and we hope to address this in future work.
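The growth in the number of constraints is easy to verify by brute force (illustrative games with random weights and a majority quota):

```python
import itertools
import random

def count_winning(weights, quota):
    """Count the winning coalitions of a weighted voting game; each one
    contributes a hard constraint to the least-core LP."""
    n = len(weights)
    return sum(
        1
        for size in range(1, n + 1)
        for C in itertools.combinations(range(n), size)
        if sum(weights[i] for i in C) >= quota
    )

# With a majority quota, roughly half of all 2^n coalitions win, so the
# number of constraints grows exponentially with the number of players.
random.seed(0)
for n in (4, 8, 12):
    w = [random.random() for _ in range(n)]
    print(n, count_winning(w, sum(w) / 2))
```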

Figure K.1: Percentage of feasible solutions for each game size. We count the number of solutions that meet the hard constraints by validating equation K.1 for each winning coalition, across 1000 samples per game size.

When the least core is a set, simplex-based methods tend to find corner solutions, which leads to ambiguity in the solutions.

Further analysis of individual weighted voting games reveals that the most problematic games are cases where the least core contains multiple correct solutions, but the full joint payoff is allocated to a single player. In other words, payoffs are divided such that one player receives one and all the others receive zero. To illustrate what is happening here, consider a simple four-player game that produced the largest error (Figure K.2). We focus our analysis on the game on the left, but our insights apply to a wide range of games in the dataset. In this particular game, with weights and quota rounded to one decimal place for simplicity, no player is a winning coalition by itself (all $w_i < q$); in fact, the only winning coalition is the grand coalition. The solution to this weighted voting game is the payoff vector $p = (1, 0, 0, 0)$: player one receives the full joint payoff, even though its weight is smaller than the weight of player three ($w_1 < w_3$). The reason why we obtain this solution comes down to two facts. First, the least core is a set and can therefore contain multiple correct solutions. Second, simplex-based methods return corner solutions. Thus, in cases where several players are required to form a winning coalition, the solver will allocate the joint payoff to an arbitrary player belonging to this critical subset.
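To see the corner-solution behaviour directly, one can write out the least-core LP and hand it to a simplex-based solver. The sketch below uses SciPy's `linprog`; the weights are illustrative (not the game from Figure K.2) and chosen so that only the grand coalition wins:

```python
from itertools import combinations

import numpy as np
from scipy.optimize import linprog

def least_core_wvg(weights, quota):
    """Least-core LP for a weighted voting game. Decision variables are
    [p_1, ..., p_n, e]; we minimize e subject to sum_{i in C} p_i >= 1 - e
    for every winning coalition C, sum_i p_i = 1, and 0 <= p_i <= 1."""
    n = len(weights)
    A_ub, b_ub = [], []
    for size in range(1, n + 1):
        for C in combinations(range(n), size):
            if sum(weights[i] for i in C) >= quota:  # winning coalition
                row = np.zeros(n + 1)
                row[list(C)] = -1.0                  # -sum(p_i) - e <= -1
                row[n] = -1.0
                A_ub.append(row)
                b_ub.append(-1.0)
    c = np.append(np.zeros(n), 1.0)                  # objective: minimize e
    A_eq = [np.append(np.ones(n), 0.0)]              # efficiency: sum p = 1
    bounds = [(0, 1)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds, method="highs")
    return res.x[:n], res.x[n]

# Only the grand coalition wins, so the least core is the whole simplex;
# the solver nevertheless returns a single corner of it.
p, e = least_core_wvg([3.2, 2.9, 3.1, 2.8], 12.0)
print(p, e)
```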

Figure K.2: Examples of games where the joint payoff is allocated to a single player. We depict the actual solutions (green dots), payoffs allocated according to the weight-proportional heuristic (orange squares; see Appendix J), and model predictions (blue triangles).

Figure K.3 depicts this visually for a simple two-player game. Here, we have a WVG in which the only winning coalition is the grand coalition. Solving this game results in either of the corner solutions, $p = (1, 0)$ or $p = (0, 1)$, both with the same LCV. However, if we let $t$ denote the payoff of player one, any combination $p = (t, 1 - t)$ with $t \in [0, 1]$ is a feasible solution in the least core.

Figure K.3: Visual depiction of the corner solutions and the feasible region of a two-player weighted voting game in which only the grand coalition wins. Simplex-based methods tend to return the corner solutions, in this case $p = (1, 0)$ or $p = (0, 1)$, both with the same LCV.

Obtaining the full set of solutions for this game can be done empirically. In principle, we can take the payoff vector and LCV above, generate the set of ordered player pairs, and check whether the solution remains feasible after transferring a small amount of payoff from one player (whose payoff is not zero) to another. However, this approach does not scale to larger games. It would be more efficient to derive rules for recovering the full least core set from a single solution.
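For the two-player example, the transfer check described above can be sketched as follows (the feasibility test encodes only the grand-coalition constraint of eq. K.1, under which the LCV is 0):

```python
from itertools import permutations

def transfer_stays_feasible(payoffs, is_feasible, eps=0.01):
    """Empirical probe of the least-core set: for every ordered pair of
    players, transfer eps of payoff from the first (if it has that much)
    to the second, and re-check feasibility of the perturbed allocation."""
    results = []
    for i, j in permutations(range(len(payoffs)), 2):
        if payoffs[i] >= eps:
            q = list(payoffs)
            q[i] -= eps
            q[j] += eps
            results.append(is_feasible(q))
    return results

def feasible(p):
    # Two-player game where only the grand coalition wins: feasibility
    # only requires p1 + p2 = 1 with non-negative payoffs.
    return abs(sum(p) - 1) < 1e-9 and min(p) >= -1e-9

print(transfer_stays_feasible([1.0, 0.0], feasible))  # -> [True]
```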

We limited the analysis above to four-player games; however, we observe this behavior irrespective of the number of players in the game. Addressing this issue is therefore critical for improving the predictive performance of our models.

Appendix L Applying Our Models to Voting In The European Council

In this section we apply our trained models to predict the voting power of member states in the EU Council. First, we take the weights of four countries in the EU Council from Leech [2002b] (see Table 7) and define the quota as the majority vote (50% of the total weight + 1). Our weighted voting game consists of the member states Hungary, the Netherlands, Poland, and Ireland, with weights $w = (12, 13, 27, 7)$ and quota $q = 30.5$.

Member state Population Population Perc. Weights Weights Perc.
Germany 82.54m 16.5% 29 8.4%
France 59.64m 12.9% 29 8.4%
UK 59.33m 12.4% 29 8.4%
Italy 57.32m 12.0% 29 8.4%
Spain 41.55m 9.0% 27 7.8%
Poland 38.22m 7.6% 27 7.8%
Romania 21.77m 4.3% 14 4.1%
Netherlands 17.02m 3.3% 13 3.8%
Greece 11.01m 2.2% 12 3.5%
Portugal 10.41m 2.1% 12 3.5%
Belgium 10.36m 2.1% 12 3.5%
Czech Rep. 10.20m 2.1% 12 3.5%
Hungary 10.14m 2.0% 12 3.5%
Sweden 8.94m 1.9% 10 2.9%
Austria 8.08m 1.7% 10 2.9%
Bulgaria 7.85m 1.5% 10 2.9%
Denmark 5.38m 1.1% 7 2.0%
Slovakia 5.38m 1.1% 7 2.0%
Finland 5.21m 1.1% 7 2.0%
Ireland 3.96m 0.9% 7 2.0%
Table 7: Comparison of voting weights in the Council of the European Union Leech [2002b].
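The four-state game is small enough to solve exactly; a brute-force Shapley computation over all orderings looks as follows:

```python
from itertools import permutations
from math import factorial

def shapley_wvg(weights, quota):
    """Exact Shapley values for a weighted voting game by enumerating all
    n! orderings: a player's value is the fraction of orderings in which
    its arrival turns a losing prefix into a winning one (it is pivotal)."""
    n = len(weights)
    phi = [0] * n
    for order in permutations(range(n)):
        running = 0
        for i in order:
            if running < quota <= running + weights[i]:
                phi[i] += 1  # player i is pivotal in this ordering
            running += weights[i]
    return [v / factorial(n) for v in phi]

# Hungary, Netherlands, Poland, Ireland with the majority quota of 30.5.
print(shapley_wvg([12, 13, 27, 7], 30.5))  # Poland holds half the power
```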

Next, we divide the weights by the quota and feed the normalized weights to the model (see Section 3.1). We contrast the model predictions with the ground-truth solutions and find that the Mean Mean Absolute Error (MMAE) per member state is 0.003 (Banzhaf), 0.002 (Shapley), and 0.017 and 0.007 for the Least core payoffs and excess, respectively. Figure L.1 compares the payoff allocations for the Shapley, Banzhaf, and Core solution concepts. We observe that for some member states the contribution measured by the power indices (Shapley and Banzhaf) is smaller than the normalized weight, while the Core attributes them a larger share of the joint payoff.
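The normalized Banzhaf index can be computed the same way, this time counting swings over subsets rather than orderings:

```python
from itertools import combinations

def banzhaf_wvg(weights, quota):
    """Normalized Banzhaf index: for each player, count the coalitions of
    the other players that are losing but become winning once the player
    joins (swings), then normalize the swing counts to sum to one."""
    n = len(weights)
    swings = [0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                w = sum(weights[j] for j in S)
                if w < quota <= w + weights[i]:  # player i swings S
                    swings[i] += 1
    total = sum(swings)
    return [s / total for s in swings]

# Hungary, Netherlands, Poland, Ireland with the majority quota of 30.5.
print(banzhaf_wvg([12, 13, 27, 7], 30.5))
```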

Figure L.1: Allocating payoffs to member states in the EU Council. A comparison of how absolute power (the normalized weights $w_i / q$) relates to the allocated share of the joint payoff.

Next, we perform a sensitivity analysis, in which we are interested in the interdependence of member states' voting power. Holding all other game parameters constant, we increment the weight of Hungary in steps of one until it exceeds the quota of 30.5. Figure L.2 depicts how the member states' voting power (as measured by the Shapley value) changes as Hungary gains absolute power. The predicted payoff for Hungary is depicted by red triangles, the model predictions for the other states in blue, and the actual solutions by green dots. We observe two transition points in the solution space. The first occurs when Hungary's weight becomes large enough to form winning coalitions with both the Netherlands ($w = 13$) and Poland ($w = 27$), which raises its voting power and, accordingly, its share of the joint payoff. Our model predicts accurate payoffs overall, with a spike in absolute error per player at this transition point.
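This sweep can be reproduced with the same brute-force Shapley computation, incrementing Hungary's weight while holding the other weights and the quota fixed (a sketch; the exact transition points it reports depend on the game parameters):

```python
from itertools import permutations
from math import factorial

def shapley(weights, quota):
    """Fraction of orderings in which each player is pivotal."""
    n = len(weights)
    phi = [0] * n
    for order in permutations(range(n)):
        running = 0
        for i in order:
            if running < quota <= running + weights[i]:
                phi[i] += 1
            running += weights[i]
    return [v / factorial(n) for v in phi]

others = [13, 27, 7]  # Netherlands, Poland, Ireland
quota = 30.5
prev = None
for w_hu in range(12, 32):
    phi_hu = shapley([w_hu] + others, quota)[0]
    if prev is not None and abs(phi_hu - prev) > 1e-12:
        print(f"transition at w_HU = {w_hu}: {prev:.3f} -> {phi_hu:.3f}")
    prev = phi_hu
```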

The second transition point occurs when Hungary's weight exceeds the quota, so that Hungary forms a winning coalition by itself and thereby obtains a larger share of the joint payoff. We observe that its Shapley value increases to approximately 0.6, more than half of the total reward. Our model captures this transition almost perfectly.

We perform a similar analysis, this time perturbing the quota. Figure L.3 depicts the same game with the original weights fixed, increasing the quota from a small value up to the sum of the weights (this is the same experiment we performed in Section 4.1.2). The value of the quota determines the set of winning coalitions, thereby dictating the payoff of each player. As such, perturbing the quota yields a large number of discontinuous jumps in the member states' solutions compared to our previous analysis perturbing the weight of Hungary. Our model captures the overall changes in the solutions but has difficulty with large jumps in payoffs over small quota increments.

Figure L.2: Perturbing the weight of a member state in a voting game. Top: Model predictions and actual payoffs for the four-player voting game with weight vector $w = (12, 13, 27, 7)$. Bottom: Absolute errors between predicted and actual payoffs for the individual member states.
Figure L.3: Perturbing the quota in a voting game. Top: Model predictions and actual payoffs for the four-player voting game with weight vector $w = (12, 13, 27, 7)$. Bottom: Absolute errors between predicted and actual payoffs for the individual member states. Note that there are a large number of transition points in the solutions.

Our variable-size models have been trained on games of up to ten players and can be applied to instantly predict the payoffs of member states in larger councils. To illustrate this, we take all twenty member states in Table 7 and use our variable-size models to predict the payoffs for each, again setting the quota as the majority vote. The prediction quality with respect to the ground truth (in MMAE) is 0.013 for Shapley, 0.004 for Banzhaf, and 0.05 for the Least core. Figure L.4 compares the solutions across member states and solution concepts.

Figure L.4: Allocating payoffs to member states in the EU council with variable models. A comparison of the allocated payoffs over solution concepts.

While there is room for improvement in terms of prediction quality, our approach provides a useful tool for obtaining quick estimates of voting power in medium-sized games. Furthermore, while approximation methods exist for Shapley and Banzhaf, there are currently no effective methods for estimating payoffs in the least core. This solution concept is more challenging because the solutions are obtained by solving a Linear Program with hard constraints: a game with 15 or even 20 players involves on the order of $2^{15}$ or $2^{20}$ constraints, which slows down the solvers dramatically. Here, we can apply the same technique: solve the small and medium-sized games, then leverage our neural network approach to tackle higher numbers of players. In other words, provided a solver that can handle games of up to $n$ players, we can train a neural network on games of this size and then extend predictions to games beyond $n$ players, as we did above by training on up to ten players and predicting for twenty.

Appendix M Real World Applications of Weighted Voting Games

Weighted voting games, and cooperative game theory in general, provide valuable tools for reasoning about the allocation of collective outcomes to individual agents. As such, their applicability spans numerous domains. In this section, we highlight several real-world applications of our work. In all examples, our framework can be applied both to speed up computation and to enhance scalability. Dinar et al. [1992] utilize solution concepts from cooperative game theory – including the Core, the Shapley value, the Nucleolus, and Nash – to allocate water resources among competing parties in the presence of scarcity. More recently, Mirzaei-Nodoushan et al. [2022] performed an extensive evaluation of both cooperative and non-cooperative game-theoretic approaches to water management in trans-boundary river basins.

Moreover, cooperative game theory has proven useful in analyzing the costs of joint activities Young [1994]. Vázquez-Brage et al. [1997] combine a model of airport games with the Shapley value to fairly distribute aircraft landing fees among airlines. When an airport is built, the size of the runway is chosen to match the largest airplane it is designed for. The authors address how to fairly divide costs between airlines, given the different designs and sizes of their planes.

Bistaffa et al. [2017] applied cooperative game theory to the social ridesharing problem. Mobility platforms such as Lyft allow users to share trips with nearby users in real time, providing a more environmentally friendly and cost-effective alternative to commuting alone. The authors study how the costs of a ride should be divided among passengers. Another application is supply chain management, where Fiestras-Janeiro et al. [2011] study how to coordinate actions between suppliers, manufacturers, retailers, and customers.