1 Introduction
The design of algorithms for strategic settings has been a central problem in Artificial Intelligence for several years, with the aim of developing agents capable of behaving optimally when facing strategic opponents. Many efforts have been made for 2-player games, e.g., finding a Nash equilibrium [Lemke and Howson, Jr1964, Gatti et al.2012] and, more recently, finding a Stackelberg equilibrium [Conitzer and Sandholm2006]. The study of the latter problem paved the way to the field of Security Games, which is nowadays one of the application fields of non-cooperative game theory with the highest social impact [Tambe2011].
Fewer results are known, instead, about games with more than 2 players, except for games with particular structures, e.g., congestion and polymatrix games [Nisan et al.2007]. An interesting class of games that is largely unexplored in the literature is that of adversarial team games. Here, a team of players with the same utilities faces an adversary [von Stengel and Koller1997]. These games can model many realistic security scenarios and can provide tools to coordinate teammates acting strategically. Basilico et al. [Basilico et al.2017] study the inefficiency a team can incur in normal-form games when teammates cannot correlate their strategies, compared with when they can. They also provide algorithms to approximate the Team-maxmin equilibrium, introduced by von Stengel and Koller [von Stengel and Koller1997], which is the optimal solution when correlation is not possible. Furthermore, it is known that finding a Team-maxmin equilibrium is FNP-hard and inapproximable in the additive sense [Hansen et al.2008, Borgs et al.2010].
When extensive-form games enter the picture, adversarial team games become much more intriguing, various forms of correlation being possible. Nevertheless, to the best of our knowledge, this game class is still unexplored in the literature. In the present paper, we focus on three main forms of correlation [Forges1986]. In the first, preplay and intraplay communication is possible, corresponding to the case in which a communication device receives inputs from the teammates about the information they observe during the play, and sends them recommendations about the action to play at each information set. In the second, only preplay communication among the teammates is possible, corresponding to the case in which a correlation device communicates a plan of actions to each teammate before playing the game.^{1} Finally, in the third, no communication is possible.^{2}
^{1} With only preplay correlation, three solution concepts are known: Normal-Form, Extensive-Form, and Agent-Form Correlated Equilibrium. The spaces of players’ strategies achievable with the three correlation devices are the same, while the players’ incentive constraints are different (even if it is not known whether the spaces of the outcomes for the three equilibria in adversarial team games are different). The problem of computing the equilibrium maximizing the team’s utility in adversarial team games with at least 2 teammates is NP-hard for all three equilibria [von Stengel and Forges2008]. In our paper, we focus on the first one.
^{2} This setting is frequent in practice. Consider, for example, a security problem where a set of defensive resources from different security agencies are allocated to protect an environment at risk but, due to organizational constraints, they are not able to share information with each other. The resources have the same goal (i.e., to secure the environment) but cannot coordinate strategy execution. The same happens when a set of resources has to operate in stealth mode.
Original contributions. We formally define game models capturing the three aforementioned cases and the most suitable solution concepts (trivially, the Team-maxmin equilibrium in the third setting). Furthermore, we define three inefficiency indices for the equilibria, capturing: the inefficiency caused by using a correlation device in place of a communication device, the inefficiency caused by not using any form of communication in place of a communication device, and the inefficiency caused by not using any form of communication in place of a correlation device. We provide lower bounds to the worst-case values of the inefficiency indices, showing that they can be arbitrarily large.
Furthermore, we study the computational complexity of the problems of finding the three equilibria with the different forms of correlation, and we design, for each equilibrium, an exact algorithm. We show that when a communication device is available, an equilibrium can be computed in polynomial time (even in the number of players) by a 2-stage algorithm. In the first stage, the game is cast into an auxiliary 2-player equivalent game, while, in the second stage, a solution is found by linear programming. When a correlation device is available, the problem can be easily shown to be FNP-hard. In this case, we prove that there is always an equilibrium with a small (linear) support, and we design an equilibrium-finding algorithm, based on a classical column-generation approach, that does not need to enumerate an exponential space of actions before its execution. Our algorithm exploits an original hybrid representation of the game combining both normal and sequence forms. The column-generation oracle is shown to deal with an APX-hard problem, with an upper approximation bound decreasing exponentially in the depth of the tree. We also provide an approximation algorithm for the oracle that matches certain approximation guarantees on a subset of instances. When no communication is possible, the equilibrium-finding problem can be easily shown to be FNP-hard. In this case, the problem can be formulated as a non-linear programming problem and solved by resorting to global optimization tools.
Finally, we empirically evaluate the scalability of our algorithms in random game instances. We also evaluate the inefficiency for the team of not adopting a communication device, showing that, differently from the theoretical worst-case bounds, the empirical inefficiency is extremely small.
2 Preliminaries
A perfectinformation extensiveform game [Shoham and LeytonBrown2009] is a tuple , where: is a set of players, is a set of actions, is the set of nonterminal decision nodes, is the set of terminal (leaf) nodes, is a function returning the player acting at a given decision node, is the action function—assigning to each choice node a set of available actions—, is the successor function, and is the set of utility functions in which specifies utilities over terminal nodes for player . Let be the inclusionwise maximal set of decision nodes such that, for all , . Then, an imperfectinformation extensiveform game is a tuple , where is an extensiveform game with perfect information and is the set of information sets, in which is a partition of such that, for any , whenever there exists a where and . As usual in game theory, we assume, for each , there is only one s.t. . We focus on games with perfect recall where, for each player and each , decision nodes belonging to share the same sequence of moves of player on their paths from the root.
The study of extensive-form games is commonly conducted under other equivalent representations. The normal form is a tabular representation in which player ’s actions are plans , specifying a move at each information set in , and player ’s utility is s.t. , where is the terminal node reached when playing plan profile . In the normal-form representation, players essentially fix their behavior for the whole game before play begins. The reduced normal form is obtained by deleting replicated strategies from the normal form. However, the size of the reduced normal form is exponential in the number of information sets. A mixed strategy of player is a probability distribution on her set of pure strategies . In the agent form (whose definition is omitted for reasons of space, see [Selten1975]), players play behavioral strategies, denoted by , each specifying a probability distribution over the actions available at information set of player . Two strategies, even of different representations, are realization equivalent if, for any strategy profile of the opponents, they induce the same probability distribution over the outcomes. In a finite perfect-recall game, any mixed strategy can be replaced by an equivalent behavioral one [Kuhn1953].
Both normal and agent forms suffer from computational issues that can be overcome by using the sequence form [von Stengel1996], whose size is linear in the size of the game tree. A sequence for player , defined by a node of the game tree, is the subset of specifying player ’s actions on the path from the root to . We denote the set of sequences of player by , these are the sequence-form actions of player . A sequence is said to be terminal if, together with some sequences of the other players, it leads to a terminal node. Moreover, we denote by the fictitious sequence leading to the root node and with the extended sequence obtained by appending to sequence . The sequence-form strategy, called a realization plan, is a function associating each sequence with its probability of being played. A well-defined sequence-form strategy is such that, for each , , for each and sequence leading to , and . Constraints are linear in the number of sequences and can be written as , where is a suitable matrix and is a suitable vector. The utility function of player is represented as an dimensional matrix defined only for profiles of terminal sequences leading to a leaf. With a slight abuse of notation, we denote it by .
A Nash equilibrium (NE), whose definition does not depend on the representation of the game, is a strategy profile in which no player can improve her utility by deviating from her strategy once the strategies of all the other players are fixed.
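The realization-plan constraints described above can be checked mechanically. The following Python sketch is illustrative only: the encoding of sequences as tuples of action labels, the `info_sets` list of `(parent_sequence, actions)` pairs, and both function names are assumptions made here, not notation from the paper. It converts a behavioral strategy into a realization plan and verifies that the empty sequence has probability 1 and that, at every information set, the probabilities of the extended sequences sum to that of the parent sequence.

```python
from fractions import Fraction


def behavioral_to_realization(behavioral, info_sets):
    """Build a realization plan from a behavioral strategy.

    behavioral: dict mapping (parent_sequence, action) -> probability.
    info_sets: list of (parent_sequence, actions); under perfect recall each
    information set has a unique parent sequence (assumed encoding).
    """
    plan = {(): Fraction(1)}  # the empty sequence is always played
    # Shorter parents first, so every parent value exists before it is used.
    for parent, actions in sorted(info_sets, key=lambda s: len(s[0])):
        for action in actions:
            plan[parent + (action,)] = plan[parent] * behavioral[(parent, action)]
    return plan


def is_realization_plan(plan, info_sets):
    """Check the sequence-form constraints on a candidate plan."""
    if plan.get(()) != 1:
        return False
    if any(value < 0 for value in plan.values()):
        return False
    for parent, actions in info_sets:
        total = sum(plan.get(parent + (action,), 0) for action in actions)
        if total != plan.get(parent, 0):
            return False
    return True


# Hypothetical player: one information set at the root (actions L, R)
# and one reached after playing L (actions l, r).
info_sets = [((), ("L", "R")), (("L",), ("l", "r"))]
behavioral = {
    ((), "L"): Fraction(1, 2),
    ((), "R"): Fraction(1, 2),
    (("L",), "l"): Fraction(1, 4),
    (("L",), "r"): Fraction(3, 4),
}
plan = behavioral_to_realization(behavioral, info_sets)
print(is_realization_plan(plan, info_sets))
```

Exact fractions are used so that the equality checks are not affected by floating-point error; the number of constraints is one per information set plus one for the empty sequence, which is where the linear size of the sequence form comes from.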
3 Extensive-Form Adversarial Team Games, Equilibria, and Inefficiency
We initially provide the formal definition of a team.
Definition 1 (Team)
Given an extensiveform game with imperfect information , a team is an inclusionwise maximal subset of players such that, for any , for all , .
We denote by the set and by the set of actions available at the information sets in . An extensive-form team game (EFTG) is a generic extensive-form game where at least one team is present. Von Stengel and Koller [von Stengel and Koller1997] analyze zero-sum normal-form games where a single team plays against an adversary. We extend this game model to the scenario of extensive-form games.
Definition 2 (STSAEFTG)
A zero-sum single-team single-adversary extensive-form team game (STSAEFTG) is a game in which:

, where set defines a team (as in Definition 1) and player is the adversary ();

for each it holds: , where denotes the utility of teammates and that one of the adversary.
When the teammates have no chance to correlate their strategies, the most appropriate solution concept is the Team-maxmin equilibrium (TME). Formally, the TME is defined as . By using the same arguments used by von Stengel and Koller [von Stengel and Koller1997] for the case of normal-form games, it follows that also in extensive-form games a TME is unique except for degeneracy and it is the NE maximizing the team’s expected utility. Nevertheless, in many scenarios, teammates may exploit higher correlation capabilities. While in normal-form games these capabilities reduce to employing a correlation device as proposed by [Aumann1974], in extensive-form games we can distinguish different forms of correlation. More precisely, the strongest correlation is achieved when teammates can communicate both before and during the execution of the game (preplay and intraplay communication), exchanging their private information by exploiting a mediator that recommends actions to them. This setting can be modeled by resorting to a communication device defined in a similar way to [Forges1986]. A weaker correlation is achieved when teammates can communicate only before the play (preplay communication). This setting can be modeled by resorting to a correlation device analogous to the one for normal-form games. We formally define these two devices as follows (as customary, denotes the simplex over ).
Definition 3 (Communication device)
A communication device is a triple where is the set of inputs (i.e., information sets) that teammates can communicate to the mediator, is the set of outputs (i.e., actions) that the mediator can recommend to the teammates, and is the recommendation function that associates each information set with a probability distribution over , as a function of information sets previously reported by teammates and of the actions recommended by the mediator in the past.
Definition 4 (Correlation device)
A correlation device is a pair . is the recommendation function which returns a probability distribution over the reduced joint plans of the teammates.
Notice that, while a communication device provides its recommendations drawing actions from probability distributions during the game, a correlation device does that only before the beginning of the game. Resorting to these definitions, we introduce the following solution concepts.
Definition 5 (Team-maxmin equilibrium variations)
Given a communication device (or a correlation device) for the team, a Team-maxmin equilibrium with communication device (TMECom), or a Team-maxmin equilibrium with correlation device (TMECor), is a Nash equilibrium in which all teammates follow their recommendations and, only for TMECom, truthfully report their information.
Notice that in our setting (i.e., zero-sum games), both TMECom and TMECor maximize the team’s utility. We state the following, whose proof is straightforward.
Property 1 (Strategy space)
The space of lotteries over the outcomes achievable by using a communication device includes that of the lotteries achievable by using a correlation device, which, in turn, includes the space of the lotteries achievable without any device.
Let $v_{\mathrm{No}}$, $v_{\mathrm{Com}}$, $v_{\mathrm{Cor}}$ be the utility of the team at, respectively, the TME, the TMECom, and the TMECor. From the property above, we can easily derive the following.
Property 2 (Equilibria utility)
The game values obtained with the different solution concepts introduced above are such that $v_{\mathrm{Com}} \geq v_{\mathrm{Cor}} \geq v_{\mathrm{No}}$.
In order to evaluate the inefficiency due to the impossibility of adopting a communication or correlation device, we resort to the concept of Price of Uncorrelation (PoU), previously introduced in [Basilico et al.2017] as a measure of the inefficiency of the TME w.r.t. the TMECor in normal-form games. In these games, the PoU is defined as the ratio between the utility given by the TMECor and the utility given by the TME, once all the team’s payoffs are normalized in $[0,1]$. For extensive-form games, we propose the following variations of the PoU to capture all the possible combinations of different forms of correlation.
Definition 6 (Inefficiency indices)
$\mathrm{PoU}_{\mathrm{Com/No}} = v_{\mathrm{Com}} / v_{\mathrm{No}}$, $\mathrm{PoU}_{\mathrm{Cor/No}} = v_{\mathrm{Cor}} / v_{\mathrm{No}}$, $\mathrm{PoU}_{\mathrm{Com/Cor}} = v_{\mathrm{Com}} / v_{\mathrm{Cor}}$.
In perfectinformation games all these indices assume a value of 1, the solution being unique unless degeneracy by backward induction. With imperfect information the indices can be larger than 1. In normalform games, the tight upper bound to is , where is the number of actions of each player and is the number of players [Basilico et al.2017]. Using a definition based on is not suitable for extensiveform games, where each player may have a different number of actions per node. Thus, we state the bounds in terms of (i.e., the number of terminal nodes). The following three examples provide lower bounds to the worstcase values of the indices, showing that the inefficiency may be arbitrarily large in . Initially, to ease the presentation, we define a specific type of team player that we call spy.
Definition 7 (Spy player)
Player is said to be a spy if, for each , and is a singleton.
A spy just observes the actual state of the game and her contribution to the play is only due to her communication capabilities. Notice that the introduction of a spy after decision nodes of the adversary does not affect the team’s utility in a TMECor (the team’s joint plans are the same) but improves the team’s capabilities, and final utility, in a TMECom.
Example 1 (Lower bound for worst-case )
Consider a STSAEFTG with players and actions for each player at every decision node except for the first team player, who is a spy. The game tree is structured as follows (see Figure 1 for the case with ).

The adversary plays first;

then the spy observes her move;

each one of the other teammates is assigned one of the following levels of the game tree and all her decision nodes are part of the same information set;

iff, for each and for each , the action chosen at is equal to the one selected by .
We have , and thus . Since the tree structure is such that we obtain . Once is fixed, the inefficiency is monotonically increasing in , but is upper bounded by (corresponding to the case in which each team player except the spy has the minimum number of actions, i.e., 2). It follows that, in the worst case w.r.t. , .
Example 2 (Lower bound for worst-case )
Consider a STSAEFTG with players and actions at each of their decision nodes, in which each level of the game tree is associated with one player and forms a unique information set. iff all the teammates choose the same action of the adversary, who plays first. This case corresponds to the worst case for in normalform games. Here we formulate the bound in terms of . We have and . It follows that . This time, and thus . The worst case w.r.t. is reached when and . Therefore, .
Example 3 (Lower bound for worst-case )
Consider the game presented in Example 1. Since and , it follows . The structure of the game tree is such that and thus . Notice that, in this case, the inefficiency is maximized when , which corresponds to having a team of two members. Thus, in the worst case w.r.t. , .
4 Finding a TMECom
We show that there is a polynomial-time TMECom-finding algorithm. Indeed, we prove that the problem of finding a TMECom is equivalent to finding a 2-player maxmin strategy in an auxiliary 2-player game with perfect recall and that the auxiliary game can be built in polynomial time.
First, we define the structure of the auxiliary game we use. Let be an extensiveform game and . We define the following functions. Function returns the sequence profile constituting the path from the root to a given node of the tree. Function is s.t., for each and each set of players ,
Intuitively, returns the unique profile of sequences of players in leading to when combined with some sequences of the players in .
The following definition describes the information structure of the auxiliary extensiveform game.
Definition 8 (observable game)
For any game and any set of players , the observable game is a tuple , where is such that:

for each decision node , there exists one and only one s.t. and where denotes the information set containing in ;

for each player , is the set with the lowest possible cardinality s.t. for each and for each pair of decision nodes , it holds:
In a observable extensiveform game, players belonging to are fully aware of the moves of other players in and share the same information on the moves taken by players in . We show that we can build in polynomial time.
Lemma 1 (observable game construction)
The T-observable game of a generic STSAEFTG can be computed in polynomial time.
Proof. We provide the sketch of an algorithm (the pseudocode is provided in the Appendices) to build a observable game (i.e., a observable game with ) in time and space polynomial in the size of the game tree. The algorithm employs nested hash tables. The first hash table associates each joint sequence of the team with another hash table, which is indexed over information sets and has as value the information set id to be used in . is traversed in a depth-first manner while keeping track of the sequence leading to the current node. For each s.t. , a search/insertion over the first hash table is performed by hashing . Then, once the sequence-specific hash table is found, the information set is assigned a new id if it is not already present as a key. is built by associating to each decision node of the team a new information set as specified in the hash table. The worst-case running time is .
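The nested hash-table construction in the proof can be sketched in Python as follows. This is a minimal illustration under assumptions made here, not the pseudocode from the Appendices: the tree is encoded as nested dictionaries with the hypothetical keys `id`, `player`, `infoset`, and `children`, and the function name `team_observable_infosets` is likewise invented. The outer dictionary is keyed by the team's joint sequence, the inner one by the original information set, and a new id is issued the first time a pair is seen during a depth-first traversal.

```python
def team_observable_infosets(root, team):
    """Return a dict mapping each team decision node id to its new infoset id."""
    team_order = sorted(team)
    table = {}        # joint team sequence -> {original infoset -> new id}
    new_infoset = {}  # node id -> new infoset id
    next_id = 0
    # Iterative depth-first traversal; each stack entry carries the profile of
    # team sequences (one tuple of actions per teammate) leading to the node.
    stack = [(root, tuple(() for _ in team_order))]
    while stack:
        node, profile = stack.pop()
        children = node.get("children", {})
        if not children:  # terminal node
            continue
        player = node["player"]
        if player in team:
            inner = table.setdefault(profile, {})
            if node["infoset"] not in inner:
                inner[node["infoset"]] = next_id
                next_id += 1
            new_infoset[node["id"]] = inner[node["infoset"]]
        for action, child in children.items():
            if player in team:
                i = team_order.index(player)
                extended = profile[i] + (action,)
                child_profile = profile[:i] + (extended,) + profile[i + 1:]
            else:
                child_profile = profile  # adversary moves are not recorded
            stack.append((child, child_profile))
    return new_infoset


def _teammate2(node_id):
    return {"id": node_id, "player": 2, "infoset": "I2",
            "children": {"l": {"children": {}}, "r": {"children": {}}}}


def _teammate1(node_id):
    return {"id": node_id, "player": 1, "infoset": "I1",
            "children": {"L": _teammate2(node_id + "L"),
                         "R": _teammate2(node_id + "R")}}


# Hypothetical game: adversary "A" moves first, then teammates 1 and 2,
# each with a single information set in the original game.
example_root = {"id": "root", "player": "A", "infoset": "IA",
                "children": {"a": _teammate1("a"), "b": _teammate1("b")}}
relabelled = team_observable_infosets(example_root, {1, 2})
print(relabelled)
```

In the example, teammate 2's four nodes are split into two information sets according to teammate 1's move, while nodes that differ only in the adversary's move stay together. Each node triggers a constant number of dictionary operations, although building the profile keys adds a factor that depends on the depth of the tree.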
Theorem 2 (TMECom computation)
Given a STSAEFTG and a communication device for , the unique (unless degeneracy) TMECom can be found in polynomial time.
Proof. Given a STSAEFTG , the use of a communication device for the team changes the information structure of the game inducing a observable game . A TMECom can be computed over as follows. Given a communication device , enforces a probability distribution over the set of feedback rules. is chosen in order to maximize the expected utility of the team. In this setting, no incentive constraints are required because teammates share the same utility function and therefore, under the hypothesis that maximizes it, it is in their best interest to follow the recommendations sent by the device and to report truthfully their information. Thus, considering the function to be defined over information sets and , reduces to a distribution over rules of type .
We are left with an optimization problem in which we have to choose s.t. the worst-case utility of the team is maximized. This is equivalent to a 2-player maxmin problem over between and a player playing over the team’s joint sequences. By construction, the team player has perfect recall and thus the maxmin problem can be formulated as an LP in sequence form, requiring polynomial time.
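For reference, a standard sequence-form maxmin LP of the kind invoked above can be written as follows. This is a sketch under assumed notation, not necessarily the formulation used by the authors: $r_T$ is the realization plan of the single team player in the auxiliary game, $U$ is the team's utility matrix with rows indexed by team sequences and columns by adversary sequences, $F_T r_T = f_T$ and $F_A r_A = f_A$ are the sequence-form constraints of the team player and of the adversary, and $v$ is a vector of free dual variables with one entry per row of $F_A$.

```latex
\begin{align*}
\max_{r_T,\, v} \quad & f_A^{\top} v \\
\text{s.t.} \quad & F_A^{\top} v - U^{\top} r_T \le 0, \\
& F_T\, r_T = f_T, \\
& r_T \ge 0 .
\end{align*}
```

The objective and the first constraint come from dualizing the adversary's inner minimization $\min_{r_A \ge 0,\, F_A r_A = f_A} r_T^{\top} U r_A$, so the program has one variable per team sequence plus one per adversary constraint, which is polynomial in the size of the auxiliary game.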
5 Finding a TMECor
We initially focus on the computational complexity of the problem of searching for a TMECor.
Theorem 3 (TMECor complexity)
Finding a TMECor is FNP-hard when there are two teammates, each with an arbitrary number of information sets, or when there is an arbitrary number of teammates, each with one information set.
The first result directly follows from the reduction presented in [von Stengel and Forges2008, Theorem 1.3] since the game instances used in the reduction are exactly STSAEFTGs with 2 teammates. The second result can be proved by adapting the reduction described in [Koller and Megiddo1992, Proposition 2.6], assigning each information set of the game instances to a different teammate.
In principle, a TMECor can be found by casting the game in normal form and then by searching for a Team-maxmin equilibrium with correlated strategies. This latter equilibrium can be found in polynomial time in the size of the normal form, which, however, is given by , where each is exponentially large in the size of the tree. We provide here a more efficient method that can also be used in an anytime fashion, without requiring any exponential enumeration before the execution of the algorithm. In our method, we use a hybrid representation that, to the best of our knowledge, has not been used in previous works.
Hybrid representation. In our representation, ’s strategy is represented in sequence form, while the team plays over jointlyreduced plans, as formally defined below. Given a generic STSAEFTG , let us denote with the set of actions of the reduced normalform of , where is the set of reduced plans for player . Therefore, is the set of joint reduced plans of the team. Let function be s.t. it returns, for a given pair , the terminal node reached when the adversary plays and the team members, at each of their information set, play according to . If no terminal node is reached, is returned. We define some equivalence classes over by the relation :
Definition 9
The equivalence relation over is s.t., given , iff, for each , .
Definition 10 (Jointly-reduced plans)
The set of jointly-reduced plans is obtained by picking exactly one representative from each equivalence class of .
The team’s utility function is represented by the sparse matrix . Given a pair , is stored in iff . Notice that is well defined since each pair leads to at most one terminalnode.
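The equivalence classes of Definition 9 can be computed by grouping joint plans according to the terminal node they reach against each adversary sequence. The Python sketch below is an illustration under assumptions made here: the function name `jointly_reduced_plans`, the callable `terminal` (standing in for the function that returns the terminal node reached by a plan and an adversary sequence, or `None` if no terminal node is reached), and the toy game are all hypothetical. It enumerates the joint plans explicitly, which is exactly what the column-generation method is designed to avoid, so it is meant only to make the definition concrete.

```python
from itertools import product


def jointly_reduced_plans(joint_plans, adversary_sequences, terminal):
    """Keep one representative joint plan per equivalence class."""
    representatives = {}
    for plan in joint_plans:
        # Two plans are equivalent iff they reach the same terminal node
        # (or none) against every adversary sequence.
        signature = tuple(terminal(plan, q) for q in adversary_sequences)
        representatives.setdefault(signature, plan)  # first plan seen wins
    return list(representatives.values())


def terminal(plan, q):
    """Toy game: teammate 1 plays L or R; after R teammate 2 plays l or r;
    the adversary then plays q without observing anything."""
    first, second = plan
    if first == "L":
        return ("L", q)          # teammate 2 never moves on this path
    return ("R", second, q)


joint_plans = list(product(("L", "R"), ("l", "r")))
reduced = jointly_reduced_plans(joint_plans, ("a", "b"), terminal)
print(reduced)
```

In the toy game the joint plans (L, l) and (L, r) are equivalent because teammate 2's information set is never reached after L, so only three of the four joint plans survive.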
Let denote the team’s strategy over . The problem of finding a TMECor in our hybrid representation can be formulated as the following LP named hybridmaxmin:
composed of constraints (except constraints) and an exponential number of variables . Thus, we can state the following proposition.
Proposition 1
There exists at least one TMECor in which the number of joint plans played with strictly positive probability by the team is at most .
Proof. The above LP admits a basic optimal solution with at most variables with strictly positive values [Shapley and Snow1950]. Since is always in the basis (indeed, we can add a constant to make the team’s utility in each terminal node strictly positive without affecting equilibrium strategies), the joint plans in the basis are .
Proposition 1 shows that the NP-hardness of the problem is merely due to guessing the jointly-reduced plans played with strictly positive probability in a TMECor. Thus, we can avoid enumerating entirely before executing the algorithm by working with a subset of jointly-reduced plans built progressively, in a classical column-generation fashion (see, e.g., [McMahan, Gordon, and Blum2003]).
Column-generation algorithm. The pseudocode is given in Algorithm 1. It receives as input the game tree and the sequence-form constraint matrices of all the players (Line 1). Then, the algorithm is initialized, assigning a matrix of zeros to , an empty set to , and 0 to (Line 2). Notice that is sparse and therefore its representation requires a space equal to the number of non-null entries. is initialized as a realization plan equivalent to a uniform behavioral mixed strategy, i.e., the adversary, at each information set, randomizes uniformly over all the available actions (Line 3). Then, the algorithm calls the BR-ORACLE (defined below) to find the best response of the team given the adversary’s strategy (Line 4). Lines 7-10 are repeated until an optimal solution is found. Initially, is added to (Line 7) and players’ utilities at nodes reached by for every are added to . Then, the algorithm solves the maxmin (hybrid-maxmin) and minmax (hybrid-minmax) problems restricted to (Lines 8 and 9), where the hybrid-minmax problem is defined as:
Finally, the algorithm calls BR-ORACLE to find the best response to (Line 10).
Best-response oracle. Given a generic STSAEFTG , we denote the problem of finding the best response of the team against a given fixed realization plan of the adversary over as BRT. This problem is shown to be NP-hard in the reduction used for [von Stengel and Forges2008, Theorem 1.3], where we can interpret the initial chance move as the fixed strategy of the adversary. We can strengthen such a hardness result as follows (the proofs are provided in the Appendices):
Theorem 4
BRT is APX-hard.
Let be the best approximation bound of the maximization problem .
Theorem 5
Denote with BRTh the problem BRT over STSAEFTG instances of fixed maximum depth and branching factor variable at each decisionnode, it holds:
This means that the upper bound on the approximation factor decreases exponentially as the depth of the tree increases (notice that , see [Håstad2001]). The column-generation oracle solving BRT can be formulated as the following integer linear program (ILP):
where is a binary variable which is equal to iff, for all the sequences necessary to reach , it holds . Notice that the oracle returns a pure realization plan for each of the teammates. The team’s best response is a jointly-reduced realization plan that can be derived as follows. Denote with the set of sequences played with probability one by that are not subsets of any other sequence played with positive probability. Let be the reduced normal-form plan of player specifying all and only actions played in the sequences belonging to . The joint plan is s.t. .
A simple approximation algorithm can be obtained by a continuous relaxation of the binary constraints . The resulting mathematical program is linear and therefore solvable in polynomial time. An approximate solution can be obtained by randomized rounding [Raghavan and Tompson1987]. When considering game trees encoding MAXSAT instances (see the proof of Theorem 4), the approximation algorithm matches the ratio guaranteed by randomized rounding for MAXSAT (details are given in the Appendices).
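To illustrate the rounding step, the Python sketch below turns a fractional realization plan of one teammate, such as the relaxed program might return, into a pure one. It is one plausible rounding scheme chosen here for illustration and is not claimed to be the scheme detailed in the Appendices: at each information set that is still reachable, exactly one action is sampled with probability proportional to the fractional values of the extended sequences. The function name, the tuple encoding of sequences, and the `info_sets` list of `(parent_sequence, actions)` pairs are assumptions.

```python
import random


def round_realization_plan(fractional, info_sets, rng):
    """Sample a pure realization plan from a fractional one.

    fractional: dict mapping sequence (tuple of actions) -> value in [0, 1].
    info_sets: list of (parent_sequence, actions) pairs for one teammate.
    rng: a random.Random instance, passed in for reproducibility.
    """
    pure = {(): 1}
    # Visit information sets top-down so each parent is rounded first.
    for parent, actions in sorted(info_sets, key=lambda s: len(s[0])):
        if pure.get(parent, 0) == 0:
            # Unreachable after rounding: all extended sequences get 0.
            for action in actions:
                pure[parent + (action,)] = 0
            continue
        weights = [fractional.get(parent + (a,), 0.0) for a in actions]
        if sum(weights) > 0:
            chosen = rng.choices(actions, weights=weights, k=1)[0]
        else:
            chosen = rng.choice(actions)  # no fractional mass: pick uniformly
        for action in actions:
            pure[parent + (action,)] = 1 if action == chosen else 0
    return pure


# Hypothetical teammate with an information set at the root (L, R)
# and one reached after L (l, r).
info_sets = [((), ("L", "R")), (("L",), ("l", "r"))]
fractional = {(): 1.0, ("L",): 0.7, ("R",): 0.3,
              ("L", "l"): 0.2, ("L", "r"): 0.5}
pure_plan = round_realization_plan(fractional, info_sets, random.Random(0))
print(pure_plan)
```

Sampling one action per reachable information set, rather than rounding each sequence variable independently, keeps the result a valid pure realization plan; an already integral plan is returned unchanged.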
6 Finding a TME
We recall that finding a TME is hard, since it is hard even in normal-form games [Hansen et al.2008].
Theorem 6 (TME complexity)
Finding a TME is FNP-hard and its value is inapproximable in the additive sense even with binary payoffs.
The problem of finding a TME can be formulated as the following nonlinear mathematical programming problem:
where is the set of team’s joint sequences and identifies the sequence of player in . This program can be solved exactly, within a given numerical accuracy, by means of global optimization tools in exponential time.
7 Experimental Evaluation
Experimental setting. Our experimental setting is based on randomly generated STSAEFTGs. The random game generator takes as inputs: the number of players, a probability distribution over the number of actions available at each information set, the maximum depth of the tree, and a parameter for tuning the information structure of the tree. Specifically, this parameter encodes the probability with which a newly created decision node, once it has been randomly assigned to a player, is assigned to an existing information set (thus, when it is equal to 0 the game has perfect information), while guaranteeing perfect recall for every player. Finally, payoffs associated with terminal nodes are randomly drawn from a uniform distribution over . We generate 20 game instances for each combination of the following parameters’ values: , with step size 1 (i.e., for games with 5 players, ), . For simplicity, we fix the branching factor to 2 (this value allows us to maximize and it is also the worst case for the inefficiency indices).
The algorithms are implemented in Python 2.7.6, adopting GUROBI 7.0 for LPs and ILPs, AMPL 20170207 and the global optimization solver BARON 17.1.2 [Tawarmalani and Sahinidis2005]. We set a time limit of 60 minutes for the algorithms. All the algorithms are executed on a UNIX computer with a 2.33GHz CPU and 128 GB RAM. We discuss the main experimental results with 3 players below, while the results with more players are provided in the Appendices. Since the computation of the TMECor from the reduced normal form is impractical for (see the Appendices), we use only Algorithm 1 employing the exact oracle (which proved very fast on every instance).
Empirical PoUs. We report in Fig. 2 the average empirical inefficiency indices with 3 players for some values of . We observe that, although the theoretical worst-case value increases in , the empirical increase, if any, is negligible. For instance, the worst-case value of with and is , while the average empirical value is . We also observe that the inefficiency increases in , suggesting that it may be maximized in normal-form games.
Compute time. We report in Fig. 8 the average compute times of the algorithms and their box plots with 3 players and (the plot includes instances reaching the time limit, as this does not affect the presentation of the results). As expected, the TMECom computation scales well, allowing one to solve games with more than 16,000 terminal nodes within the time limit. The performance of Algorithm 1 (TMECor) is remarkable, since it solves games with more than 2,000 terminals within the time limit and presents a narrow box plot, meaning that the variance in the compute time is small. Notice that, with , the compute times of TMECom and TMECor are comparable, even though the latter is computationally hard while the former is solvable in polynomial time. As expected, the TME computation does not scale well and its compute time is extremely variable among different instances.
8 Conclusions
In this paper, we focus on extensive-form team games with a single adversary. Our main contributions include the definition of game models employing different correlation devices and their suitable solution concepts. We study the inefficiency a team incurs employing various forms of correlation, providing lower bounds to the worst-case values of the inefficiency indices that are arbitrarily large in the game tree. Furthermore, we study the complexity of finding the equilibria, and we provide exact algorithms. Finally, we experimentally evaluate the scalability of our algorithms and the empirical equilibrium inefficiency in random games. In the future, it would be interesting to study approximate equilibrium-finding algorithms in order to reach an improved scalability in all three correlation scenarios.
References
 [Aumann1974] Aumann, R. 1974. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics 1(1):67–96.
 [Ausiello, Crescenzi, and Protasi1995] Ausiello, G.; Crescenzi, P.; and Protasi, M. 1995. Approximate solution of NP optimization problems. Theoretical Computer Science 150(1):1–55.
 [Basilico et al.2017] Basilico, N.; Celli, A.; Nittis, G. D.; and Gatti, N. 2017. Team-maxmin equilibrium: efficiency bounds and algorithms. In AAAI.
 [Borgs et al.2010] Borgs, C.; Chayes, J. T.; Immorlica, N.; Kalai, A. T.; Mirrokni, V. S.; and Papadimitriou, C. H. 2010. The myth of the folk theorem. Games and Economic Behavior 70(1):34–43.
 [Conitzer and Sandholm2006] Conitzer, V., and Sandholm, T. 2006. Computing the optimal strategy to commit to. In ACM EC, 82–90.
 [Forges1986] Forges, F. 1986. An approach to communication equilibria. Econometrica 1375–1385.
 [Gatti et al.2012] Gatti, N.; Patrini, G.; Rocco, M.; and Sandholm, T. 2012. Combining local search techniques and path following for bimatrix games. In UAI, 286–295.
 [Hansen et al.2008] Hansen, K. A.; Hansen, T. D.; Miltersen, P. B.; and Sørensen, T. B. 2008. Approximability and parameterized complexity of minmax values. In WINE, 684–695.
 [Håstad2001] Håstad, J. 2001. Some optimal inapproximability results. Journal of the ACM (JACM) 48(4):798–859.
 [Kohlberg and Mertens1986] Kohlberg, E., and Mertens, J.-F. 1986. On the strategic stability of equilibria. Econometrica: Journal of the Econometric Society 1003–1037.
 [Koller and Megiddo1992] Koller, D., and Megiddo, N. 1992. The complexity of two-person zero-sum games in extensive form. Games and Economic Behavior 4(4):528–552.
 [Kuhn1953] Kuhn, H. W. 1953. Extensive Games and the Problem of Information. Princeton University Press. 193–216.
 [Lemke and Howson, Jr1964] Lemke, C. E., and Howson, Jr. 1964. Equilibrium Points of Bimatrix Games. Journal of the Society for Industrial and Applied Mathematics 12(2):413–423.

 [McMahan, Gordon, and Blum2003] McMahan, H. B.; Gordon, G. J.; and Blum, A. 2003. Planning in the presence of cost functions controlled by an adversary. In Proceedings of the 20th International Conference on Machine Learning (ICML03), 536–543.
 [Nisan et al.2007] Nisan, N.; Roughgarden, T.; Tardos, E.; and Vazirani, V. 2007. Algorithmic game theory, volume 1. Cambridge University Press.
 [Raghavan and Tompson1987] Raghavan, P., and Tompson, C. D. 1987. Randomized rounding: a technique for provably good algorithms and algorithmic proofs. Combinatorica 7(4):365–374.
 [Selten1975] Selten, R. 1975. Reexamination of the perfectness concept for equilibrium points in extensive games. International journal of game theory 4(1):25–55.
 [Shapley and Snow1950] Shapley, L. S., and Snow, R. N. 1950. Basic solutions of discrete games. Annals of Mathematics Studies 24:27–35.
 [Shoham and LeytonBrown2009] Shoham, Y., and LeytonBrown, K. 2009. Multiagent systems: Algorithmic, gametheoretic, and logical foundations.
 [Tambe2011] Tambe, M. 2011. Security and game theory: algorithms, deployed systems, lessons learned. Cambridge University Press.
 [Tawarmalani and Sahinidis2005] Tawarmalani, M., and Sahinidis, N. V. 2005. A polyhedral branch-and-cut approach to global optimization. Mathematical Programming 103:225–249.
 [von Stengel and Forges2008] von Stengel, B., and Forges, F. 2008. Extensive-form correlated equilibrium: Definition and computational complexity. Mathematics of Operations Research 33(4):1002–1022.
 [von Stengel and Koller1997] von Stengel, B., and Koller, D. 1997. Team-maxmin equilibria. Games and Economic Behavior 21(1):309–321.
 [von Stengel1996] von Stengel, B. 1996. Efficient computation of behavior strategies. Games and Economic Behavior 14(2):220–246.
 [Williamson and Shmoys2011] Williamson, D. P., and Shmoys, D. B. 2011. The design of approximation algorithms. Cambridge university press.
Appendices
Appendix A Proofs of the Theorems
Theorem 4
BRT is APX-hard.
Proof. We prove that MAXSAT is AP-reducible to BRT (MAXSAT $\leq_{AP}$ BRT). Given a boolean formula in conjunctive normal form, MAXSAT is the problem of determining the maximum number of clauses that can be made true by a truth assignment to variables of .
For any with clauses, we build, with a construction similar to [von Stengel and Forges2008, Theorem 1.3], a STSAEFTG as follows:

, and ;

plays first and has a unique decision node (the root of the tree), s.t. ;

player 1 plays on the second level of the tree and has a singleton information set for each clause in . Each information set has, as its actions, the variables that appear in the clause it identifies;

player 2 plays on the third level of the tree. She has one information set for each literal of . At each of her information sets, player 2 chooses whether the literal has to be positive or negative;

if the literal chosen by player 1 is true in the assignment made by player 2.
Consider to be randomizing uniformly over her actions. With this construction, has a pair of pure strategies for the team members leading to payoff 1 iff is satisfiable. Let denote the extensive-form game obtained by the above construction starting from . Denote with the solution to BRT for . Function maps the best-response result back to a feasible assignment for the MAXSAT problem.
Once fixed so that each terminal sequence of is selected with probability , the objective functions of MAXSAT and BRT are equivalent, since maximizing the utility of the team implies finding the maximum number of satisfiable clauses in . Denote with and the value of the two objective functions of BRT and MAXSAT, respectively. It holds . For this reason, the AP-condition holds. Specifically, for any , for any rational , for any feasible solution to BRT over , it holds:
where and are, respectively, the optimal solutions to a given instance of the two problems and . Therefore, since MAXSAT is an APX-complete problem (see [Ausiello, Crescenzi, and Protasi1995]) and it is AP-reducible to BRT, BRT is APX-hard.
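The correspondence used in the proof can be checked by brute force on a toy formula. The following is a minimal sketch, not part of the original construction: the clause encoding (lists of `(variable, polarity)` pairs), the function names, and the assumption that each variable appears at most once per clause are all illustrative choices. It enumerates the pure strategies of the two team members against an adversary that randomizes uniformly over clauses, and compares the best team payoff with the MAXSAT optimum divided by the number of clauses.

```python
from itertools import product

def max_sat(clauses, variables):
    """Maximum number of clauses satisfied by a single truth assignment."""
    best = 0
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        count = sum(any(assignment[v] == pol for v, pol in c) for c in clauses)
        best = max(best, count)
    return best

def best_team_payoff(clauses, variables):
    """Best expected team payoff over pure strategy pairs in the toy game.

    Player 1 picks one variable per clause (one singleton information set
    per clause); player 2 picks a truth value per variable; the adversary
    is assumed to select each clause with probability 1/len(clauses).
    """
    p2_plans = [dict(zip(variables, vals))
                for vals in product([False, True], repeat=len(variables))]
    best = 0.0
    for picks in product(*[[v for v, _ in c] for c in clauses]):
        for assignment in p2_plans:
            # Payoff 1 at a leaf iff the literal chosen by player 1 is true
            # under the assignment chosen by player 2.
            hits = sum(dict(c)[v] == assignment[v]
                       for c, v in zip(clauses, picks))
            best = max(best, hits / len(clauses))
    return best

# (x or y) and (not x) and (not y) and (x or not y): not satisfiable.
example = [[("x", True), ("y", True)], [("x", False)],
           [("y", False)], [("x", True), ("y", False)]]
example_vars = ["x", "y"]
```

On this example the best team payoff equals `max_sat(example, example_vars) / len(example)`, and a satisfiable formula yields payoff 1.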
Theorem 5
Denote with BRTh the problem BRT over STSAEFTG instances of fixed maximum depth and branching factor variable at each decision node. It holds:
Proof. We recall that denotes the best upper bound for the efficient approximation of maximization problem .
Let be a boolean formula in conjunctive normal form. Fix the maximum depth of the tree to an arbitrary value . Build a STSAEFTG following the construction explained in the proof of Theorem 4. At this point, for each terminal node of s.t. , replicate by substituting with the root of a new . Repeat this procedure on the terminal nodes of the newly added subtrees until the longest path from the root of to one of the new leaves traverses copies of the original tree. Denote the full tree obtained through this process with . The maximum depth of is and it contains the set of replicas of .
Suppose, by contradiction, there exists a polynomial-time approximation algorithm for BRTh guaranteeing a constant approximation factor . Apply this algorithm to find an approximate solution to BRTh over . For at least one of the subtrees in , it has to hold: , where is the approximation ratio obtained by the algorithm for the problem BRTh over . As shown in the proof of Theorem 4, a solution to BRT over a tree obtained with our construction can be mapped back to obtain an approximate solution to MAXSAT. The same reasoning holds for BRTh. Therefore, if , then , where is the approximation ratio obtained approximating the MAXSAT instance by mapping the solution of BRTh over . Therefore, the approximation algorithm guarantees a constant approximation factor for MAXSAT which is strictly greater than its theoretical upper bound, which is a contradiction.
Appendix B On the approximation algorithm
As mentioned in the main paper, a simple approximation algorithm for the BRT problem can be obtained by relaxing the binary constraints and then applying randomized rounding [Raghavan and Tompson1987]. The linear program relaxation of the ILP oracle is:
Let be an optimal solution to the LP relaxation. We select the approximate best response, which has to be a pure realization plan for each player, by selecting actions according to probabilities specified by . Notice that, once an action has been selected, probability values at the next decision node of the team have to be rescaled so that they sum to one (therefore, the rounding process starts from the root).
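The top-down rounding step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the experiments: sequences are represented as tuples, `infosets` is a hypothetical list of `(parent_sequence, child_sequences)` pairs given in top-down order, and `plan` is a fractional realization plan for a single team member. At each information set, the children's values are divided by the parent's value, which is the rescaling mentioned above.

```python
import random

def round_plan(plan, infosets, rng):
    """Sample a pure realization plan from a fractional one.

    plan: dict mapping each sequence to its realization probability,
          with plan[()] == 1 for the empty (root) sequence.
    infosets: list of (parent_sequence, [child_sequences]) pairs, ordered
              so that a parent sequence is always processed before its
              children (hypothetical encoding chosen for this sketch).
    """
    pure = {(): 1}
    for parent, children in infosets:
        if pure.get(parent, 0) == 0:
            # The parent sequence was not selected: nothing below it is played.
            for child in children:
                pure[child] = 0
            continue
        if plan[parent] > 0:
            # Rescale so that the probabilities at this information set sum to one.
            weights = [plan[child] / plan[parent] for child in children]
        else:
            # Unreached in the fractional plan: any choice is consistent.
            weights = [1.0 / len(children)] * len(children)
        chosen = rng.choices(children, weights=weights, k=1)[0]
        for child in children:
            pure[child] = 1 if child == chosen else 0
    return pure

# Toy plan: root information set {a, b}, then {c, d} after playing a.
toy_plan = {(): 1.0, ("a",): 0.6, ("b",): 0.4,
            ("a", "c"): 0.15, ("a", "d"): 0.45}
toy_infosets = [((), [("a",), ("b",)]),
                (("a",), [("a", "c"), ("a", "d")])]
pure_plan = round_plan(toy_plan, toy_infosets, random.Random(0))
```

Every sampled plan is 0/1-valued and satisfies the same flow constraints as the fractional plan, so it is a valid pure realization plan.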
Let us focus on games encoding MAXSAT instances. Specifically, denote with a generic boolean formula in conjunctive normal form and with the STSAEFTG built as specified in the proof of Theorem 4. It is interesting to notice that, for any , the application of our approximation algorithm to guarantees the same approximation ratio as randomized rounding applied to the relaxation of the ILP formulation of MAXSAT.
Denote with and the approximate algorithms based on randomized rounding for BRT and MAXSAT respectively. The following result holds:
Proposition 2
For any , the approximation ratio for MAXSAT over obtained by the solution of BRT over through is guaranteed to be at least , i.e., the ratio guaranteed by .
Proof. The relaxation of the MAXSAT ILP () is the following linear formulation:
where is the set of clauses of , is the set of literals of , and are the sets of literals appearing in clause non-negated or negated respectively, is the probability of setting literal to true.
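For reference, a relaxation matching this description can be written in the standard textbook form [Williamson and Shmoys2011]. The symbol names below are assumed for illustration only: $C$ for the set of clauses, $L$ for the set of literals, $L_c^{+}$ and $L_c^{-}$ for the literals appearing non-negated and negated in clause $c$, $y_l$ for the probability of setting literal $l$ to true, and $z_c$ for an auxiliary variable measuring how much clause $c$ is satisfied. Unit clause weights are assumed.

```latex
\begin{align*}
\max \quad & \sum_{c \in C} z_c \\
\text{s.t.} \quad & \sum_{l \in L_c^{+}} y_l + \sum_{l \in L_c^{-}} (1 - y_l) \ge z_c && \forall c \in C \\
& 0 \le y_l \le 1 && \forall l \in L \\
& 0 \le z_c \le 1 && \forall c \in C
\end{align*}
```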
Consider a game encoding a generic . If we apply the relaxation of the best-response oracle to , and are equivalent. To see that, first let player determine her realization plan . Once has been fixed, player has, at each of her information sets, a fixed expected utility associated with each of her available actions . Let be the set of available actions at one of the information sets of player 1. There are three possible cases:

. In this case player 1 selects, for each , a probability .

. In this case, playing with probability 1 guarantees player 1 to satisfy the corresponding clause.

and . In this case, is selected with probability , is selected with probability and so on. The resulting strategy profile guarantees expected utility one for the corresponding clause.
Therefore, the value reachable in each clause is determined only by the choice of player 2, i.e., the final utility of the team depends only on . Since the objective functions of the two formulations are equivalent, the relaxed oracle enforces the same probability distribution over literals’ truth assignments. That is, the optimal values of and are equivalent. Notice that, in these game instances, player 2 plays only on one level and we can sample a solution to MAXSAT according to as if it were . Therefore, randomized rounding of leads to the same approximation guarantee as , i.e., [Williamson and Shmoys2011].
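The effect of rounding each variable independently can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the procedure evaluated in the paper: it reuses a hypothetical clause encoding (lists of `(variable, polarity)` pairs), takes a fractional assignment `y` as given instead of solving the LP, and compares the expected number of satisfied clauses with the fractional objective. The classical analysis of randomized rounding for this relaxation bounds the ratio below by $1 - 1/e$ for any feasible fractional point.

```python
import math

def clause_sat_probability(clause, y):
    """Probability that a clause is satisfied when each variable v is set
    to true independently with probability y[v]."""
    p_all_false = 1.0
    for var, polarity in clause:
        # A positive literal is false w.p. 1 - y[var]; a negated one w.p. y[var].
        p_all_false *= (1.0 - y[var]) if polarity else y[var]
    return 1.0 - p_all_false

def clause_lp_value(clause, y):
    """Largest feasible value of the clause variable in the relaxation."""
    total = sum(y[var] if polarity else 1.0 - y[var] for var, polarity in clause)
    return min(1.0, total)

# Toy instance and an arbitrary feasible fractional assignment (not an LP optimum).
toy_clauses = [[("x", True), ("y", True)], [("x", False)],
               [("y", False)], [("x", True), ("y", False)]]
toy_y = {"x": 0.5, "y": 0.5}

expected_satisfied = sum(clause_sat_probability(c, toy_y) for c in toy_clauses)
fractional_value = sum(clause_lp_value(c, toy_y) for c in toy_clauses)
bound = (1.0 - 1.0 / math.e) * fractional_value
```

Here `expected_satisfied` is at least `bound`, in line with the clause-by-clause argument behind the guarantee.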
Appendix C On forcing T-observability
Algorithm 2 provides a possible implementation of the procedure described in the proof of Lemma 1. Denote with the initial game tree and with the corresponding T-observable game. The function receives as input: the current node ; the current sequence for the team , a dictionary ids specifying, for each player, an available information set id; a dictionary of dictionaries , with the first level indexed over tuples of sequences, and the second level indexed over identifiers of information sets of (i.e., pairs player/information-set id); the id of the adversary (). By initializing to the root of the tree, the algorithm traverses the tree and assigns new information set ids when possible, partitioning the old information structure of . Once the execution is completed, can be used to create . Let us denote with and the team sequence and information set defined by decision node in . For each decision node of s.t. , the corresponding decision node in is assigned an information set with id .
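A traversal of this kind might look like the sketch below. It is a hypothetical reading of the description above and not Algorithm 2 itself: the `Node` class, the function name `force_observability`, the representation of a team sequence as a tuple of `(player, action)` pairs, and the absence of chance nodes are all assumptions. Each team decision node receives a new information set id determined by the pair (team sequence, old information set), so nodes in the same old information set are separated whenever the team sequences leading to them differ.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    player: Optional[int]                  # None marks a terminal node
    infoset: Optional[int] = None          # information set id in the original tree
    children: Dict[str, "Node"] = field(default_factory=dict)  # action -> child
    new_infoset: Optional[int] = None      # id assigned by the traversal

def force_observability(node, team_seq, ids, table, adversary):
    """Assign refined information set ids to team decision nodes.

    ids:   dict player -> next available information set id.
    table: dict team sequence -> {(player, old infoset id): new id}.
    """
    if node.player is None:
        return
    if node.player != adversary:
        key = (node.player, node.infoset)
        per_sequence = table.setdefault(team_seq, {})
        if key not in per_sequence:
            # First time this old information set is met under this team sequence.
            per_sequence[key] = ids[node.player]
            ids[node.player] += 1
        node.new_infoset = per_sequence[key]
    for action, child in node.children.items():
        # Only team actions extend the team sequence.
        if node.player == adversary:
            next_seq = team_seq
        else:
            next_seq = team_seq + ((node.player, action),)
        force_observability(child, next_seq, ids, table, adversary)

def leaf_pair():
    return {"x": Node(None), "y": Node(None)}

# Player 2 does not observe player 1's action in the original tree.
after_a, after_b = Node(2, 0, leaf_pair()), Node(2, 0, leaf_pair())
root = Node(1, 0, {"a": after_a, "b": after_b})
force_observability(root, (), {1: 0, 2: 0}, {}, adversary=0)
```

After the call, `after_a` and `after_b` carry different ids, whereas two nodes that differ only in a preceding adversary action would keep a shared id.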
Appendix D Additional Experimental Results
Empirical PoUs. We present the box plots describing the empirical inefficiency indices on all the experimental instances. Figure 4 describes the empirical , Figure 5 describes the empirical , and Figure 6 describes the empirical . Notice that each plot displays data for a number of actions up to the biggest instances solvable within the time threshold by both the equilibrium-finding algorithms involved in the ratio.
TMECor in reduced normal form. Computing a TMECor through its reduced normal form [Kohlberg and Mertens1986] is impractical even for relatively small game instances. Figure 7 shows that, for 3-player games with , the algorithm does not reach termination within the deadline even for instances of depth 6. Moreover, the amount of memory required by the algorithm would make the computation unfeasible even with a higher time threshold. Instances of 3-player STSAEFTGs with depth 6 required, at least, around of memory each, with the most demanding instances requiring more than . Since the growth of the reduced normal form is exponential in the size of the tree, this approach is not feasible for bigger game instances.
Compute time. Figure 8 shows the compute times for all the instances of our experimental setting. The behavior displayed by the equilibrium-finding algorithms is essentially the one described in the main paper for all the game configurations.