Computational Results for Extensive-Form Adversarial Team Games

11/18/2017 · by Andrea Celli et al. · Politecnico di Milano

We provide, to the best of our knowledge, the first computational study of extensive-form adversarial team games. These games are sequential, zero-sum games in which a team of players, sharing the same utility function, faces an adversary. We define three different scenarios according to the communication capabilities of the team. In the first, the teammates can communicate and correlate their actions both before and during the play. In the second, they can only communicate before the play. In the third, no communication is possible at all. We define the most suitable solution concepts, and we study the inefficiency caused by partial or null communication, showing that the inefficiency can be arbitrarily large in the size of the game tree. Furthermore, we study the computational complexity of the equilibrium-finding problem in the three scenarios mentioned above, and we provide, for each of the three scenarios, an exact algorithm. Finally, we empirically evaluate the scalability of the algorithms in random games and the inefficiency caused by partial or null communication.


1 Introduction

The design of algorithms for strategic settings has been a central problem in Artificial Intelligence for several years, with the aim of developing agents capable of behaving optimally when facing strategic opponents. Many efforts have been made for 2-player games, e.g., finding a Nash equilibrium [Lemke and Howson, Jr1964, Gatti et al.2012] and, more recently, finding a Stackelberg equilibrium [Conitzer and Sandholm2006]. The study of this latter problem paved the way to the field of Security Games, which is, nowadays, one of the application fields of non-cooperative game theory with the highest social impact [Tambe2011].

Fewer results are known, instead, about games with more than 2 players, except for games with particular structures, e.g., congestion and polymatrix games [Nisan et al.2007]. An interesting class of games widely unexplored in the literature is that of adversarial team games. Here, a team of players with the same utilities faces an adversary [von Stengel and Koller1997]. These games can model many realistic security scenarios and can provide tools to coordinate teammates acting strategically. Basilico et al. [2017] study the inefficiency a team can incur in normal-form games when the teammates cannot correlate their strategies w.r.t. when they can. They also provide algorithms to approximate the Team-maxmin equilibrium, introduced by von Stengel and Koller [1997], which is the optimal solution when correlation is not possible. Furthermore, it is known that finding a Team-maxmin equilibrium is FNP-hard and that its value is inapproximable in additive sense [Hansen et al.2008, Borgs et al.2010].

When extensive-form games enter the picture, adversarial team games become much more intriguing, various forms of correlation being possible. Nevertheless, to the best of our knowledge, this game class is still unexplored in the literature. In the present paper, we focus on three main forms of correlation [Forges1986]. In the first, preplay and intraplay communication is possible, corresponding to the case in which a communication device receives inputs from the teammates about the information they observe during the play, and sends them recommendations about the action to play at each information set. In the second, only preplay communication among the teammates is possible, corresponding to the case in which a correlation device communicates a plan of actions to each teammate before playing the game. (With only preplay correlation, three solution concepts are known: Normal-Form, Extensive-Form, and Agent-Form Correlated Equilibrium. The spaces of players' strategies achievable with the three correlation devices are the same, while the players' incentive constraints are different, even if it is not known whether the spaces of the outcomes for the three equilibria in adversarial team games are different. Computing the equilibrium maximizing the team's utility in adversarial team games with at least 2 teammates is NP-hard for all the three equilibria [von Stengel and Forges2008]. In our paper, we focus on the first one.) Finally, in the third, no communication is possible. (This setting is frequent in practice. Consider, for example, a security problem where a set of defensive resources from different security agencies is allocated to protect an environment at risk but, due to organizational constraints, the resources are not able to share information with each other. They have the same goal, i.e., to secure the environment, but cannot coordinate strategy execution. The same happens when a set of resources has to operate in stealth mode.)

Original contributions. We formally define game models capturing the three aforementioned cases and the most suitable solution concepts (trivially, the Team-maxmin equilibrium in the third setting). Furthermore, we define three inefficiency indices for the equilibria, capturing: the inefficiency caused by using a correlation device in place of a communication device, the inefficiency caused by not using any form of communication in place of a communication device, and the inefficiency caused by not using any form of communication in place of a correlation device. We provide lower bounds to the worst-case values of the inefficiency indices, showing that they can be arbitrarily large.

Furthermore, we study the computational complexity of the problems of finding the three equilibria with the different forms of correlation, and we design, for each equilibrium, an exact algorithm. We show that when a communication device is available, an equilibrium can be computed in polynomial time (even in the number of players) by a 2-stage algorithm. In the first stage, the game is cast into an auxiliary 2-player equivalent game, while, in the second stage, a solution is found by linear programming. When a correlation device is available, the problem can be easily shown to be FNP-hard. In this case, we prove that there is always an equilibrium with a small (linear) support, and we design an equilibrium-finding algorithm, based on a classical column-generation approach, that does not need to enumerate an exponential space of actions before its execution. Our algorithm exploits an original hybrid representation of the game combining both normal and sequence forms. The column-generation oracle is shown to deal with an APX-hard problem, with an upper approximation bound decreasing exponentially in the depth of the tree. We also provide an approximation algorithm for the oracle that matches certain approximation guarantees on a subset of instances. When no communication is possible, the equilibrium-finding problem can be easily shown to be FNP-hard. In this case, the problem can be formulated as a non-linear programming problem and solved by resorting to global optimization tools.

Finally, we empirically evaluate the scalability of our algorithms in random game instances. We also evaluate the inefficiency for the team of not adopting a communication device, showing that, differently from the theoretical worst-case bounds, the empirical inefficiency is extremely small.

2 Preliminaries

A perfect-information extensive-form game [Shoham and Leyton-Brown2009] is a tuple $(N, A, V, L, \iota, \rho, \chi, u)$, where: $N$ is a set of players, $A$ is a set of actions, $V$ is the set of nonterminal decision nodes, $L$ is the set of terminal (leaf) nodes, $\iota: V \to N$ is a function returning the player acting at a given decision node, $\rho: V \to 2^A$ is the action function—assigning to each choice node a set of available actions—, $\chi: V \times A \to V \cup L$ is the successor function, and $u = (u_1, \dots, u_{|N|})$ is the set of utility functions in which $u_i: L \to \mathbb{R}$ specifies utilities over terminal nodes for player $i$. Let $V_i$ be the inclusion-wise maximal set of decision nodes such that, for all $v \in V_i$, $\iota(v) = i$. Then, an imperfect-information extensive-form game is a tuple $(N, A, V, L, \iota, \rho, \chi, u, H)$, where $(N, A, V, L, \iota, \rho, \chi, u)$ is an extensive-form game with perfect information and $H = (H_1, \dots, H_{|N|})$ is the set of information sets, in which $H_i$ is a partition of $V_i$ such that, for any $v, w \in V_i$, $\rho(v) = \rho(w)$ whenever there exists a $h \in H_i$ where $v \in h$ and $w \in h$. As usual in game theory, we assume, for each $v \in V_i$, there is only one $h \in H_i$ s.t. $v \in h$. We focus on games with perfect recall where, for each player $i$ and each $h \in H_i$, decision nodes belonging to $h$ share the same sequence of moves of player $i$ on their paths from the root.

The study of extensive-form games is commonly conducted under other, equivalent representations. The normal form is a tabular representation in which player $i$'s actions are plans $p_i \in P_i$, each specifying a move at every information set in $H_i$, and player $i$'s utility is $U_i: P_1 \times \dots \times P_{|N|} \to \mathbb{R}$ s.t. $U_i(p) = u_i(l_p)$, where $l_p$ is the terminal node reached when playing plan profile $p$. Basically, in the normal-form representation, players decide their behavior in the whole game ex ante the play. The reduced normal form is obtained by deleting replicated strategies from the normal form. However, the size of the reduced normal form is exponential in the number of information sets. A mixed strategy of player $i$ is a probability distribution on her set of pure strategies $P_i$. In the agent form (whose definition is omitted due to reasons of space, see [Selten1975]), players play behavioral strategies, denoted by $\pi_i$, each specifying a probability distribution over the actions $\rho(h)$ available at information set $h$ of player $i$. Two strategies, even of different representations, are realization equivalent if, for any strategy profile of the opponents, they induce the same probability distribution over the outcomes. In a finite perfect-recall game, any mixed strategy can be replaced by an equivalent behavioral one [Kuhn1953].

Both normal and agent forms suffer from computational issues that can be overcome by using the sequence form [von Stengel1996], whose size is linear in the size of the game tree. A sequence for player $i$, defined by a node $v$ of the game tree, is the subset of $A$ specifying player $i$'s actions on the path from the root to $v$. We denote the set of sequences of player $i$ by $Q_i$; these are the sequence-form actions of player $i$. A sequence is said to be terminal if, together with some sequences of the other players, it leads to a terminal node. Moreover, we denote by $q_\emptyset$ the fictitious sequence leading to the root node and with $qa$ the extended sequence obtained by appending action $a$ to sequence $q$. The sequence-form strategy, said realization plan, is a function $r_i: Q_i \to [0, 1]$ associating each sequence with its probability of being played. A well-defined sequence-form strategy is such that $r_i(q_\emptyset) = 1$, $\sum_{a \in \rho(h)} r_i(qa) = r_i(q)$ for each $h \in H_i$ and sequence $q$ leading to $h$, and $r_i(q) \ge 0$ for each $q \in Q_i$. Constraints are linear in the number of sequences and can be written as $F_i r_i = f_i$, where $F_i$ is an opportune matrix and $f_i$ is an opportune vector. The utility function of player $i$ is represented as an $|N|$-dimensional matrix defined only for profiles of terminal sequences leading to a leaf. With a slight abuse of notation, we denote it by $U_i$.
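To make the realization-plan constraints concrete, the following minimal sketch (our own illustration; the tree encoding and function names are assumptions, not the paper's) builds $F_i$ and $f_i$ for one player of a toy game:

```python
# Build sequence-form constraints F r = f for a single player, given her
# information sets. Each infoset is a pair (parent_sequence, actions);
# sequences are tuples of the player's own actions. Illustrative encoding.
import numpy as np

def sequence_form_constraints(infosets):
    seqs = [()]                                  # the empty sequence q_0
    for parent, actions in infosets:
        seqs += [parent + (a,) for a in actions]
    idx = {q: i for i, q in enumerate(seqs)}
    F = np.zeros((1 + len(infosets), len(seqs)))
    f = np.zeros(1 + len(infosets))
    F[0, idx[()]] = 1.0                          # r(q_0) = 1
    f[0] = 1.0
    for k, (parent, actions) in enumerate(infosets, start=1):
        F[k, idx[parent]] = -1.0                 # sum_a r(parent+a) = r(parent)
        for a in actions:
            F[k, idx[parent + (a,)]] = 1.0
    return seqs, F, f

# Player with a root infoset {L, R} and a second infoset reached after L:
infosets = [((), ("L", "R")), (("L",), ("l", "r"))]
seqs, F, f = sequence_form_constraints(infosets)
print(seqs)                                # [(), ('L',), ('R',), ('L','l'), ('L','r')]
print(F @ np.array([1, .4, .6, .1, .3]))   # equals f for this valid plan
```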

A Nash equilibrium (NE), whose definition does not depend on the representation of the game, is a strategy profile in which no player can improve her utility by deviating from her strategy, given the strategies of all the other players.

3 Extensive-Form Adversarial Team Games, Equilibria, and Inefficiency

We initially provide the formal definition of a team.

Definition 1 (Team)

Given an extensive-form game with imperfect information, a team is an inclusion-wise maximal subset of players $T \subseteq N$ such that, for any $i, j \in T$ and for all $l \in L$, $u_i(l) = u_j(l)$.

We denote by $H_T$ the set $\bigcup_{i \in T} H_i$ and by $A_T$ the set of actions available at the information sets in $H_T$. An extensive-form team game (EF-TG) is a generic extensive-form game where at least one team is present. Von Stengel and Koller [1997] analyze zero-sum normal-form games where a single team plays against an adversary. We extend this game model to the scenario of extensive-form games.

Definition 2 (Stsa-Ef-Tg)

A zero-sum single-team single-adversary extensive-form team game (STSA-EF-TG) is a game in which:

  • $N = T \cup \{\mathcal{A}\}$, where the set $T$ defines a team (as in Definition 1) and player $\mathcal{A}$ is the adversary ($\mathcal{A} \notin T$);

  • for each $l \in L$ it holds $u_T(l) = -u_\mathcal{A}(l)$, where $u_T$ denotes the (common) utility of the teammates and $u_\mathcal{A}$ that of the adversary.

When the teammates have no chance to correlate their strategies, the most appropriate solution concept is the Team-maxmin equilibrium (TME). Formally, the TME is defined as $\arg\max_{\pi_1, \dots, \pi_{|T|}} \min_{\pi_\mathcal{A}} u_T(\pi_1, \dots, \pi_{|T|}, \pi_\mathcal{A})$, where each teammate independently plays a behavioral strategy $\pi_i$. By using the same arguments used by von Stengel and Koller [1997] for the case of normal-form games, it follows that also in extensive-form games a TME is unique except for degeneracy and it is the NE maximizing the team's expected utility. Nevertheless, in many scenarios, teammates may exploit higher correlation capabilities. While in normal-form games these capabilities reduce to employing a correlation device as proposed by [Aumann1974], in extensive-form games we can distinguish different forms of correlation. More precisely, the strongest correlation is achieved when teammates can communicate both before and during the execution of the game (preplay and intraplay communication), exchanging their private information by exploiting a mediator that recommends actions to them. This setting can be modeled by resorting to a communication device defined in a way similar to [Forges1986]. A weaker correlation is achieved when teammates can communicate only before the play (preplay communication). This setting can be modeled by resorting to a correlation device analogous to that used in normal-form games. We formally define these two devices as follows (as customary, $\Delta(X)$ denotes the simplex over a set $X$).

Definition 3 (Communication device)

A communication device is a triple $(H_T, A_T, R)$, where $H_T$ is the set of inputs (i.e., information sets) that teammates can communicate to the mediator, $A_T$ is the set of outputs (i.e., actions) that the mediator can recommend to the teammates, and $R$ is the recommendation function that associates each information set with a probability distribution over $A_T$, as a function of the information sets previously reported by the teammates and of the actions recommended by the mediator in the past.

Definition 4 (Correlation device)

A correlation device is a pair $(P_T, R)$, where $P_T$ is the set of reduced joint plans of the teammates and $R \in \Delta(P_T)$ is the recommendation function, which returns a probability distribution over the reduced joint plans of the teammates.

Notice that, while a communication device provides its recommendations drawing actions from probability distributions during the game, a correlation device does that only before the beginning of the game. Resorting to these definitions, we introduce the following solution concepts.

Definition 5 (Team-maxmin equilibrium variations)

Given a communication device—or a correlation device—for the team, a Team-maxmin equilibrium with communication device (TMECom)—or a Team-maxmin equilibrium with correlation device (TMECor)—is a Nash equilibrium in which all teammates follow their recommendations and, only for TMECom, report truthfully their information.

Notice that in our setting (i.e., zero-sum games), both the TMECom and the TMECor maximize the team's utility. We state the following, whose proof is straightforward.

Property 1 (Strategy space)

The space of lotteries over the outcomes achievable by using a communication device includes that of the lotteries achievable by using a correlation device, which, in turn, includes the space of the lotteries achievable without any device.

Let $v_{TME}$, $v_{TMECom}$, and $v_{TMECor}$ be the utilities of the team at, respectively, the TME, the TMECom, and the TMECor. From the property above, we can easily derive the following.

Property 2 (Equilibria utility)

The game values obtained with the different solution concepts introduced above are such that $v_{TME} \le v_{TMECor} \le v_{TMECom}$.

In order to evaluate the inefficiency due to the impossibility of adopting a communication or correlation device, we resort to the concept of Price of Uncorrelation (PoU), previously introduced in [Basilico et al.2017] as a measure of the inefficiency of the TME w.r.t. the TMECor in normal-form games. In these games, the PoU is defined as the ratio between the utility given by the TMECor and the utility given by the TME, once all the team's payoffs are normalized in $[0, 1]$. For extensive-form games, we propose the following variations of the PoU to capture all the possible combinations of different forms of correlation.

Definition 6 (Inefficiency indices)

$PoU_{Com/TME} = \frac{v_{TMECom}}{v_{TME}}$, $\quad PoU_{Cor/TME} = \frac{v_{TMECor}}{v_{TME}}$, $\quad PoU_{Com/Cor} = \frac{v_{TMECom}}{v_{TMECor}}$.

In perfect-information games all these indices assume a value of 1, the solution being unique, unless degeneracy, by backward induction. With imperfect information the indices can be larger than 1. In normal-form games, the tight upper bound to the PoU is $m^{n-2}$, where $m$ is the number of actions of each player and $n$ is the number of players [Basilico et al.2017]. Using a definition based on $m$ and $n$ is not suitable for extensive-form games, where each player may have a different number of actions per node. Thus, we state the bounds in terms of $|L|$ (i.e., the number of terminal nodes). The following three examples provide lower bounds to the worst-case values of the indices, showing that the inefficiency may be arbitrarily large in $|L|$. Initially, to ease the presentation, we define a specific type of team player that we call spy.

Definition 7 (Spy player)

Player $i \in T$ is said to be a spy if, for each $h \in H_i$, $h$ is a singleton and $\rho(h)$ is a singleton.

A spy just observes the actual state of the game and her contribution to the play is only due to her communication capabilities. Notice that the introduction of a spy after decision nodes of the adversary does not affect the team’s utility in a TMECor (the team’s joint plans are the same) but improves the team’s capabilities, and final utility, in a TMECom.

Figure 1: A game with a spy used in Example 1.
Example 1 (Lower bound for worst-case $PoU_{Com/TME}$)

Consider a STSA-EF-TG with $n$ players and $m$ actions for each player at every decision node, except for the first team player, who is a spy. The game tree is structured as follows (see Figure 1 for the case with $n = 3$ and $m = 2$).

  • The adversary plays first;

  • then the spy observes her move;

  • each one of the other teammates is assigned one of the following levels of the game tree and all her decision nodes are part of the same information set;

  • $u_T = 1$ iff, for each teammate $i$ other than the spy and for each $h \in H_i$, the action chosen at $h$ is equal to the one selected by $\mathcal{A}$.

We have $v_{TMECom} = 1$ (the spy observes the adversary's move and the mediator recommends the matching action to every teammate) and $v_{TME} = 1/m^{n-2}$ (the $n-2$ teammates other than the spy must independently guess the adversary's move), and thus $PoU_{Com/TME} = m^{n-2}$. Since the tree structure is such that $|L| = m^{n-1}$ we obtain $PoU_{Com/TME} = |L|^{(n-2)/(n-1)}$. Once $|L|$ is fixed, the inefficiency is monotonically increasing in $n$, but $n$ is upper bounded by $1 + \log_2 |L|$ (corresponding to the case in which each team player except the spy has the minimum number of actions, i.e., 2). It follows that, in the worst case w.r.t. $|L|$, $PoU_{Com/TME} = |L|/2$.

Example 2 (Lower bound for worst-case $PoU_{Cor/TME}$)

Consider a STSA-EF-TG with $n$ players and $m$ actions at each of their decision nodes, in which each level of the game tree is associated with one player and forms a unique information set. $u_T = 1$ iff all the teammates choose the same action as the adversary, who plays first. This case corresponds to the worst case for the PoU in normal-form games. Here we formulate the bound in terms of $|L|$. We have $v_{TMECor} = 1/m$ and $v_{TME} = 1/m^{n-1}$. It follows that $PoU_{Cor/TME} = m^{n-2}$. This time, $|L| = m^n$ and thus $PoU_{Cor/TME} = |L|^{(n-2)/n}$. The worst case w.r.t. $|L|$ is reached when $m = 2$ and $n = \log_2 |L|$. Therefore, $PoU_{Cor/TME} = |L|/4$.

Example 3 (Lower bound for worst-case $PoU_{Com/Cor}$)

Consider the game presented in Example 1. Since $v_{TMECom} = 1$ and $v_{TMECor} = 1/m$ (the best the correlation device can do is to draw a single action uniformly and recommend it to every teammate), it follows $PoU_{Com/Cor} = m$. The structure of the game tree is such that $|L| = m^{n-1}$ and thus $PoU_{Com/Cor} = |L|^{1/(n-1)}$. Notice that, in this case, the inefficiency is maximized when $n = 3$, which corresponds to having a team of two members. Thus, in the worst case w.r.t. $|L|$, $PoU_{Com/Cor} = |L|^{1/2}$.

4 Finding a TMECom

We show that there is a polynomial-time TMECom-finding algorithm. Indeed, we prove that the problem of finding a TMECom is equivalent to finding a 2-player maxmin strategy in an auxiliary 2-player game with perfect recall and that the auxiliary game can be built in polynomial time.

First, we define the structure of the auxiliary game we use. Let $\Gamma$ be an extensive-form game and $W = V \cup L$ the set of its nodes. We define the following functions. Function $\text{seq}: W \to Q_1 \times \dots \times Q_{|N|}$ returns the sequence profile constituting the path from the root to a given node of the tree. Function $\text{seq}_S$ is s.t., for each $w \in W$ and each set of players $S \subseteq N$, $\text{seq}_S(w)$ is the restriction of $\text{seq}(w)$ to the sequences of the players in $S$. Intuitively, $\text{seq}_S(w)$ returns the unique profile of sequences of the players in $S$ leading to $w$ when combined with some sequences of the players in $N \setminus S$.

The following definition describes the information structure of the auxiliary extensive-form game.

Definition 8 ($S$-observable game)

For any game $\Gamma$ and any set of players $S \subseteq N$, the $S$-observable game $\Gamma_S$ is a tuple $(N, A, V, L, \iota, \rho, \chi, u, H^S)$, where $H^S$ is such that:

  1. for each decision node $v \in V$, there exists one and only one $h \in H^S$ s.t. $v \in h$, where $h^S(v)$ denotes the information set containing $v$ in $\Gamma_S$;

  2. for each player $i \in S$, $H_i^S$ is the set with the lowest possible cardinality s.t., for each $h \in H_i$ and for each pair of decision nodes $v, w \in h$, it holds: $h^S(v) = h^S(w)$ if and only if $\text{seq}_S(v) = \text{seq}_S(w)$.

In an $S$-observable extensive-form game, players belonging to $S$ are fully aware of the moves of the other players in $S$ and share the same information on the moves taken by the players in $N \setminus S$. We show that we can build $\Gamma_T$ in polynomial time.

Lemma 1 ($T$-observable game construction)

The $T$-observable game $\Gamma_T$ of a generic STSA-EF-TG can be computed in polynomial time.

Proof. We provide the sketch of an algorithm (the pseudocode is provided in the Appendices) to build a $T$-observable game (i.e., an $S$-observable game with $S = T$) in time and space polynomial in the size of the game tree. The algorithm employs nested hash-tables. The first hash-table associates each joint sequence of the team with another hash-table, which is indexed over information sets and has as value the information-set id to be used in $\Gamma_T$. $\Gamma$ is traversed in a depth-first manner while keeping track of the team's joint sequence leading to the current node. For each node $v$ s.t. $\iota(v) \in T$, a search/insertion over the first hash-table is performed by hashing $\text{seq}_T(v)$. Then, once the sequence-specific hash-table is found, the information set of $v$ is assigned a new id if it is not already present as a key. $\Gamma_T$ is built by associating with each decision node of the team a new information set as specified in the hash-table. The worst-case running time is linear in the size of the game tree, up to the cost of hashing joint sequences.

Theorem 2 (TMECom computation)

Given a STSA-EF-TG and a communication device for $T$, the unique (unless degeneracy) TMECom can be found in polynomial time.

Proof. Given a STSA-EF-TG $\Gamma$, the use of a communication device for the team changes the information structure of the game, inducing a $T$-observable game $\Gamma_T$. A TMECom can be computed over $\Gamma_T$ as follows. Given a communication device $(H_T, A_T, R)$, $R$ enforces a probability distribution over the set of feedback rules. $R$ is chosen in order to maximize the expected utility of the team. In this setting, no incentive constraints are required because teammates share the same utility function and therefore, under the hypothesis that $R$ maximizes it, it is in their best interest to follow the recommendations sent by the device and to report their information truthfully. Thus, considering the function $R$ to be defined over the information sets of $\Gamma_T$, $R$ reduces to a distribution over deterministic rules mapping each information set of $\Gamma_T$ to a recommended action.

We are left with an optimization problem in which we have to choose $R$ s.t. the worst-case utility of the team is maximized. This is equivalent to a 2-player maxmin problem over $\Gamma_T$ between $\mathcal{A}$ and a single player playing over the team's joint sequences. By construction, the team player has perfect recall, and thus the maxmin problem can be formulated as an LP in sequence form, requiring polynomial time.
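For concreteness, denoting by $r_T$ the team player's realization plan over joint sequences, by $U_T$ the team's sequence-form utility matrix, and by $(F_T, f_T)$ and $(F_\mathcal{A}, f_\mathcal{A})$ the two players' sequence-form constraints, the program has the standard sequence-form maxmin shape [von Stengel1996] (a sketch in our notation; the dual vector $v$ is indexed over the adversary's information sets plus the root):

$$\max_{r_T,\, v} \; f_\mathcal{A}^\top v \qquad \text{s.t.} \qquad F_\mathcal{A}^\top v \le U_T^\top r_T, \qquad F_T\, r_T = f_T, \qquad r_T \ge 0.$$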

5 Finding a TMECor

We initially focus on the computational complexity of the problem of searching for a TMECor.

Theorem 3 (TMECor complexity)

Finding a TMECor is FNP-hard when there are two teammates, each with an arbitrary number of information sets, or when there is an arbitrary number of teammates, each with one information set.

The first result directly follows from the reduction presented in [von Stengel and Forges2008, Theorem 1.3] since the game instances used in the reduction are exactly STSA-EF-TGs with 2 teammates. The second result can be proved by adapting the reduction described in [Koller and Megiddo1992, Proposition 2.6], assigning each information set of the game instances to a different teammate.

In principle, a TMECor can be found by casting the game in normal form and then by searching for a Team-maxmin equilibrium with correlated strategies. This latter equilibrium can be found in polynomial time in the size of the normal form, which, however, is given by $\prod_{i \in N} |P_i|$, where each $|P_i|$ is exponentially large in the size of the tree. We provide here a more efficient method that can also be used in an anytime fashion, without requiring any exponential enumeration before the execution of the algorithm. In our method, we use a hybrid representation that, to the best of our knowledge, has not been used in previous works.

Hybrid representation. In our representation, $\mathcal{A}$'s strategy is represented in sequence form, while the team plays over jointly-reduced plans, as formally defined below. Given a generic STSA-EF-TG $\Gamma$, let us denote with $P = \times_{i \in T} P_i$ the set of actions of the reduced normal form of $\Gamma$ restricted to the team, where $P_i$ is the set of reduced plans for player $i$. Therefore, $P$ is the set of joint reduced plans of the team. Let function $g$ be s.t. it returns, for a given pair $(p, q) \in P \times Q_\mathcal{A}$, the terminal node reached when the adversary plays $q$ and the team members, at each of their information sets, play according to $p$. If no terminal node is reached, $\bot$ is returned. We define some equivalence classes over $P$ by the relation $\sim$:

Definition 9

The equivalence relation $\sim$ over $P$ is s.t., given $p, p' \in P$, $p \sim p'$ iff, for each $q \in Q_\mathcal{A}$, $g(p, q) = g(p', q)$.

Definition 10 (Jointly-reduced plans)

The set $P_r$ of jointly-reduced plans is obtained by picking exactly one representative from each equivalence class of $\sim$.

The team's utility function is represented by the sparse matrix $U_T$. Given a pair $(p, q) \in P_r \times Q_\mathcal{A}$, the value $u_T(g(p, q))$ is stored in $U_T$ iff $g(p, q) \ne \bot$. Notice that $U_T$ is well defined since each pair $(p, q)$ leads to at most one terminal node.
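Definitions 9 and 10 suggest a direct, if naive, way to compute $P_r$ on small instances: group the joint plans by the leaf that each adversary sequence leads them to, and keep one representative per group. A minimal sketch, where the plan encoding and the function $g$ are illustrative assumptions:

```python
# Group joint reduced plans p by the leaf g(p, q) they induce for every
# adversary sequence q, keeping one representative per class (Defs. 9-10).
from itertools import product

def jointly_reduced_plans(plans_per_teammate, adversary_seqs, g):
    classes = {}
    for p in product(*plans_per_teammate):        # all joint plans in P
        signature = tuple(g(p, q) for q in adversary_seqs)
        classes.setdefault(signature, p)          # first representative wins
    return list(classes.values())

# Toy game in which teammate 2's information set is never reached, so the
# induced leaf depends only on teammate 1's plan and the adversary's move:
def g(p, q):
    return (q, p[0])

print(jointly_reduced_plans([(0, 1), (0, 1)], [0, 1], g))
# [(0, 0), (1, 0)] -- plans differing only in teammate 2's action collapse
```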

Let $x \in \Delta(P_r)$ denote the team's strategy over $P_r$. The problem of finding a TMECor in our hybrid representation can be formulated as the following LP, named hybrid-maxmin, obtained by dualizing the adversary's sequence-form best-response problem:

$$\max_{x,\, v} \; f_\mathcal{A}^\top v \qquad \text{s.t.} \qquad F_\mathcal{A}^\top v \le U_T^\top x, \qquad \mathbf{1}^\top x = 1, \qquad x \ge 0,$$

composed of $|Q_\mathcal{A}| + 1$ constraints (except the nonnegativity constraints) and an exponential number of variables $x$. Thus, we can state the following proposition.

Proposition 1

There exists at least one TMECor in which the number of joint plans played with strictly positive probability by the team is at most $|Q_\mathcal{A}|$.

Proof. The above LP admits a basic optimal solution with at most $|Q_\mathcal{A}| + 1$ variables with strictly positive values [Shapley and Snow1950]. Since the value variable is always in the basis (indeed, we can add a constant to make the team's utility in each terminal node strictly positive without affecting equilibrium strategies), the joint plans in the basis are at most $|Q_\mathcal{A}|$.

Proposition 1 shows that the NP-hardness of the problem is merely due to guessing the jointly-reduced plans played with strictly positive probability in a TMECor. Thus, we can avoid enumerating $P_r$ entirely before executing the algorithm by working with a subset of jointly-reduced plans built progressively, in a classical column-generation fashion (see, e.g., [McMahan, Gordon, and Blum2003]).

Column-generation algorithm. The pseudocode is given in Algorithm 1. It receives in input the game tree and the sequence-form constraint matrices of all the players (Line 1). Then, the algorithm is initialized, assigning a matrix of zeros to $U_T$, an empty set to the current subset $P'_r$ of jointly-reduced plans, and 0 to the current value $v^*$ (Line 2). Notice that $U_T$ is sparse and therefore its representation requires a space equal to the number of non-null entries. $r_\mathcal{A}$ is initialized as a realization plan equivalent to a uniform behavioral mixed strategy, i.e., the adversary, at each information set, randomizes uniformly over all the available actions (Line 3). Then, the algorithm calls the BR-ORACLE (defined below) to find the best response $p_{br}$ of the team given the adversary's strategy (Line 4). Lines 6-10 are repeated until an optimal solution is found. Initially, $p_{br}$ is added to $P'_r$ (Line 6) and the team's utilities at the nodes reached by $(p_{br}, q)$ for every $q \in Q_\mathcal{A}$ are added to $U_T$ (Line 7). Then, the algorithm solves the maxmin (hybrid-maxmin) and minmax (hybrid-minmax) problems restricted to $P'_r$ (Lines 8 and 9), where the hybrid-minmax problem is defined as:

$$\min_{r_\mathcal{A},\, w} \; w \qquad \text{s.t.} \qquad U_T\, r_\mathcal{A} \le w\,\mathbf{1}, \qquad F_\mathcal{A}\, r_\mathcal{A} = f_\mathcal{A}, \qquad r_\mathcal{A} \ge 0.$$

Finally, the algorithm calls BR-ORACLE to find the best response to the newly computed $r_\mathcal{A}$ (Line 10).

Best-response oracle. Given a generic STSA-EF-TG $\Gamma$, we denote by BR-T the problem of finding the best response of the team against a fixed realization plan $r_\mathcal{A}$ of the adversary. This problem is shown to be NP-hard by the reduction used for [von Stengel and Forges2008, Theorem 1.3], where we can interpret the initial chance move as the fixed strategy of the adversary. We can strengthen such a hardness result as follows (the proofs are provided in the Appendices):

Theorem 4

BR-T is APX-hard.

Let $\alpha$ be the best approximation bound of the maximization problem MAX-SAT.

Theorem 5

Denote with BR-T-$h$ the problem BR-T over STSA-EF-TG instances of fixed maximum depth $h$ and branching factor variable at each decision node. It holds that BR-T-$h$ cannot be efficiently approximated within a factor greater than $\alpha^{\lfloor h/3 \rfloor}$.

This means that the upper bound on the approximation factor decreases exponentially as the depth of the tree increases (notice that $\alpha < 1$, see [Håstad2001]). The column-generation oracle solving BR-T can be formulated as the following integer linear program (ILP), assuming w.l.o.g. nonnegative team payoffs:

$$\begin{aligned} \max_{r,\, y} \;& \sum_{l \in L} u_T(l)\, r_\mathcal{A}(q_\mathcal{A}(l))\, y_l \\ \text{s.t. } \;& F_i\, r_i = f_i, \quad r_i(q) \in \{0, 1\} && \forall i \in T,\ \forall q \in Q_i \\ & y_l \le r_i(q_i(l)), \quad y_l \in \{0, 1\} && \forall l \in L,\ \forall i \in T, \end{aligned}$$

where $q_i(l)$ denotes player $i$'s sequence on the path from the root to leaf $l$, and $y_l$ is a binary variable which is equal to 1 iff, for all the sequences necessary to reach $l$, it holds $r_i(q_i(l)) = 1$. Notice that the oracle returns a pure realization plan $r_i$ for each of the teammates. The team's best response is a jointly-reduced plan that can be derived as follows. Denote with $Q_i^1$ the set of sequences played with probability one by $r_i$ that are not subsets of any other sequence played with positive probability. Let $p_i$ be the reduced normal-form plan of player $i$ specifying all and only the actions played in the sequences belonging to $Q_i^1$. The joint plan is then $p_{br} = (p_1, \dots, p_{|T|})$.
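To fix ideas, the following toy sketch (our own encoding, not code from the paper) solves this ILP with an off-the-shelf mixed-integer solver for a game in which the adversary uniformly picks one of two actions and a team of two wins by matching it:

```python
# Toy best-response ILP: adversary plays a in {0,1} uniformly; each teammate
# has one information set with actions {0,1}; the team wins iff both match a.
import numpy as np
from scipy.optimize import milp, Bounds, LinearConstraint

# Variables: [r1_0, r1_1, r2_0, r2_1, y_000, y_111]
# (y_l is kept only for the two leaves with nonzero team payoff).
c = np.array([0, 0, 0, 0, -0.5, -0.5])      # maximize 0.5*y_000 + 0.5*y_111

A_eq = np.array([[1, 1, 0, 0, 0, 0],        # F_1 r_1 = f_1: r1_0 + r1_1 = 1
                 [0, 0, 1, 1, 0, 0]])       # F_2 r_2 = f_2: r2_0 + r2_1 = 1
A_ub = np.array([[-1, 0, 0, 0, 1, 0],       # y_000 <= r1_0
                 [0, 0, -1, 0, 1, 0],       # y_000 <= r2_0
                 [0, -1, 0, 0, 0, 1],       # y_111 <= r1_1
                 [0, 0, 0, -1, 0, 1]])      # y_111 <= r2_1

constraints = [LinearConstraint(A_eq, 1, 1), LinearConstraint(A_ub, -np.inf, 0)]
integrality = np.array([1, 1, 1, 1, 0, 0])  # r binary, y relaxed in [0, 1]
res = milp(c, constraints=constraints, integrality=integrality,
           bounds=Bounds(0, 1))
print(res.x, -res.fun)                      # both teammates match: value 0.5
```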

1: function HYBRID-COL-GEN($\Gamma$, $F$) ▷ $\Gamma$ is a generic STSA-EF-TG and $F = (F_1, \dots, F_{|N|})$ are the sequence-form constraint matrices
2:       $U_T \leftarrow \mathbf{0}$, $P'_r \leftarrow \emptyset$, $v^* \leftarrow 0$ ▷ initialization
3:       $r_\mathcal{A} \leftarrow$ realization plan equivalent to a uniform behavioral mixed strategy
4:       $p_{br} \leftarrow$ BR-ORACLE($r_\mathcal{A}$) ▷ call to the oracle
5:       while the value of $p_{br}$ against $r_\mathcal{A}$ is greater than $v^*$ do
6:             $P'_r \leftarrow P'_r \cup \{p_{br}\}$
7:             the team's utilities $u_T(g(p_{br}, q))$ for every $q \in Q_\mathcal{A}$ are added to $U_T$
8:             $(x, v^*) \leftarrow$ solve hybrid-maxmin problem with $P'_r$
9:             $(r_\mathcal{A}, v^*) \leftarrow$ solve hybrid-minmax problem with $P'_r$
10:            $p_{br} \leftarrow$ BR-ORACLE($r_\mathcal{A}$)
11:      return $(x, r_\mathcal{A})$
Algorithm 1 Hybrid Column Generation
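The following self-contained sketch illustrates the same column-generation pattern in a simplified, normal-form analogue where joint plans can be enumerated, the restricted maxmin and minmax problems are plain LPs over the simplex, and the exact oracle reduces to an argmax; all names are ours, and the sequence-form machinery of the real algorithm is deliberately stripped away:

```python
# Column-generation on a zero-sum matrix game: rows stand in for the team's
# joint plans, columns for the adversary's (sequence-form) actions.
import numpy as np
from scipy.optimize import linprog

def maxmin_over_support(U_sub):
    """Solve max_x min_j (x^T U_sub)_j over the current set of team plans."""
    k, m = U_sub.shape
    c = np.zeros(k + 1); c[-1] = -1.0               # maximize v <=> minimize -v
    A_ub = np.hstack([-U_sub.T, np.ones((m, 1))])   # v <= (x^T U)_j for all j
    A_eq = np.zeros((1, k + 1)); A_eq[0, :k] = 1.0  # x on the simplex
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(m), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * k + [(None, None)])
    return res.x[:k], -res.fun

def minmax_over_support(U_sub):
    """Solve min_y max_i (U_sub y)_i: the adversary's restricted minmax."""
    k, m = U_sub.shape
    c = np.zeros(m + 1); c[-1] = 1.0
    A_ub = np.hstack([U_sub, -np.ones((k, 1))])     # (U y)_i <= w for all i
    A_eq = np.zeros((1, m + 1)); A_eq[0, :m] = 1.0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(k), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m], res.fun

def column_generation(U, tol=1e-9):
    """Grow the set of team plans only when the oracle finds an improvement."""
    n_plans, m = U.shape
    support = [0]                        # start from an arbitrary plan
    y = np.full(m, 1.0 / m)              # uniform adversary strategy (Line 3)
    while True:
        br = int(np.argmax(U @ y))       # exact best-response oracle: argmax
        if br not in support:
            support.append(br)
        _, v = maxmin_over_support(U[support])    # Line 8
        y, w = minmax_over_support(U[support])    # Line 9
        if (U @ y).max() <= w + tol:     # no plan improves the restricted value
            return support, v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    U = rng.random((50, 6))              # 50 "joint plans", 6 adversary actions
    support, value = column_generation(U)
    print(support, value)                # small support, as Proposition 1 suggests
```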

A simple approximation algorithm can be obtained by a continuous relaxation of the binary constraints on $r$ and $y$. The resulting mathematical program is linear and therefore solvable in polynomial time. An approximated solution can be obtained by randomized rounding [Raghavan and Tompson1987]. When considering game trees encoding MAX-SAT instances (see the proof of Theorem 4), the approximation algorithm matches the ratio guaranteed by randomized rounding for MAX-SAT (details are given in the Appendices).

6 Finding a TME

We recall that finding a TME is hard, since it is hard even in normal-form games [Hansen et al.2008].

Theorem 6 (TME complexity)

Finding a TME is FNP-hard and its value is inapproximable in additive sense even with binary payoffs.

The problem of finding a TME can be formulated as the following non-linear mathematical programming problem:

$$\max_{r_1, \dots, r_{|T|},\, v} \; f_\mathcal{A}^\top v \qquad \text{s.t.} \qquad F_\mathcal{A}^\top v \le U_T^\top r_T \;\;\text{with}\;\; r_T(q_T) = \prod_{i \in T} r_i(q_T[i]), \qquad F_i\, r_i = f_i,\; r_i \ge 0 \quad \forall i \in T,$$

where $Q_T$ is the set of the team's joint sequences, $q_T[i]$ identifies the sequence of player $i$ in $q_T \in Q_T$, and, with a slight abuse of notation, $U_T$ here denotes the team's sequence-form utility matrix over $Q_T \times Q_\mathcal{A}$. This program can be solved exactly, within a given numerical accuracy, by means of global optimization tools in exponential time.
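As an illustration only, the multilinear structure of the program can be handed to a local nonlinear solver; the sketch below does so for a toy instance with two teammates independently guessing a binary move of the adversary (the paper's exact results rely on global optimization tools such as BARON, not on this local stand-in):

```python
# Local solve of a toy TME maxmin program: maximize v subject to
# v <= (team winning probability) under each adversary action.
from scipy.optimize import minimize

def neg_value(z):              # z = (t1, t2, v): maximize v <=> minimize -v
    return -z[2]

cons = [  # t_i = probability that teammate i plays action 0
    {"type": "ineq", "fun": lambda z: z[0] * z[1] - z[2]},
    {"type": "ineq", "fun": lambda z: (1 - z[0]) * (1 - z[1]) - z[2]},
]
res = minimize(neg_value, x0=[0.3, 0.7, 0.0], constraints=cons,
               bounds=[(0, 1), (0, 1), (None, None)], method="SLSQP")
print(res.x, -res.fun)         # with this start: t1 = t2 = 0.5, value 1/4
```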

7 Experimental Evaluation

Figure 2: Average empiric inefficiency indices with 3 players and some values of $\omega$.
Figure 3: Average compute times of the algorithms and their box plots with 3 players.

Experimental setting. Our experimental setting is based on randomly generated STSA-EF-TGs. The random game generator takes as inputs: the number of players, a probability distribution over the number of actions available at each information set, the maximum depth of the tree, and a parameter $\omega$ for tuning the information structure of the tree. Specifically, this parameter encodes the probability with which a newly created decision node, once it has been randomly assigned to a player, is assigned to an existing information set (thus, when it is equal to 0 the game has perfect information), while guaranteeing perfect recall for every player. Finally, payoffs associated with terminal nodes are randomly drawn from a uniform distribution over $[0, 1]$. We generate 20 game instances for each combination of the parameters' values: the number of players, the depth of the tree (varied with step size 1, over ranges that depend on the number of players), and $\omega$. For simplicity, we fix the branching factor to 2 (given a bound on the size of the tree, this value maximizes the depth, and it is also the worst case for the inefficiency indices).

The algorithms are implemented in Python 2.7.6, adopting GUROBI 7.0 for LPs and ILPs, AMPL 20170207, and the global optimization solver BARON 17.1.2 [Tawarmalani and Sahinidis2005]. We set a time limit of 60 minutes for the algorithms. All the algorithms are executed on a UNIX computer with 2.33GHz CPU and 128 GB RAM. We discuss the main experimental results with 3 players below, while the results with more players are provided in the Appendices. Since the computation of the TMECor from the reduced normal form is impractical even at small depths (see the Appendices), we use only Algorithm 1 employing the exact oracle (which proved very fast on every instance).

Empirical PoUs. We report in Fig. 2 the average empiric inefficiency indices with 3 players and some values of $\omega$. We observe that, although the theoretical worst-case values grow with $|L|$, the empiric increase, if any, is negligible, the average empiric values remaining close to 1 even on the largest instances. We also observe that the inefficiency increases in $\omega$, suggesting that it may be maximized in normal-form games.

Compute time. We report in Fig. 3 the average compute times of the algorithms and their box plots with 3 players (the plot includes instances reaching the time limit, as this does not affect the presentation of the results). As expected, the TMECom computation scales well, allowing one to solve games with more than 16,000 terminal nodes within the time limit. The performances of Algorithm 1 (TMECor) are remarkable, since it solves games with more than 2,000 terminals within the time limit and presents a narrow box plot, meaning that the variance in the compute time is small. Notice that, on the smaller instances, the compute times of TMECom and TMECor are comparable, even if the latter is computationally hard while the former is solvable in polynomial time. As expected, the TME computation does not scale well, and its compute time is extremely variable among different instances.

8 Conclusions

In this paper, we focus on extensive-form team games with a single adversary. Our main contributions include the definition of game models employing different correlation devices and of their suitable solution concepts. We study the inefficiency a team incurs employing various forms of correlation, providing lower bounds on the worst-case values of the inefficiency indices that are arbitrarily large in the size of the game tree. Furthermore, we study the complexity of finding the equilibria, and we provide exact algorithms. Finally, we experimentally evaluate the scalability of our algorithms and the empirical equilibrium inefficiency in random games. In the future, it would be interesting to study approximate equilibrium-finding algorithms in order to achieve improved scalability in all three correlation scenarios.

References

  • [Aumann1974] Aumann, R. 1974. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics 1(1):67–96.
  • [Ausiello, Crescenzi, and Protasi1995] Ausiello, G.; Crescenzi, P.; and Protasi, M. 1995. Approximate solution of NP optimization problems. Theoretical Computer Science 150(1):1–55.
  • [Basilico et al.2017] Basilico, N.; Celli, A.; Nittis, G. D.; and Gatti, N. 2017. Team-maxmin equilibrium: efficiency bounds and algorithms. In AAAI.
  • [Borgs et al.2010] Borgs, C.; Chayes, J. T.; Immorlica, N.; Kalai, A. T.; Mirrokni, V. S.; and Papadimitriou, C. H. 2010. The myth of the folk theorem. Games and Economic Behavior 70(1):34–43.
  • [Conitzer and Sandholm2006] Conitzer, V., and Sandholm, T. 2006. Computing the optimal strategy to commit to. In ACM EC, 82–90.
  • [Forges1986] Forges, F. 1986. An approach to communication equilibria. Econometrica 54(6):1375–1385.
  • [Gatti et al.2012] Gatti, N.; Patrini, G.; Rocco, M.; and Sandholm, T. 2012. Combining local search techniques and path following for bimatrix games. In UAI, 286–295.
  • [Hansen et al.2008] Hansen, K. A.; Hansen, T. D.; Miltersen, P. B.; and Sørensen, T. B. 2008. Approximability and parameterized complexity of minmax values. In WINE, 684–695.
  • [Håstad2001] Håstad, J. 2001. Some optimal inapproximability results. Journal of the ACM (JACM) 48(4):798–859.
  • [Kohlberg and Mertens1986] Kohlberg, E., and Mertens, J.-F. 1986. On the strategic stability of equilibria. Econometrica 54(5):1003–1037.
  • [Koller and Megiddo1992] Koller, D., and Megiddo, N. 1992. The complexity of two-person zero-sum games in extensive form. Games and economic behavior 4(4):528–552.
  • [Kuhn1953] Kuhn, H. W. 1953. Extensive Games and the Problem of Information. Princeton University Press. 193–216.
  • [Lemke and Howson, Jr1964] Lemke, C. E., and Howson, J. T., Jr. 1964. Equilibrium points of bimatrix games. Journal of the Society for Industrial and Applied Mathematics 12(2):413–423.
  • [McMahan, Gordon, and Blum2003] McMahan, H. B.; Gordon, G. J.; and Blum, A. 2003. Planning in the presence of cost functions controlled by an adversary. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), 536–543.
  • [Nisan et al.2007] Nisan, N.; Roughgarden, T.; Tardos, E.; and Vazirani, V. 2007. Algorithmic game theory, volume 1. Cambridge University Press.
  • [Raghavan and Tompson1987] Raghavan, P., and Tompson, C. D. 1987. Randomized rounding: a technique for provably good algorithms and algorithmic proofs. Combinatorica 7(4):365–374.
  • [Selten1975] Selten, R. 1975. Reexamination of the perfectness concept for equilibrium points in extensive games. International journal of game theory 4(1):25–55.
  • [Shapley and Snow1950] Shapley, L. S., and Snow, R. N. 1950. Basic solutions of discrete games. Annals of Mathematics Studies 24:27–35.
  • [Shoham and Leyton-Brown2009] Shoham, Y., and Leyton-Brown, K. 2009. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press.
  • [Tambe2011] Tambe, M. 2011. Security and game theory: algorithms, deployed systems, lessons learned. Cambridge University Press.
  • [Tawarmalani and Sahinidis2005] Tawarmalani, M., and Sahinidis, N. V. 2005. A polyhedral branch-and-cut approach to global optimization. Mathematical Programming 103:225–249.
  • [von Stengel and Forges2008] von Stengel, B., and Forges, F. 2008. Extensive-form correlated equilibrium: Definition and computational complexity. Mathematics of Operations Research 33(4):1002–1022.
  • [von Stengel and Koller1997] von Stengel, B., and Koller, D. 1997. Team-maxmin equilibria. Games and Economic Behavior 21(1):309 – 321.
  • [von Stengel1996] von Stengel, B. 1996. Efficient computation of behavior strategies. Games and Economic Behavior 14(2):220 – 246.
  • [Williamson and Shmoys2011] Williamson, D. P., and Shmoys, D. B. 2011. The design of approximation algorithms. Cambridge university press.

Appendices

Appendix A Proofs of the Theorems

Theorem 4

BR-T is APX-hard.

Proof. We prove that MAX-SAT is AP-reducible to BR-T (MAX-SAT $\le_{AP}$ BR-T). Given a boolean formula $\phi$ in conjunctive normal form, MAX-SAT is the problem of determining the maximum number of clauses of $\phi$ that can be made true by a truth assignment to the variables of $\phi$.

For any $\phi$ with $c$ clauses, we build, with a construction similar to [von Stengel and Forges2008, Theorem 1.3], a STSA-EF-TG $\Gamma_\phi$ as follows:

  • $T = \{1, 2\}$ and $N = T \cup \{\mathcal{A}\}$;

  • $\mathcal{A}$ plays first and has a unique decision node (the root of the tree), with one action per clause of $\phi$;

  • player 1 plays on the second level of the tree and has a singleton information set for each clause in $\phi$. Each information set has, as its actions, the variables that appear in the clause it identifies;

  • player 2 plays on the third level of the tree. She has one information set for each literal of $\phi$. At each of her information sets, player 2 chooses whether the literal has to be positive or negative;

  • $u_T = 1$ if the literal chosen by player 1 is true in the assignment made by player 2, and $u_T = 0$ otherwise.

Consider $r_\mathcal{A}$ to be randomizing uniformly over her actions. With this construction, $\Gamma_\phi$ has a pair of pure strategies for the team members leading to payoff 1 iff $\phi$ is satisfiable. Let $\Gamma_\phi$ denote the extensive-form game obtained by the above construction starting from $\phi$. Denote with $s$ the solution to BR-T for $\Gamma_\phi$. Function $g$ maps the best-response result back to a feasible assignment for the MAX-SAT problem.

Once $r_\mathcal{A}$ is fixed so that each terminal sequence of $\mathcal{A}$ is selected with probability $1/c$, the objective functions of MAX-SAT and BR-T are equivalent, since maximizing the utility of the team implies finding the maximum number of satisfiable clauses of $\phi$. Denote with $val_{BR}(\cdot)$ and $val_{SAT}(\cdot)$ the values of the two objective functions of BR-T and MAX-SAT, respectively. It holds $val_{BR}(s) = val_{SAT}(g(s))/c$. For this reason, the AP-condition holds. Specifically, for any $\phi$, for any rational $r > 1$, and for any feasible solution $s$ to BR-T over $\Gamma_\phi$, it holds:

$$\frac{val_{BR}(s^*)}{val_{BR}(s)} \le r \;\implies\; \frac{val_{SAT}(sat^*)}{val_{SAT}(g(s))} \le 1 + \beta (r - 1), \quad \text{with } \beta = 1,$$

where $s^*$ and $sat^*$ are, respectively, the optimal solutions to a given instance of the two problems BR-T and MAX-SAT. Therefore, since MAX-SAT is an APX-complete problem (see [Ausiello, Crescenzi, and Protasi1995]) and it is AP-reducible to BR-T, BR-T is APX-hard.

Theorem 5

Denote with BR-T-$h$ the problem BR-T over STSA-EF-TG instances of fixed maximum depth $h$ and branching factor variable at each decision node. It holds that BR-T-$h$ cannot be efficiently approximated within a factor greater than $\alpha^{\lfloor h/3 \rfloor}$.

Proof. We recall that $\alpha$ denotes the best upper bound for the efficient approximation of the maximization problem MAX-SAT.

Let $\phi$ be a boolean formula in conjunctive normal form. Fix the maximum depth of the tree to an arbitrary value $h$. Build a STSA-EF-TG $\Gamma_\phi$ following the construction explained in the proof of Theorem 4. At this point, for each terminal node $l$ of $\Gamma_\phi$ s.t. $u_T(l) = 1$, replicate $\Gamma_\phi$ by substituting $l$ with the root of a new copy of $\Gamma_\phi$. Repeat this procedure on the terminal nodes of the newly added subtrees until the longest path from the root to one of the new leaves traverses $r = \lfloor h/3 \rfloor$ copies of the original tree. Denote the full tree obtained through this process with $\Gamma_\phi^r$. The maximum depth of $\Gamma_\phi^r$ is $h$, and it contains the set $\mathcal{R}$ of replicas of $\Gamma_\phi$.

Suppose, by contradiction, there exists a polynomial-time approximation algorithm for BR-T-$h$ guaranteeing a constant approximation factor $a > \alpha^r$. Apply this algorithm to find an approximate solution to BR-T-$h$ over $\Gamma_\phi^r$. Since the team's utility in $\Gamma_\phi^r$ is the product of the utilities obtained in the replicas traversed on the way to a leaf, for at least one of the subtrees in $\mathcal{R}$ it has to hold $a_{sub} \ge a^{1/r}$, where $a_{sub}$ is the approximation ratio obtained by the algorithm for the problem BR-T over that replica. As shown in the proof of Theorem 4, a solution to BR-T over a tree obtained with our construction can be mapped back to obtain an approximate solution to MAX-SAT. The same reasoning holds for BR-T-$h$. Therefore, if $a > \alpha^r$, then $a_{sub} > \alpha$, where $a_{sub}$ is the approximation ratio obtained for the MAX-SAT instance by mapping back the solution of BR-T-$h$ over the replica. Therefore, the approximation algorithm would guarantee a constant approximation factor for MAX-SAT which is strictly greater than its theoretical upper bound, which is a contradiction.

Appendix B On the approximation algorithm

As mentioned in the main paper, a simple approximation algorithm for the BR-T problem can be obtained by relaxing the binary constraints and then applying randomized rounding [Raghavan and Tompson1987]. The linear programming relaxation of the ILP oracle is:

$$\begin{aligned} \max_{r,\, y} \;& \sum_{l \in L} u_T(l)\, r_\mathcal{A}(q_\mathcal{A}(l))\, y_l \\ \text{s.t. } \;& F_i\, r_i = f_i && \forall i \in T \\ & y_l \le r_i(q_i(l)) && \forall l \in L,\ \forall i \in T \\ & 0 \le r_i(q) \le 1, \quad 0 \le y_l \le 1 && \forall i \in T,\ \forall q \in Q_i,\ \forall l \in L. \end{aligned}$$

Let $\tilde{r}$ be an optimal solution to the LP relaxation. We select the approximate best response, which has to be a pure realization plan for each player, by selecting actions according to the probabilities specified by $\tilde{r}$. Notice that, once an action has been selected, the probability values at the next decision node of the team have to be rescaled so that they sum to one (therefore, the rounding process starts from the root).
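A minimal sketch of this top-down rounding step, under an assumed dictionary encoding of sequences:

```python
# Top-down randomized rounding of a relaxed realization plan (a sketch; the
# encoding of sequences and information sets is an illustrative assumption).
import random

def round_realization_plan(infosets, rel):
    """infosets: list of (sequence_leading_to_h, available_actions), ordered
    root-first; rel: dict mapping sequences (tuples of actions) to their
    relaxed probabilities. Returns one pure action per information set."""
    pure = {}
    for seq, actions in infosets:
        weights = [rel.get(seq + (a,), 0.0) for a in actions]
        if sum(weights) <= 0:
            pure[seq] = random.choice(actions)   # unreached information set
        else:
            # random.choices renormalizes the branch mass, which implements
            # the root-first rescaling described above.
            pure[seq] = random.choices(actions, weights=weights)[0]
    return pure

plan = round_realization_plan(
    [((), ("L", "R")), (("L",), ("l", "r"))],
    {(): 1.0, ("L",): 0.6, ("R",): 0.4, ("L", "l"): 0.45, ("L", "r"): 0.15})
print(plan)   # e.g. {(): 'L', ('L',): 'l'}
```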

Let us focus on games encoding MAX-SAT instances. Specifically, denote with $\phi$ a generic boolean formula in conjunctive normal form and with $\Gamma_\phi$ the STSA-EF-TG built as specified in the proof of Theorem 4. It is interesting to notice that, for any $\phi$, the application of our approximation algorithm to $\Gamma_\phi$ guarantees the same approximation ratio as randomized rounding applied to the relaxation of the ILP formulation of MAX-SAT.

Denote with $\mathcal{A}_{BR}$ and $\mathcal{A}_{SAT}$ the approximate algorithms based on randomized rounding for BR-T and MAX-SAT, respectively. The following result holds:

Proposition 2

For any $\phi$, the approximation ratio for MAX-SAT over $\phi$ obtained by mapping back the solution of BR-T over $\Gamma_\phi$ computed through $\mathcal{A}_{BR}$ is guaranteed to be at least the ratio guaranteed by $\mathcal{A}_{SAT}$.

Proof. The relaxation of the MAX-SAT ILP is the following linear formulation:

$$\begin{aligned} \max_{t,\, z} \;& \sum_{j \in C} z_j \\ \text{s.t. } \;& \sum_{i \in C_j^+} t_i + \sum_{i \in C_j^-} (1 - t_i) \ge z_j && \forall j \in C \\ & 0 \le t_i \le 1, \quad 0 \le z_j \le 1, \end{aligned}$$

where $C$ is the set of clauses of $\phi$, $C_j^+$ and $C_j^-$ are the sets of literals appearing in clause $j$ non-negated or negated, respectively, and $t_i$ is the probability of setting literal $i$ to true.

Consider a game $\Gamma_\phi$ encoding a generic $\phi$. If we apply the relaxation of the best-response oracle to $\Gamma_\phi$, the two formulations are equivalent. To see that, first let player 2 determine her realization plan $r_2$, which assigns to each literal a probability of being set to true. Once $r_2$ has been fixed, player 1 has, at each of her information sets, a fixed expected utility $\tau_a$ associated with each of her available actions $a$ (namely, the probability with which the corresponding literal is satisfied). Let $\rho(h)$ be the set of available actions at one of the information sets of player 1. There are three possible cases:

  1. $\sum_{a \in \rho(h)} \tau_a < 1$. In this case player 1 selects, for each $a \in \rho(h)$, a probability of at least $\tau_a$, and the clause contributes $\sum_{a \in \rho(h)} \tau_a$ to the objective.

  2. There exists $a \in \rho(h)$ with $\tau_a = 1$. In this case, playing $a$ with probability 1 guarantees player 1 to satisfy the corresponding clause.

  3. $\sum_{a \in \rho(h)} \tau_a \ge 1$ and $\tau_a < 1$ for each $a \in \rho(h)$. In this case, $a_1$ is selected with probability $\tau_{a_1}$, $a_2$ is selected with probability $\min\{\tau_{a_2}, 1 - \tau_{a_1}\}$, and so on. The resulting strategy profile guarantees expected utility one for the corresponding clause.

Therefore, the value reachable in each clause is determined only by the choice of player 2, i.e., the final utility of the team depends only on $r_2$. The objective functions of the two formulations being equivalent, the relaxed oracle enforces the same probability distribution over the literals' truth assignments. That is, the optimal values of the two relaxations are equivalent. Notice that, in these game instances, player 2 plays only on one level, and we can sample a solution to MAX-SAT according to $\tilde{r}_2$ as if it were the vector $t$. Therefore, randomized rounding of $\tilde{r}_2$ leads to the same approximation guarantee as $\mathcal{A}_{SAT}$ [Williamson and Shmoys2011].

Appendix C On forcing T-observability

Algorithm 2 provides a possible implementation of the procedure described in the proof of Lemma 1. Denote with $\Gamma$ the initial game tree and with $\Gamma_T$ the corresponding $T$-observable game. The function receives as input: the current node $v$; the current joint sequence $seq_T$ of the team; a dictionary $ids$ specifying, for each player, the next available information-set id; a dictionary of dictionaries $D$, with the first level indexed over tuples of sequences, and the second level indexed over identifiers of information sets of $\Gamma$ (i.e., pairs player/information-set id); and the id of the adversary ($adv$). By initializing $v$ to the root of the tree, the algorithm traverses the tree and assigns new information-set ids when possible, partitioning the old information structure of $\Gamma$. Once the execution is completed, $D$ can be used to create $\Gamma_T$. Let us denote with $seq_T(v)$ and $h(v)$ the team sequence and the information set defined by decision node $v$ in $\Gamma$. For each decision node $v$ of $\Gamma$ s.t. $\iota(v) \in T$, the corresponding decision node in $\Gamma_T$ is assigned an information set with id $D[seq_T(v)][(\iota(v), h(v))]$.

1: function FORCE-T-OBS($v$, $seq_T$, $ids$, $D$, $adv$)
2:       if $v$ is not terminal then
3:             $i \leftarrow \iota(v)$
4:             is_team $\leftarrow$ ($i \ne adv$)
5:             if is_team then
6:                    $key \leftarrow (i, h(v))$
7:                    if $seq_T$ in $D$ then
8:                          if $key$ not in $D[seq_T]$ then
9:                                 $D[seq_T][key] \leftarrow ids[i]$
10:                                $ids[i] \leftarrow ids[i] + 1$
11:                   else
12:                          $D[seq_T] \leftarrow$ empty dictionary
13:                          $D[seq_T][key] \leftarrow ids[i]$
14:                          $ids[i] \leftarrow ids[i] + 1$
15:             for $w$ in children($v$) do
16:                    if is_team then
17:                          $seq_T$.append(action leading from $v$ to $w$)
18:                    FORCE-T-OBS($w$, $seq_T$, $ids$, $D$, $adv$)
19:                    if is_team then
20:                          $seq_T$.pop()
21:       return $D$
Algorithm 2 Forcing T-Observability

Appendix D Additional Experimental Results

Empiric PoUs. We present the box plots describing the empiric inefficiency indices on all the experimental instances. Figure 4 describes the empiric $PoU_{Com/TME}$, Figure 5 describes the empiric $PoU_{Cor/TME}$, and Figure 6 describes the empiric $PoU_{Com/Cor}$. Notice that each plot displays data only for instances up to the biggest ones solvable within the time threshold by both the equilibrium-finding algorithms involved in the ratio.

Figure 4: Box plots of the $PoU_{Com/TME}$ inefficiency index.
Figure 5: Box plots of the $PoU_{Cor/TME}$ inefficiency index.
Figure 6: Box plots of the $PoU_{Com/Cor}$ inefficiency index.

TMECor in reduced normal form. Computing a TMECor through its reduced normal form [Kohlberg and Mertens1986] is impractical even for relatively small game instances. Figure 7 shows that, for 3-player games, the algorithm does not reach termination within the deadline even for instances of depth 6. Moreover, the amount of memory required by the algorithm would make the computation unfeasible even with a higher time threshold: instances of 3-player STSA-EF-TGs with depth 6 already required a considerable amount of memory each, with the most demanding instances requiring substantially more. Since the reduced normal form grows exponentially in the size of the tree, this approach is not feasible for bigger game instances.

Figure 7: Average compute times and box plots of the computation of the TMECor through the reduced normal form, on 3-player instances.

Compute time. Figure 8 shows the compute times for all the instances of our experimental setting. The behavior displayed by the equilibrium-finding algorithms is essentially the one described in the main paper for all the game configurations.

Figure 8: Average compute times of the algorithms and their box plots with every game configuration.