The computational study of adversarial interactions is a central problem in Artificial Intelligence, aiming at finding players’ optimal strategies and predicting the most likely outcome of a game. A vast body of literature focuses on the computation of Nash Equilibria (NEs), mainly in two-player zero-sum games [shoham2009multiagent]. This setting is well understood and has recently seen some remarkable results [brown2017safe, brown2017superhuman]. While relevant, this model is rather restrictive, as many practical scenarios are not zero-sum and involve more than two players. It also presents some weaknesses when used as a prescriptive tool, in particular in general-sum games. Indeed, when multiple NEs coexist, the model assumes the lack of communication between the players, preventing them from synchronizing their strategies.
In practical situations where some form of communication is possible, solution concepts different from that of NE are required. The main alternative is the Correlated Equilibrium (CE), introduced by [aumann1974]. In a CE, a device (i.e., a trusted external mediator) draws strategy profiles from a known joint probability distribution and privately communicates them to each player. The probability distribution induces an equilibrium if each player has no incentive to choose a different strategy from the recommended one, assuming the other players would not deviate either. A variation on the CE is the Coarse Correlated Equilibrium (CCE), introduced in [moulin1978], which only prevents deviations happening before knowing the device’s recommendation. In normal-form games, CEs and CCEs enjoy some appealing properties that make them plausible solution concepts in many practical scenarios. Specifically, they arise from simple and natural learning dynamics [hart2000, cesa2006prediction], and they can be computed via linear programming on any normal-form game in polynomial time (assuming the number of players is fixed). Moreover, price-of-anarchy analyses show that coarse correlated equilibria characterizing outcomes of no-regret learning dynamics have near-optimal welfare [roughgarden2009intrinsic, hartline2015no]. While a CE can be found in polynomial time in some classes of succinctly representable multi-player games, finding an optimal CE in these games is, in general, NP-hard [papadimitriou2008, jiang2015polynomial]. A similar result also holds for the problem of finding an optimal CCE: [barman] show that the problem is NP-hard for graphical, polymatrix, congestion, and anonymous games.
Sequential games allow for richer forms of interaction among the players than normal-form games, which leads to different forms of correlation whose general understanding is still limited. Most of the works in this area focus on specific classes of games, such as Bayesian games [forges1993, forges2006] and multi-stage games [myerson1986, forges1986]. In these specific settings, the main solution concepts studied in the literature are the Normal-Form Correlated Equilibrium (NFCE), the Agent-Form Correlated Equilibrium (AFCE), and the Communication Equilibrium. The first two equilibria only allow for unidirectional communication from the device to the players, while the third allows for bidirectional communication. The only known results for general extensive-form games are due to [vonStengel2008], who propose the notion of Extensive-Form Correlated Equilibrium (EFCE). The complex structure of extensive-form games significantly increases the computational effort required for correlation, as finding an optimal NFCE is NP-hard even with two players [vonStengel2008]. An optimal EFCE can be found efficiently in two-player games without Chance moves but, in games with three or more players (including Chance), finding an optimal EFCE (or an optimal AFCE) is NP-hard [vonStengel2008]. The only positive result for multi-player games is a polynomial-time algorithm to find an EFCE [huang2008].
Correlated equilibria in which recommendations are drawn before the game starts are known as ex ante CEs. These equilibria require only unilateral communication from the device to the players. NFCE, AFCE, and EFCE belong to this family and differ in the time at which the recommendations are communicated to players. Specifically, the NFCE requires, for each player, a single interaction with the mediator taking place before the beginning of the game, whereas AFCE and EFCE require a message for each information set reached during the game. As a consequence, AFCE and EFCE are not suited for problems where the agents have limited communication capabilities, a situation which is frequent in practice. This is the case, for instance, of collusion in bidding, where communication during the auction is illegal, and coordinated swindling in public (see also the recent work in [farina2018exante]). Different forms of correlation have been explored when a team of players faces an adversary [basilico2016, basilico2017computing, basilico2017coordinating, celli2018, farina2018exante]. This setting, also known as ex ante coordination, is quite different from ours. Our notion of correlation is more flexible as any player may have different objectives. Therefore, in our correlation setting, individual players have to be incentivized to follow the recommendations of the mediator. In contrast, in the ex ante coordination setting there is no need for incentive constraints since team members share their final rewards.
In this paper, we focus on equilibria requiring a low level of communication. A natural question is whether correlation can be reached efficiently when agents have limited communication capabilities, i.e., when they cannot receive messages during the execution of the game (which rules out the possibility of employing an EFCE). Motivated by the hardness result for the NFCE, we introduce the notion of Normal-Form Coarse Correlated Equilibrium (NFCCE) as the extension of CCE to sequential games.
We prove that, unlike the NFCE, the problem of finding an optimal NFCCE admits a polynomial-time algorithm for two-player games without Chance moves. In particular, we devise a hybrid formulation (combining the normal and the sequence forms) for the problem of computing an optimal NFCCE featuring a polynomial number of constraints and an exponential number of variables. We then provide a polynomial-time separation oracle which, thanks to the ellipsoid algorithm [khachiyan1980], allows us to show that an optimal NFCCE can be computed in polynomial time. We also show that this approach cannot be extended to more general settings, illustrating that with more than two players, including Chance, the problem becomes NP-hard.
We describe a practical algorithm to compute an optimal NFCCE based on column generation—a variation of the simplex method in which the variables (columns) of the problem are introduced one at a time. We devise different oracles to solve the corresponding pricing problem. In particular, we provide a polynomial-time oracle suitable for the two-player setting, and an oracle based on a Mixed Integer Linear Program (MILP). Then, we show how to adapt the MILP oracle to the case of two-player games with Nature, and to general multi-player games.
We briefly introduce several of the basic concepts we use in the rest of the paper. Further details can be found in [shoham2009multiagent].
An extensive-form game has a finite set of players and a finite set of actions . Exogenous stochasticity is represented through a non-strategic player (the nature or chance player). is the set of non-terminal decision nodes, and is the set of decision nodes belonging to player . The set of terminal nodes (leaves) is denoted by . The function associates each decision node with the player acting at it. The function is the action function, assigning to each decision node a set of available actions. The successor function is denoted by . Let be the utility function of each . Moreover, let . Finally, for each let be an information partition of such that decision nodes within the same information set are not distinguishable by player . We write . The function is such that is the fixed probability with which chance selects at . Moreover, denotes the set of actions available at . We remark that, by definition, for any player , information set , and . In this paper, we focus on games with perfect recall, i.e., games where, at each stage, all the players recall all the information acquired at earlier stages.
An extensive-form game can be equivalently represented in normal form. Let be the set of pure normal-form plans of player . A normal-form plan specifies an action per information set of player . The normal form of an extensive-form game is characterized by the same set of players , actions , and the set of utility functions . Function denotes the expected payoff obtained by marginalizing with respect to . The reduced normal form is obtained by deleting duplicated strategies from the normal form.
Strategy Representations. A normal-form strategy for is defined as the function . We denote by the normal-form strategy space of player . A correlated (joint) normal-form strategy is defined as . The size of a normal-form strategy is exponential in the size of the extensive-form tree. This shortcoming can be overcome by exploiting the sequence form [vonStengel1996], whose size is linear in the size of the game tree.
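To make the size gap concrete, the following minimal sketch enumerates pure normal-form plans as one action per information set and compares their number with the number of sequences. The information-set labels and action names are invented for illustration, and the sequence count assumes perfect recall, where each action identifies exactly one sequence.

```python
from itertools import product

def pure_plans(info_sets):
    """Enumerate pure normal-form plans: one action per information set.

    `info_sets` maps a (hypothetical) information-set label to its actions.
    """
    labels = sorted(info_sets)
    for choice in product(*(info_sets[h] for h in labels)):
        yield dict(zip(labels, choice))

# Toy player with three information sets and two actions each (assumed data).
toy = {"h1": ["a", "b"], "h2": ["c", "d"], "h3": ["e", "f"]}
plans = list(pure_plans(toy))

# Number of plans is the product of the action counts.
num_plans = len(plans)
# Under perfect recall: the empty sequence plus one sequence per action.
num_sequences = 1 + sum(len(actions) for actions in toy.values())
```

With k information sets of two actions each, the plan count doubles with every added information set, while the sequence count grows by two.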
The sequence form decomposes strategies into sequences of actions and their realization probabilities. A sequence for player , associated with a node of the game, is the subset of specifying player ’s actions on the path from the root to . We denote the set of sequences of player by . A sequence is said to be terminal if it leads to a terminal node for at least one set of sequences of the other players. The set of terminal sequences of player is denoted by . Moreover, we denote by the fictitious sequence leading to the root node and, for each action and sequence , we denote by the extended sequence obtained by appending action to .
A sequence-form strategy, called a realization plan, is a function associating each sequence with its probability of being played. A well-defined sequence-form strategy is such that for each and, for each and sequence leading to , and . These constraints are linear in the number of sequences and can be compactly written as , where is an matrix and is a vector of dimension. The utility function of player is represented by a sparse -dimensional matrix defined only for profiles of terminal sequences leading to a leaf node. With a slight abuse of notation, we denote it by .
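The constraints defining a realization plan can be checked directly. The sketch below assumes the usual sequence-form conditions (the empty sequence has probability 1, probabilities are non-negative, and the probability entering an information set equals the total probability of its extensions); the dictionary layout and labels are hypothetical.

```python
def is_realization_plan(r, info_sets, tol=1e-9):
    """Check the sequence-form constraints for one player.

    r         : dict mapping each sequence (tuple of actions) to its probability
    info_sets : dict mapping an information-set label to a pair
                (sequence leading to it, actions available there)
    """
    if abs(r[()] - 1.0) > tol:              # empty sequence has probability 1
        return False
    if any(p < -tol for p in r.values()):   # non-negativity
        return False
    for seq, actions in info_sets.values():
        # mass entering the information set equals the mass leaving it
        if abs(r[seq] - sum(r[seq + (a,)] for a in actions)) > tol:
            return False
    return True

# Toy player: h1 at the root, h2 reached after playing L (assumed data).
toy_info_sets = {"h1": ((), ["L", "R"]), "h2": (("L",), ["l", "r"])}
good = {(): 1.0, ("L",): 0.6, ("R",): 0.4, ("L", "l"): 0.6, ("L", "r"): 0.0}
bad = dict(good)
bad[("R",)] = 0.5   # now the extensions of the empty sequence sum to 1.1
```

Here `is_realization_plan(good, toy_info_sets)` holds and the same call on `bad` fails, since one linear equality per information set is violated.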
Correlation in Normal-Form Games
Let . The classical notion of CE [aumann1974] for normal-form games is:
is a correlated equilibrium of the normal form game if, for every and , the following holds:
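For reference, the standard form of this condition can be written as follows, in assumed notation: $x$ is the joint distribution over plan profiles, $\Sigma_i$ is the set of plans of player $i$, $\Sigma_{-i}$ the set of plan profiles of the other players, and $u_i$ the utility of player $i$.

```latex
\sum_{\sigma_{-i} \in \Sigma_{-i}} x(\sigma_i, \sigma_{-i})
  \bigl( u_i(\sigma_i, \sigma_{-i}) - u_i(\sigma_i', \sigma_{-i}) \bigr) \;\ge\; 0
\qquad \forall i,\ \forall \sigma_i, \sigma_i' \in \Sigma_i .
```

Each inequality compares following a recommendation $\sigma_i$ with switching to $\sigma_i'$, weighted by the probability of the profiles in which $\sigma_i$ is recommended.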
A CE can be interpreted in terms of a mediator who, ex ante the play, draws according to the publicly known and privately communicates each recommendation to the corresponding player.
Another possibility is enforcing protection against deviations of players which are independent from the sampled outcome. This can be done through the notion of coarse correlated equilibrium [moulin1978].
is a coarse correlated equilibrium of a normal-form game if, for every and , the following holds:
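For reference, the standard form of the coarse condition in the same assumed notation ($x$ the joint distribution, $\Sigma$ the set of plan profiles, $u_i$ the utility of player $i$) is:

```latex
\sum_{\sigma \in \Sigma} x(\sigma)
  \bigl( u_i(\sigma) - u_i(\sigma_i', \sigma_{-i}) \bigr) \;\ge\; 0
\qquad \forall i,\ \forall \sigma_i' \in \Sigma_i .
```

There is one inequality per player and fixed deviation $\sigma_i'$, rather than one per pair of recommended and deviating plans, which is why the condition is weaker.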
CCEs differ from CEs in that a CCE only requires that following the suggested action is a best response in expectation before the recommended action is actually revealed. Moreover, we recall that every CE is also a CCE while the converse is, in general, not true.
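The difference between the two conditions can be checked numerically. The sketch below tests both on a small two-player game built for this illustration (it is not the game of Figure 1); the payoffs, action names, and helper functions are all assumptions.

```python
def swap(profile, i, action):
    """Profile in which player i plays `action` instead of her recommendation."""
    return tuple(action if k == i else a for k, a in enumerate(profile))

def is_cce(x, utils, actions, tol=1e-9):
    """No player gains by committing to a fixed action before the draw."""
    for i in range(len(actions)):
        follow = sum(p * utils[i][prof] for prof, p in x.items())
        for dev in actions[i]:
            deviate = sum(p * utils[i][swap(prof, i, dev)] for prof, p in x.items())
            if deviate > follow + tol:
                return False
    return True

def is_ce(x, utils, actions, tol=1e-9):
    """No player gains by deviating after seeing her own recommendation."""
    for i in range(len(actions)):
        for rec in actions[i]:
            for dev in actions[i]:
                gain = sum(p * (utils[i][swap(prof, i, dev)] - utils[i][prof])
                           for prof, p in x.items() if prof[i] == rec)
                if gain > tol:
                    return False
    return True

# Illustrative 3x2 game (assumed payoffs): rows T, B, D; columns L, R.
actions = [["T", "B", "D"], ["L", "R"]]
u1 = {("T", "L"): 1, ("T", "R"): 0, ("B", "L"): 0, ("B", "R"): 1,
      ("D", "L"): 2, ("D", "R"): -2}
u2 = {("T", "L"): 1, ("T", "R"): 0, ("B", "L"): 0, ("B", "R"): 1,
      ("D", "L"): 0, ("D", "R"): 0}
utils = [u1, u2]
x = {("T", "L"): 0.5, ("B", "R"): 0.5}   # device flips a fair coin
```

In this toy game `x` passes the coarse test but fails the CE test: once told to play T, the row player knows the column player plays L and prefers D, whereas committing to D in advance is unprofitable because D is costly against R.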
An optimal CCE may lead to a social welfare arbitrarily larger than the social welfare provided by the optimal CE on the same game. Figure 1 reports a normal-form game where this happens ().
The joint strategy profile assigning probability to and is the CCE maximizing the social welfare of the players, which is . The unique optimal CE is the probability distribution assigning probability 1 to , providing a social welfare of 1 independently of . Therefore, for increasing values of , an optimal CCE allows the players to reach a social welfare which is arbitrarily larger than the social welfare reached through the optimal CE.
Correlation in General Extensive-Form Games
We review the main notions of correlation for general extensive-form games. In this general setting, it is customary to consider ex ante CEs, i.e., correlated equilibria in which an action profile is sampled before the game is played. In this paper, we focus on the following solution concepts:
A normal-form correlated equilibrium (normal-form coarse correlated equilibrium) of an extensive-form game is a correlated equilibrium (coarse correlated equilibrium) of the reduced normal-form game equivalent to .
In these two solution concepts, the entire vector of recommendations specifying one action per information set is revealed to the players before the game starts. Thus, once the recommendation is received each player commits to playing a pure strategy.
Informally, an AFCE [forges1993] is a CE of the agent-form game equivalent to the given extensive-form game. In the agent form of the game, moves are chosen by a different agent per information set of the player. In an EFCE [vonStengel2008], each recommendation is assumed to be in a sealed envelope and is revealed only when the player reaches the relevant information set (i.e., the information set where she can make that move). The main difference between EFCE and NFCE/NFCCE is that the former requires recommendations to be delivered during the game execution, thus being more demanding in terms of communication requirements. It is crucial to notice that the size of the signal that has to be sampled is the same, and it has polynomial size (one action per information set).
Letting be the set of equilibria of type of a given game, we have: . See [vonStengel2008] for further details.
In the next section, we study the problems of computing an NFCE and an NFCCE maximizing the social welfare (i.e., the cumulative utility of the players). We refer to them as NFCE-SW and NFCCE-SW. The generalization of our results to the case in which one searches for an equilibrium maximizing a linear combination of the players’ utility, omitted here for reasons of space, is straightforward.
Complexity of an Optimal NFCCE
We show that there exists a polynomial-time algorithm for solving the NFCCE-SW problem with two players. First, we provide a compact formulation for the problem. Then, we describe a polynomial-time algorithm for solving it.
Given an extensive-form game , a direct application of Definition 3 yields a Linear Programming problem (LP) with an exponential number of variables and an exponential number of constraints. We provide the following result:
The NFCCE-SW problem for an extensive-form game can be formulated as an LP with an exponential number of variables but only a polynomial number of constraints.
To prove the lemma, we provide a hybrid representation which exploits the tree structure of the problem combining both the normal form and the sequence form. Let be a -dimensional column vector representing the pure realization plan for player that is realization equivalent to . (A realization plan is realization equivalent to a normal-form plan if, for any strategy profile of the other players, they enforce the same probability distribution over the terminal nodes of the game tree. We recall that every plan of the reduced normal form is realization equivalent to exactly one pure realization plan; see [vonStengel1996].) In the following, unless otherwise specified, denotes the sequence-form utility matrix of player .
According to Definition 2, the constraints describing an NFCCE for Player can be written as follows (for Player , the constraints are analogous):
The first term is the expected utility of Player at the equilibrium. Let be the -dimensional vector of variables of the dual of the best-response problem in sequence form. By definition of sequence form, is equal to the first component of , whose value corresponds to the utility of Player at the equilibrium. Then:
The second term of the above inequalities can be written as
can be interpreted as the prior probability with which plan is played by Player . can be written as the following realization-equivalent sequence-form strategy: , which is a valid realization plan due to convexity. Now, we only need to show that is not strictly smaller than the value of the best response of Player given the strategy of Player . By exploiting the dual of the best-response problem in sequence form, this is equivalent to showing . Thus, expanding and deriving the equilibrium constraints for Player we obtain the following mathematical program:
This formulation constitutes a proof of Lemma 1 as it employs a polynomial number of constraints (namely, ) and an exponential number of variables.
The following lemma will be employed to prove our central result. It shows that a player can reason in a best-response fashion to minimize the utility of the other player weighted by an arbitrary distribution, while also guaranteeing the reachability of a given terminal node.
Given a generic two-player extensive-form game , an outcome , and a vector , the problem of finding under the constraints that
there exists some s.t. leads to outcome and
can be solved in polynomial time. The same holds when the two players are interchanged.
Proof. Let us focus on the case in which we look for . First, define s.t. for each . Then, let be the extensive-form game obtained from by substituting Player 1’s utility function with . Given , denote by the pair of sequences identifying , and by the set of information sets of player encountered in sequence . Algorithm 1 returns the set of actions () forming a plan of the normal-form game (not reduced) equivalent to .
To retrieve , Algorithm 1 performs a depth-first traversal of the tree while keeping track of the value to be minimized at each decision node () and selecting actions while moving backwards. Then, can be computed by traversing the tree from the root, and selecting actions according to those specified in .
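The idea of a backward pass with a forced path can be sketched as follows. This is not Algorithm 1 itself: it is a simplified variant that works on the sequence tree of a single player, assumes the other player's weights have already been folded into a per-sequence cost, and uses hypothetical names and data throughout.

```python
def min_cost_plan(info_sets, cost, forced=()):
    """Pure plan minimizing a linear cost over sequences, via a bottom-up pass.

    info_sets : dict h -> (parent sequence, actions); a sequence is a tuple
                of (h, a) pairs and () is the empty sequence
    cost      : dict sequence -> cost paid if the plan plays that sequence
    forced    : sequence whose (h, a) choices must all belong to the plan
    Returns (total cost, dict h -> chosen action at reached information sets).
    """
    forced_choice = dict(forced)
    children = {}
    for h, (parent, _) in info_sets.items():
        children.setdefault(parent, []).append(h)

    def below(seq):
        total, plan = cost.get(seq, 0.0), {}
        for h in children.get(seq, []):
            # on the forced path only the prescribed action is allowed
            allowed = [forced_choice[h]] if h in forced_choice else info_sets[h][1]
            options = []
            for a in allowed:
                value, sub_plan = below(seq + ((h, a),))
                options.append((value, a, sub_plan))
            value, a, sub_plan = min(options, key=lambda o: o[0])
            total += value
            plan[h] = a
            plan.update(sub_plan)
        return total, plan

    return below(())

# Toy player: h1 at the root, h2 reached after action a (assumed data).
s_a = (("h1", "a"),)
s_b = (("h1", "b"),)
s_ac = s_a + (("h2", "c"),)
s_ad = s_a + (("h2", "d"),)
toy_sets = {"h1": ((), ["a", "b"]), "h2": (s_a, ["c", "d"])}
toy_cost = {s_b: 1.0, s_ac: 3.0, s_ad: 2.0}
free = min_cost_plan(toy_sets, toy_cost)
pinned = min_cost_plan(toy_sets, toy_cost, forced=s_ac)
```

Without a forced path the cheapest plan plays b; pinning the sequence that reaches the target leaf restricts the choices at h1 and h2 and leaves every other information set free to be optimized.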
Let us focus on the dual of LP (1)–(6):
admits a polynomial-time separation oracle.
Proof. Let , for all , be the dual variables of constraints (2), the dual variables of constraints (3), the dual variables of constraints (4), and the dual variable of constraint (5). With , is an LP with a number of variables () polynomial in the size of the tree and an exponential () number of constraints. We show that, given a vector , the problem of either finding a hyperplane separating from the set of feasible solutions to or proving that no such hyperplane exists can be solved in polynomial time. Since the number of dual constraints corresponding to the primal variables is linear, these constraints can be checked efficiently for violation. We are left with the problem of determining whether any of the following constraints, defined for all , is violated:
Let us consider the separation problem of finding an inequality of which is maximally violated at . The problem reads:
A pair yielding a violated inequality exists iff the separation problem admits an optimal solution of value .
One such pair (if any) can be found in polynomial time by enumerating over the (polynomially many) possible outcomes of the game. For each of them, we look for the pair minimizing the objective function of the separation problem, halting as soon as a pair yielding a violated constraint is found. If the procedure terminates without finding any suitable pair, we deduce that no violated inequalities exist and has been solved. First, notice that is constant for the family of pairs identifying . Therefore, we can consider an individual subproblem for each player (i.e., we can find and independently). Hence, for each outcome and for each player the corresponding can be found in polynomial time due to Lemma 2.
The following theorem shows that, in certain cases, the NFCCE-SW problem can be solved efficiently:
Given an extensive-form game with players and without chance moves, an NFCCE maximizing the social welfare can be computed in time polynomial in the size of the game tree.
Proof. Lemma 3 shows that there exists a polynomial-time separation oracle for . Then, can be solved in polynomial time via the ellipsoid method due to the equivalence between optimization and separation [khachiyan1980, Grotschel1981]. As the method solves, in polynomial time, a primal-dual system encompassing not just but also its primal problem NFCCE-SW, it also produces, simultaneously, an optimal solution to the latter.
The approach that we presented here cannot be extended to games with two players and the chance player as, upon introducing the latter, the problem transitions from polynomially solvable to NP-hard (other problems in which this transition takes place are, for example, the problem of computing a socially optimal EFCE [vonStengel2008] and the problem of deciding if a two-player zero-sum extensive-form game with perfect recall admits a pure strategy equilibrium [blair1996perfect, hansen2007finding]):
Computing an NFCCE maximizing the social welfare is NP-hard even in extensive-form games with two players, chance moves, and binary outcomes.
Proof Sketch. A construction introduced by [vonStengel2008] can be employed. The reduction is from SAT, whose generic instance is a Boolean formula in conjunctive normal form with clauses and variables. Given , we build an auxiliary game , of size proportional to that of the Boolean formula, following [vonStengel2008, Theorem 1.3]. admits a pure strategy guaranteeing a social welfare of 2 if and only if is satisfiable. Otherwise, the maximum expected social welfare cannot be more than . A pure strategy maximizing the social welfare is also an NFCCE, since no ex ante deviation would result in an increase in the player’s utility, which is already maximal. Then, finding a solution to NFCCE-SW in polynomial time would imply the existence of a polynomial-time algorithm for SAT, which leads to a contradiction, unless P=NP.
Notice that, when considering the separation problem of , working with chance is hard because the first term of the objective function of the separation problem is no longer constant when the outcome is fixed. In the case with and no chance moves, one would have to determine the joint best response of two players at a time (to maximize the terms of the objective function of the separation problem following the first one), which is NP-hard [vonStengel2008].
A Practical Algorithm
Due to being based on the ellipsoid method (which, while being a powerful theoretical tool, is well-known to be inefficient in practice), the algorithm that we used in the proof of Theorem 4 is not appealing from a practical perspective. We propose, here, a computationally more efficient method based on the simplex method to compute optimal NFCCEs via a column generation technique. The focus on two-player games is motivated by the negative result in the previous section.
Let be a vector containing the variables of LP (1)–(6):
where, for each , is defined as in the proof of Lemma 1 and is a -dimensional column vector of slack variables. The cost vector associated with the variables is:
where is the utility matrix of the reduced normal-form game. We compactly rewrite the constraints of LP (1)–(6) in standard form as , where is a vector of dimension . We denote the -th column of by .
The algorithm works in two phases, determining, first, a basic feasible solution and, then, iteratively improving it until an optimal one is found. The crucial component of the algorithm is an oracle for solving, given a basic feasible solution to LP (1)–(6), the problem (we refer to it as LRC) of finding a variable with the largest reduced cost. Notice that Theorem 4 already implies the tractability of the problem of finding the variable with the maximum reduced cost—the so-called (primal) pricing problem, as it is equivalent to finding a maximally violated constraint in the dual . Hence:
LRC can be solved in polynomial time.
Letting be the cost associated with the -th component of and letting be the vector of costs of the basic variables, the -th reduced cost is:
where for each index corresponding to a basic variable. We rely on the following polynomial-time oracle, P-LRC, described in Algorithm 2 (another oracle is presented in the next section).
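As a small numeric illustration, the reduced cost in its standard simplex form is the cost of a column minus the dual vector times that column. The sketch below assumes the dual vector has already been obtained from the current basis; the function names and all numbers are made up and unrelated to LP (1)–(6).

```python
def reduced_cost(c_j, y, column):
    """Standard simplex reduced cost c_j - y^T A_j, where y = c_B^T B^{-1}."""
    return c_j - sum(y_k * a_k for y_k, a_k in zip(y, column))

def best_entering(costs, y, columns):
    """Index and value of the column with the largest reduced cost."""
    scored = [(reduced_cost(c, y, col), j)
              for j, (c, col) in enumerate(zip(costs, columns))]
    value, j = max(scored)
    return j, value

# Toy data: columns 0 and 1 form an identity basis with costs 1 and 2,
# so the dual vector is y = (1, 2) and both basic reduced costs are zero.
y = [1.0, 2.0]
columns = [[1, 0], [0, 1], [1, 1]]
costs = [1.0, 2.0, 5.0]
entering = best_entering(costs, y, columns)
```

In a maximization problem a strictly positive largest reduced cost identifies a column worth adding, and a non-positive one certifies optimality of the current basis. An oracle avoids scoring every column explicitly, which is what this brute-force helper does.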
First, notice that, given a basic feasible solution, is equal to a vector (call it ) of dimension , computable in polynomial time (Line 4). By employing the same notation as the one adopted for the dual variables in the proof of Lemma 3, , where is the vector of dual variables of constraints (3) and (4), are the dual variables of constraints (2), and is that of constraint (5).
The reduced costs of the variables and can be computed directly by definition since their number is polynomial in the size of the tree (Lines 5 to 7). We are left with the problem of evaluating the reduced costs of the variables. P-LRC enumerates the outcomes of the game (Line 8). Since all the pairs of plans identifying have the same , the problem of minimizing amounts to finding a pair minimizing . The problem can be split into a subproblem per player, and solved through Algorithm 1, which we presented in the proof of Lemma 2 (Line 9, where we simplified the signature of C-PLAN-SEARCH for ease of notation). By applying this procedure for each of the outcomes and selecting, among the resulting pairs, the one with the largest reduced cost (Line 13), we are able to determine the new variable entering the basis in polynomial time.
The two phases of the overall algorithm are the following ones, and both adopt P-LRC:
Phase 1: finding a feasible point. A basic feasible solution to NFCCE-SW is determined through an auxiliary problem with artificial variables, where a new variable is introduced for each equality constraint, and their sum is minimized in the objective function. If some artificial variable with index is found in the optimal basis of the auxiliary problem, we can find, in polynomial time, a variable of the original problem to replace it by either maximizing or minimizing , where is a vector of zeros with suitable dimension and equal to 1 in position (the problem can be solved with Algorithm 1).
Phase 2: finding an optimal solution. Starting from a basic feasible solution, the algorithm iteratively improves it until an optimal solution is found. While, if we were to solve the problem with a standard implementation of the simplex method, we would have to compute the reduced cost of all the nonbasic variables to find one to enter the basis (which would require exponential time in the size of the game), by employing P-LRC the next variable to enter the basis can be found in polynomial time. This follows from the same reasoning that led to Corollary 5.1.
We remark that, while the two phases require polynomial time, the bottleneck of the approach is that, at each iteration, P-LRC has to traverse the game tree twice for each . To circumvent this issue, we present a second oracle based on mixed-integer linear programming (see the experimental evaluation for a comparison between the two approaches).
General Mixed-Integer Oracle
In this section, we describe an oracle (MI-LRC) for computing a solution to LRC by solving a Mixed-Integer Linear Program (MILP). Differently from P-LRC, MI-LRC does not need the explicit enumeration of the terminal nodes of the game and, furthermore, it can be extended to games with chance and more than two players. We provide, here, a description of the oracle for the case of a two-player game with and without chance moves. (MI-LRC can be extended to games with ; we omit the description of this setting due to space constraints.)
The crucial difference between MI-LRC and P-LRC is in the way they handle the inspection of the reduced costs associated with the variables. In MI-LRC, lines 8–12 of Algorithm 2 are substituted with an MILP.
Let us first focus on the case of a two-player game without chance moves. Let be a matrix such that if is on the path from the root to , and otherwise. Let also be an -dimensional vector of binary variables. MI-LRC solves the following problem:
The objective function (7) follows from the definition of the reduced costs (we are looking for a variable whose dual constraint is maximally violated). Constraints (9) force the realization plans to select with probability 1 the sequences on the path to the selected outcome . Notice that, while the objective function contains quadratic terms, they only involve binary variables. Therefore, it can be restated as a linear function after introducing a new variable and four linear constraints per bilinear term according to the formulation proposed in [mccormick1976computability].
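The linearization of a product of two binary variables can be verified by brute force. The sketch below checks that the four standard McCormick inequalities for variables bounded in [0, 1] leave the product as the only feasible value of the new variable; the function and variable names are illustrative.

```python
from itertools import product

def mccormick_feasible(x, y, z):
    """Four linear constraints replacing z = x * y for x, y in {0, 1}."""
    return z >= 0 and z <= x and z <= y and z >= x + y - 1

# For every binary pair (x, y), collect the values of z in {0, 1}
# that satisfy the four inequalities.
feasible_z = {
    (x, y): [z for z in (0, 1) if mccormick_feasible(x, y, z)]
    for x, y in product((0, 1), repeat=2)
}
```

For each pair, exactly one value of z survives and it equals x * y, so replacing each bilinear term with its own auxiliary variable keeps the problem linear without changing its optimum.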
Notice that an optimal realization plan , solution to MI-LRC, may not be pure (i.e., there may exist some s.t. ). Nevertheless, there always exists a pair of pure realization plans leading to the same terminal node and granting the same value . Such a pair can be found by traversing the tree depth-first and selecting sequences, among those played with strictly positive probability in , following the same reasoning of Algorithm 1. Once a pair of pure realization plans has been determined, the reduced cost associated with it has to be computed according to equation (6) and compared to the reduced costs of the remaining variables (Line 13 of Algorithm 2).
Two-player games with Nature
We denote by the unique tuple of the sequences leading to , where is a sequence of the chance player. The crucial point is that, given , there may exist some , reachable through , satisfying . MI-LRC can be adapted to this scenario as follows. First, for each we compute the utility matrices (with dimension ) obtained by marginalizing each with respect to . Formally, denoting by the realization plan defined over the sequences of the chance player which are realization-equivalent to , for each we have . Objective function (7) is then modified by substituting each with . Moreover, upon denoting by the matrix defined analogously to , it suffices to substitute each of constraints (10), one per , with the constraint , where denotes row of . This way, MI-LRC can be extended to the more demanding setting of games with two players and chance moves.
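The marginalization step can be sketched as a weighted sum over chance sequences. The layout below (payoffs keyed by triples of sequences, a dictionary of chance probabilities) is an assumption made for illustration and not the matrix representation used in the formulation.

```python
def marginalize_chance(utility, chance_plan):
    """Fold chance into a two-player payoff table.

    utility     : dict (seq1, seq2, chance_seq) -> payoff at the leaf reached
    chance_plan : dict chance_seq -> probability that chance plays it
    Returns a dict (seq1, seq2) -> expected payoff over chance sequences.
    """
    folded = {}
    for (s1, s2, sc), payoff in utility.items():
        folded[(s1, s2)] = folded.get((s1, s2), 0.0) + chance_plan[sc] * payoff
    return folded

# Toy data: the pair ("a", "x") reaches two leaves, one per chance outcome.
toy_utility = {("a", "x", "heads"): 4.0,
               ("a", "x", "tails"): 0.0,
               ("b", "x", "heads"): 1.0}
toy_chance = {"heads": 0.25, "tails": 0.75}
folded = marginalize_chance(toy_utility, toy_chance)
```

The toy pair ("a", "x") shows why a pair of player sequences no longer identifies a single outcome once chance moves: its marginalized entry mixes the payoffs of two different leaves.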
We focus on a game with , without chance moves. The oracle can be easily adapted to the setting with , and to include the Chance player. Denote by the utility matrix of player marginalized with respect to realization plan of player — notice that the , with , and . The objective function that needs to be maximized is:
The first term of the objective function only depends on the choice of a single terminal node . The following terms can be addressed following the same reasoning we employed to adapt MI-LRC to the case of a two-player game with Chance. For example, in players 2 and 3 are jointly best-responding against a fixed distribution of player 1 (). Then, MI-LRC can be substituted with the following oracle:
The oracle employs -dimensional vectors of binary variables. Vector selects a single terminal node (constraint 15), determining the value of the first term of the objective function. Each , instead, selects the terminal nodes reachable through a certain choice of plans of the players that are best-responding against (constraint 17). As an example, iff is reachable through the chosen . Realization plans are constrained to be consistent with the selected outcomes (constraint 19). Finally, the choices in and in each have to be mutually consistent (constraint 18).
In this paper, we have studied ex ante correlated equilibria in extensive-form games with low communication requirements. First, we showed that an optimal NFCCE can be computed in polynomial time in two-player games. Moreover, we have devised a column generation method which allows for computing solutions iteratively, by employing one of the two oracles which we have devised for the problem of finding a column with the largest reduced cost. In the future, it would be interesting to experimentally evaluate our techniques and to further improve the scalability of our methods to tackle practical problems. Among the possible techniques to achieve this, we mention the adoption of heuristics for solving our oracle, the use of stabilization techniques, and the introduction of dominance relationships among the columns.