1 Introduction
Almost all of the work on graphical games has borrowed heavily from analogies to probabilistic graphical models. Yet overreliance on those analogies, and on previous standard approaches to exact inference, may have led that line of work to face the same computational roadblocks that plagued most exact-inference techniques.
As an example of work that heavily exploits previous work in probabilistic graphical models (PGMs), Kakade et al. (2003) designed polynomial-time algorithms based on linear programming for computing correlated equilibria (CE) in standard graphical games with tree graphs. The approach and polynomial-time results extend to graphical games with bounded-treewidth graphs and graphical polymatrix games with tree graphs. Exact inference is tractable in PGMs whose graphs have bounded treewidth, but intractable in general (Cooper, 1990; Shimony, 1994; Istrail, 2000). In 2005, Papadimitriou and Roughgarden showed the intractability of computing the "social-welfare" optimum CE in arbitrary graphical games (see also Papadimitriou and Roughgarden, 2008). Everything seemed to point toward an eventual resignation that the approach of Kakade et al. (2003), along with any other approach to the problem for that matter, had hit the "bounded-treewidth-threshold wall." Yet, soon after, Papadimitriou (2005) took a radically different approach to the problem and surprised the community with an efficient algorithm for computing CE not only in graphical games, but also in almost all known compactly representable games. Jiang and Leyton-Brown (2015a) built upon Papadimitriou's idea to provide what most would consider an improved polynomial-time algorithm, because of the simplicity of the CE that their algorithm outputs (see also Jiang and Leyton-Brown, 2011, for a summary). [Footnote 1: Papadimitriou's work has an interesting history, which Jiang and Leyton-Brown (2015a) nicely summarize. Some questions arose at the time about the technical soundness of the description of some steps in Papadimitriou's algorithm; Jiang and Leyton-Brown (2015a) provided clarifications to those steps.]
An immediate question that arises from the algorithmic results just described is, what is so fundamentally different between the problem of exact inference in graphical models and equilibrium computation that made this result possible in the context of graphical games? Of course, CE, probabilistic inference, and their variants are different problems, even within the same framework of graphical models. The question is, how different are they?
It is well known that pure-strategy Nash equilibrium (PSNE) is inherently a classical/standard discrete constraint satisfaction problem (CSP). It is also well known that any CSP can be cast as a most-likely, or equivalently, a maximum a posteriori (MAP) assignment estimation problem in Markov random fields (MRFs). [Footnote 2: Assuming a solution exists, of course; otherwise the resulting MRF is not well-defined.] Through this connection, it is clear that there exists a MAP formulation of PSNE. But what about other, more general forms of equilibria?
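As a concrete illustration of the CSP-to-MAP reduction just mentioned, the following sketch (a toy example of ours, not from the paper) encodes each constraint of a tiny binary CSP as an indicator-style clique potential, so that the satisfying assignments are exactly the MAP assignments of the induced MRF:

```python
from itertools import product

# Toy CSP: binary variables x0, x1, x2 with constraints x0 != x1 and x1 != x2.
# Each constraint becomes a clique potential that is 1 if satisfied, else ~0;
# satisfying assignments are then exactly the MAP assignments of the MRF.
constraints = {
    (0, 1): lambda a, b: a != b,
    (1, 2): lambda a, b: a != b,
}

def unnormalized_prob(x, eps=1e-9):
    """Product of indicator-style clique potentials (eps avoids exact zeros)."""
    p = 1.0
    for (i, j), ok in constraints.items():
        p *= 1.0 if ok(x[i], x[j]) else eps
    return p

assignments = list(product([0, 1], repeat=3))
map_x = max(assignments, key=unnormalized_prob)
satisfying = [x for x in assignments
              if all(ok(x[i], x[j]) for (i, j), ok in constraints.items())]
print(map_x in satisfying)  # the MAP assignment satisfies all constraints
```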
We present here a formulation of the problem of equilibrium computation in terms of local conditions for different approximations to belief inference. Similarly, we show how one can view some special games, called graphical potential games (Ortiz, 2015), as defining an equivalent MRF whose "locally optimal" solutions correspond to arbitrary equilibria of the game. Hence, Papadimitriou's result, and later that of Jiang and Leyton-Brown, open up the possibility that at least new classes of problems in probabilistic graphical models could be solved exactly and efficiently. The question is, which classes?
While we provide specific connections between the two fields that yield immediate theoretical and computational implications, we also provide practical alternatives that result from those connections. The foundation of both Papadimitriou's and Jiang and Leyton-Brown's algorithms is the ellipsoid method, one approach that leads to a polynomial-time algorithm for linear programming. This approach, while provably efficient in theory, is often seen as less practical than alternatives such as so-called interior-point methods. This is in contrast to the simple linear programs that are possible for certain classes of graphical games (Kakade et al., 2003). Are there simpler and practically effective variants of Papadimitriou's or Jiang and Leyton-Brown's algorithms? While that is an important open question, we do not address it directly in this paper. Instead, we employ ideas from the literature on learning in games (Fudenberg and Levine, 1999), particularly no-regret algorithms and fictitious play, to propose two specific instances of game-theoretically inspired, practical, and effective heuristics for belief inference in MRFs. One heuristic takes a local approach, and the other takes a global approach. We evaluate our proposed algorithms against the most popular, standard, and state-of-the-art techniques from the literature on probabilistic graphical models.
This manuscript describes our work, which starts to address some of the questions above, and reports on our progress.
1.1 Overview of the Paper
Section 2 provides preliminary material, introducing basic notation, terminology, and concepts from graphical models and game theory.
Section 3 is the main technical section of the paper. It shows reductions of different problems of belief inference in MRFs to computing equilibria in graphical potential games compactly represented as Gibbs potential games (Ortiz, 2015). The reductions presented here vary in generality from MAP assignment, marginals, and full-joint estimation to pure-strategy Nash equilibria (PSNE), mixed-strategy Nash equilibria (MSNE), and correlated equilibria (CE), respectively. We briefly discuss a connection between Papadimitriou's algorithm, as well as Jiang and Leyton-Brown's, and the work of Jaakkola and Jordan (1997) on variational approximations to the problem of probabilistic inference in MRFs via mean-field mixtures. The paper also includes a discussion of the connections to previous work in computer vision on the problem of relaxation labeling, and of work on game-theoretic approaches to (Bayesian) statistical estimation. We then present an alternative approach based on a more global view of the problem, in contrast to the more local approach of the formulations mentioned above. More specifically, we formulate the inference problem using a two-player potential game, inspired by the work on tree-reweighted (TRW) message-passing (Wainwright et al., 2005). We propose a special type of sequential, "hybrid" standard and stochastic fictitious play algorithm for belief inference.

Section 4 reports on our experimental evaluation. We compare our proposed algorithms to the popular, most commonly used, standard, and easily implementable approximation techniques in use today.
Section 5 discusses future work and suggests new opportunities for other potential research directions, beyond those already discussed in the main technical sections of the paper.
Section 6 concludes the paper with a summary of our contributions.
2 Preliminaries
This section introduces basic notation and concepts in graphical models and game theory used throughout the paper. It also includes brief statements on the current state-of-the-art mathematical and computational results in the area.
Basic Notation.
Denote by $x \equiv (x_1, x_2, \ldots, x_n)$ an $n$-dimensional vector and by $x_{-i} \equiv (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$ the same vector without component $i$. Similarly, for every set $S \subset [n] \equiv \{1, \ldots, n\}$, denote by $x_S \equiv (x_i : i \in S)$ the (sub)vector formed from $x$ using only components in $S$, such that, letting $S^c \equiv [n] - S$ denote the complement of $S$, we can denote $x \equiv (x_S, x_{S^c})$ for every $S \subset [n]$. If $A_1, \ldots, A_n$ are sets, denote by $A \equiv A_1 \times \cdots \times A_n$, $A_S \equiv \times_{i \in S} A_i$, and $A_{-i} \equiv A_{[n] - \{i\}}$.

Graph Terminology and Notation.
Let $G = (V, E)$ be an undirected graph, with finite set of $n$ vertices or nodes $V = \{1, \ldots, n\}$ and a set of (undirected) edges $E$. For each node $i$, let $N(i) \equiv \{j : (i, j) \in E\}$ be the set of neighbors of $i$ in $G$, not including $i$, and $N[i] \equiv N(i) \cup \{i\}$ the set including $i$. A clique $C \subset V$ of $G$ is a set of nodes with the property that they are all mutually connected: for all $i, j \in C$, $(i, j) \in E$; in addition, $C$ is maximal if there is no other node outside $C$ that is also connected to each node in $C$, i.e., for all $k \in V - C$, $(i, k) \notin E$ for some $i \in C$.
Another useful concept in the context of this paper is that of hypergraphs, which generalize regular graphs. A hypergraph is defined by a set of nodes $V$ and a set of hyperedges $\mathcal{E} \subset 2^V$. We can think of the hyperedges as cliques in a regular graph. Indeed, the primal graph of the hypergraph is the graph with the same node set $V$ in which there is an edge between two nodes if and only if they both belong to some common hyperedge; in other words, the primal graph is the graph induced by taking each hyperedge and forming a clique of its nodes in a regular graph.
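The primal-graph construction just described can be sketched in a few lines (function names are ours, for illustration only):

```python
from itertools import combinations

# Minimal sketch: build the primal graph of a hypergraph by turning every
# hyperedge into a clique over its nodes.
def primal_graph(nodes, hyperedges):
    """Return the edge set of the primal graph induced by the hyperedges."""
    edges = set()
    for h in hyperedges:
        for u, v in combinations(sorted(h), 2):
            edges.add((u, v))
    return edges

# Hyperedges {1,2,3} and {3,4} induce the cliques {1,2,3} and {3,4}.
print(sorted(primal_graph({1, 2, 3, 4}, [{1, 2, 3}, {3, 4}])))
# [(1, 2), (1, 3), (2, 3), (3, 4)]
```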
2.1 Probabilistic Graphical Models
Probabilistic graphical models are an elegant marriage of probability and graph theory that has had tremendous impact on the theory and practice of modern artificial intelligence, machine learning, and statistics. They permit effective modeling of large, structured, high-dimensional complex systems found in the real world. The language of probabilistic graphical models allows us to capture the structure of complex interactions between individual entities of a system within a single model. The core component of the model is a graph in which each node corresponds to a random variable and the edges express conditional independence assumptions about those random variables in the probabilistic system.
2.1.1 Markov Random Fields, Gibbs Distributions, and the Hammersley-Clifford Theorem
By definition, a joint probability distribution $P$ is a Markov random field (MRF) with respect to (wrt) an undirected graph $G$ if for all $x$, for every node $i$, $P(X_i = x_i \mid X_{-i} = x_{-i}) = P(X_i = x_i \mid X_{N(i)} = x_{N(i)})$. In that case, the neighbors/variables $X_{N(i)}$ form the Markov blanket of node/variable $i$.

Also by definition, a joint distribution $P$ is a Gibbs distribution wrt an undirected graph $G$ if it can be expressed as $P(x) \propto \prod_{C \in \mathcal{C}} \phi_C(x_C)$ for some functions $\phi_C$, indexed by a clique $C \in \mathcal{C}$, the set of all (maximal) cliques in $G$, each mapping every possible value $x_C$ that the random variables associated with the nodes in $C$ can take to a nonnegative number.

We say that a joint probability distribution $P$ is positive if it has full support (i.e., $P(x) > 0$ for all $x$). [Footnote 3: The positivity constraint is only necessary for the proof of the "only if" direction of the theorem.]
Theorem 1.
(Hammersley-Clifford (Hammersley and Clifford, 1971)) Let $P$ be a positive joint probability distribution. Then, $P$ is an MRF with respect to $G$ if and only if $P$ is a Gibbs distribution with respect to $G$.
In the context of the theorem, the functions $\phi_C$ are positive, which allows us to define MRFs in terms of local potential functions over each clique in the graph. Define the function $\Phi(x) \equiv \sum_{C \in \mathcal{C}} \log \phi_C(x_C)$. Let us refer to any function of this form as a Gibbs potential with respect to $G$. A more familiar expression of an MRF is $P(x) \propto \exp(\Phi(x))$.
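These definitions can be made concrete with a tiny example (the clique potentials below are our own arbitrary positive choices, not from the paper): the Gibbs potential is the sum of log clique potentials, and normalizing $\exp(\Phi(x))$ yields the MRF.

```python
from itertools import product
from math import exp, log

# Toy MRF over three binary variables with cliques {0,1} and {1,2};
# phi_C are arbitrary positive clique potentials.
phi = {
    (0, 1): lambda xc: 2.0 if xc[0] == xc[1] else 0.5,
    (1, 2): lambda xc: 3.0 if xc[0] == xc[1] else 1.0,
}

def Phi(x):
    """Gibbs potential: Phi(x) = sum_C log phi_C(x_C)."""
    return sum(log(f(tuple(x[i] for i in C))) for C, f in phi.items())

# P(x) is proportional to exp(Phi(x)) = prod_C phi_C(x_C).
Z = sum(exp(Phi(x)) for x in product([0, 1], repeat=3))
P = {x: exp(Phi(x)) / Z for x in product([0, 1], repeat=3)}
print(abs(sum(P.values()) - 1.0) < 1e-12)  # a proper joint distribution
```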
2.1.2 Some InferenceRelated Problems in MRFs
One problem of interest in an MRF is to compute a most likely assignment $x^* \in \arg\max_x P(x)$; that is, a most likely outcome with respect to the MRF $P$. Another problem is to compute the individual marginal probabilities $P(X_i = x_i)$ for each variable $i$. A related problem is to compute the normalizing constant $Z \equiv \sum_x \prod_{C \in \mathcal{C}} \phi_C(x_C)$ (also known as the partition function of the MRF).
Another set of problems concerns so-called "belief updating": computing information related to the posterior probability distribution after having observed the outcomes of some of the variables, also known as the evidence. For MRFs, this problem is computationally equivalent to that of computing prior marginal probabilities.
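For intuition, the three prior-inference problems above can be solved by brute-force enumeration on a tiny model (the pairwise potentials are our own toy example; this is tractable only for very small models):

```python
from itertools import product

# Brute-force MAP assignment, single-variable marginal, and partition function
# for a toy chain MRF over three binary variables.
def p_unnorm(x):
    phi01 = 2.0 if x[0] == x[1] else 1.0
    phi12 = 2.0 if x[1] == x[2] else 1.0
    return phi01 * phi12

space = list(product([0, 1], repeat=3))
Z = sum(p_unnorm(x) for x in space)                       # partition function
x_map = max(space, key=p_unnorm)                          # a most likely assignment
marg0 = sum(p_unnorm(x) for x in space if x[0] == 1) / Z  # P(X0 = 1)
print(Z, x_map, marg0)
```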
2.1.3 Brief Overview of Computational Results in Probabilistic Graphical Models
Both the exact and approximate versions of most inference-related problems in MRFs are in general intractable (e.g., NP-hard), although polynomial-time algorithms do exist for some special cases (see, e.g., Dagum and Luby, 1993, Roth, 1996, Istrail, 2000, Wang et al., 2013, and the references therein). The complexity of exact algorithms is usually characterized by structural properties of the graph, and the typical statement is that running times are polynomial only for graphs with bounded treewidth (see, e.g., Russell and Norvig, 2003, for more information). Several deterministic and randomized approximation approaches exist (see, e.g., Jordan et al., 1999; Jaakkola, 2000; Geman and Geman, 1984). An approximation approach of particular interest in this paper is variational inference (Jordan et al., 1999; Jaakkola, 2000). Roughly speaking, the general idea is to approximate an intractable MRF $P$ by a "closest" probability distribution $Q^*$ within a "computationally tractable" class $\mathcal{Q}$: formally, $Q^* \in \arg\min_{Q \in \mathcal{Q}} KL(Q \| P)$, where $KL(Q \| P)$ is the Kullback-Leibler (KL) divergence between the probability distributions $Q$ and $P$ wrt $Q$. The simplest example is the so-called mean-field (MF) approximation, in which $\mathcal{Q}$ consists of all possible product distributions $Q(x) = \prod_i Q_i(x_i)$. Even if $P$ is an Ising model (IM), no closed-form solution exists for its mean-field approximation, and the most common computational scheme is based on simple axis-parallel optimizations, leading to individual local conditions of optimality and potential local minima: that is, the problem is essentially reduced to finding $Q$ such that for all $i$, we have $Q_i \in \arg\max_{Q_i} \left( E_Q[\Phi(X)] + H_{Q_i}(X_i) \right)$, where $H_{Q_i}(X_i)$ is the (Shannon) entropy of random variable $X_i$ wrt $Q_i$.
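The axis-parallel mean-field scheme just described can be sketched on a tiny Ising-style chain (the couplings and initialization are our own toy choices). Each inner update is the closed-form maximizer of the variational objective over one factor $Q_i$, holding the others fixed:

```python
from math import exp

# Mean-field coordinate updates for a 3-node Ising-style chain with states
# {-1,+1}. q[i] is the product-factor marginal of node i; each update sets
# q_i(x_i) proportional to exp(E_{Q_{-i}}[Phi]) restricted to node i's terms.
J = {(0, 1): 1.0, (1, 2): 1.0}   # pairwise couplings (toy values)

def mean_field(iters=50):
    q = [{-1: 0.4, +1: 0.6} for _ in range(3)]   # initial product marginals
    for _ in range(iters):
        for i in range(3):
            s = {}
            for xi in (-1, +1):
                m = 0.0
                for (a, b), w in J.items():
                    if i == a:
                        m += w * xi * sum(x * q[b][x] for x in (-1, +1))
                    elif i == b:
                        m += w * xi * sum(x * q[a][x] for x in (-1, +1))
                s[xi] = exp(m)
            z = s[-1] + s[+1]
            q[i] = {-1: s[-1] / z, +1: s[+1] / z}   # closed-form local update
    return q

q = mean_field()
print([round(q[i][+1], 3) for i in range(3)])  # biased toward the +1 phase
```

With the positive couplings and the slight initial bias toward +1, the updates converge to the +1-biased local optimum, illustrating the dependence of the fixed point on initialization.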
2.2 Game Theory
Game theory (von Neumann and Morgenstern, 1947) provides a mathematical model of the stable behavior (or outcome) that may result from the interaction of rational individuals. This paper concentrates on noncooperative settings: individuals maximize their own utility, act independently, and do not have (direct) control over the behavior of others. [Footnote 4: Individual rationality here means that each player seeks to maximize their own utility. Also note that, while many parlor "win-lose"/zero-sum games involve competition, noncooperative games are not in general competitive: each player just wants to do the best for himself, regardless of how useful or harmful his behavior is to others.]
The concept of equilibrium is central to game theory. Roughly, an equilibrium of a noncooperative game is a point of strategic stability: no individual player can gain by unilaterally deviating from the equilibrium behavior.
2.2.1 Games and their Representation
Let $[n] \equiv \{1, \ldots, n\}$ denote a finite set of $n$ players in a game. For each player $i$, let $A_i$ denote the set of actions or pure strategies that $i$ can play. Let $A \equiv \times_{i \in [n]} A_i$ denote the set of joint actions, $a \in A$ denote a joint action, and $a_i$ the individual action of player $i$ in $a$. Denote by $a_{-i}$ the joint action of all the players except $i$, such that $a \equiv (a_i, a_{-i})$. Let $M_i : A \to \mathbb{R}$ denote the payoff/utility function of player $i$. If the $A_i$'s are finite, then $M_i$ is called the payoff matrix of player $i$. Games represented this way are called normal- or strategic-form games.
There are a variety of compact representations for large games inspired by probabilistic graphical models in AI and machine learning (La Mura, 2000; Kearns et al., 2001; Koller and Milch, 2003; Leyton-Brown and Tennenholtz, 2003; Jiang and Leyton-Brown, 2008). The results of this paper are presented in the context of the following generalization of graphical games (Kearns et al., 2001), a simple but powerful model inspired by probabilistic graphical models such as MRFs, previously defined by Ortiz (2014). [Footnote 5: Connections have already been established between the different kinds of compact representations (Jiang and Leyton-Brown, 2008), which may facilitate extensions of ideas, frameworks, and results to those alternative models.]
Definition 1.
A graphical multi-hypermatrix game (GMhG) is defined by

- a directed graph $G = (V, E)$ in which there is a node in $V$ for each of the $n$ players in the game (i.e., $V = [n]$), and the set of directed edges, or arcs, $E$ defines for each player $i$ a set of neighbors $N(i)$ whose actions affect the payoff function of $i$ (i.e., $j$ is a neighbor of $i$ if and only if there is an arc from $j$ to $i$); and
- for each player $i$,
  - a set of actions $A_i$,
  - a hypergraph whose vertex set is the (inclusive) neighborhood $N[i]$ and whose hyperedge set $\mathcal{C}_i$ is a set of cliques of players $C \subset N[i]$, and
  - a set of local-clique payoff (hyper)matrices $\{M_{i,C} : C \in \mathcal{C}_i\}$.

The interpretation of a GMhG is that, for each player $i$, the local and global payoff (hyper)matrices $M_i'$ and $M_i$ of $i$ are (implicitly) defined as $M_i'(a_{N[i]}) \equiv \sum_{C \in \mathcal{C}_i} M_{i,C}(a_C)$ and $M_i(a) \equiv M_i'(a_{N[i]})$, respectively.
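The implicit global-payoff definition can be sketched directly (the clique payoff functions below are our own toy numbers, purely for illustration): each player's payoff is the sum of its local-clique payoffs, each of which reads only the joint action restricted to that clique.

```python
# Toy GMhG on three players with cliques {0,1} and {1,2}.
M = {
    # player i -> {clique (tuple of players) -> payoff function of a_C}
    0: {(0, 1): lambda aC: 1.0 if aC == (1, 1) else 0.0},
    1: {(0, 1): lambda aC: 1.0 if aC == (1, 1) else 0.0,
        (1, 2): lambda aC: 0.5 * aC[0] * aC[1]},
    2: {(1, 2): lambda aC: 0.5 * aC[0] * aC[1]},
}

def payoff(i, a):
    """Global payoff M_i(a) = sum over cliques C containing i of M_{i,C}(a_C)."""
    return sum(f(tuple(a[j] for j in C)) for C, f in M[i].items())

a = (1, 1, 1)
print([payoff(i, a) for i in range(3)])  # [1.0, 1.5, 0.5]
```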
Graphical potential games.
Graphical potential games are special instances of GMhGs. They play a key role in establishing a stronger connection between probabilistic inference in MRFs and equilibria in games than previously noted. Ortiz (2015) provides a characterization of graphical potential games, and discusses the implications of the convergence of certain kinds of "playing" processes in games based on connections to the Gibbs sampler (Geman and Geman, 1984), via the Hammersley-Clifford Theorem (Hammersley and Clifford, 1971; Besag, 1974). Yu and Berthod (1995) (implicitly) used graphical potential games to establish an equivalence between local maximum a posteriori (MAP) inference in Markov random fields and Nash equilibria of the game, a topic revisited in Section 3.1. [Footnote 6: In the interest of brevity, please see Ortiz (2014) for a thorough discussion of GMhGs, including their compact representation size and connections to other classical classes of games in game theory.]
2.2.2 Equilibria as Solution Concepts
Equilibria are generally considered the solutions of games, and various notions of equilibria exist. A pure-strategy (Nash) equilibrium (PSNE) of a game is a joint action $a^*$ such that for all players $i$ and for all actions $a_i \in A_i$, $M_i(a^*) \geq M_i(a_i, a^*_{-i})$. That is, no player can improve its payoff by unilaterally deviating from its prescribed equilibrium action $a^*_i$, assuming the others stick to their actions $a^*_{-i}$. Some games, such as the extensively studied Prisoner's Dilemma, have PSNE; many others, such as "playground" Rock-Paper-Scissors, do not. This is problematic because it will not be possible to "solve" some games using PSNE.
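For tiny normal-form games, the PSNE definition can be checked by brute force. The sketch below (standard textbook payoffs; function names are ours) confirms that mutual defection is the unique PSNE of the Prisoner's Dilemma while Rock-Paper-Scissors has none:

```python
from itertools import product

# Brute-force PSNE check: a is a PSNE iff no player gains by a unilateral deviation.
def psne(actions, payoff):
    eq = []
    for a in product(*actions):
        if all(payoff(i, a) >= payoff(i, a[:i] + (b,) + a[i + 1:])
               for i in range(len(actions)) for b in actions[i]):
            eq.append(a)
    return eq

# Prisoner's Dilemma (C=0, D=1): mutual defection is the unique PSNE.
pd = [[(3, 3), (0, 5)], [(5, 0), (1, 1)]]
print(psne([(0, 1), (0, 1)], lambda i, a: pd[a[0]][a[1]][i]))  # [(1, 1)]

# Rock-Paper-Scissors (zero-sum): no PSNE exists.
def rps(i, a):
    base = 0 if a[0] == a[1] else (1 if (a[0] - a[1]) % 3 == 1 else -1)
    return base * (1 - 2 * i)

print(psne([(0, 1, 2), (0, 1, 2)], rps))  # []
```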
A mixed strategy of player $i$ is a probability distribution $p_i$ over $A_i$ such that $p_i(a_i)$ is the probability that $i$ chooses to play action $a_i$. [Footnote 7: Note that the sets of mixed strategies contain the pure strategies, as we can always recover a pure strategy by playing it exclusively.] A joint mixed strategy is a joint probability distribution $p$ over $A$ capturing the players' behavior, such that $p(a)$ is the probability that joint action $a$ is played, or in other words, that each player $i$ plays the action $a_i$ in component $i$ of $a$. Because we are assuming that the players play independently, $p$ is a product distribution: $p(a) = \prod_{i \in [n]} p_i(a_i)$. Denote by $p_{-i}$ the joint mixed strategy of all the players except $i$. The expected payoff of a player $i$ when some joint mixed strategy $p$ is played is $E_p[M_i(A)] = \sum_a p(a) M_i(a)$; abusing notation, denote it by $M_i(p)$. The conditional expected payoff of a player $i$ given that he plays action $a_i$ is $\sum_{a_{-i}} p_{-i}(a_{-i}) M_i(a_i, a_{-i})$; abusing notation again, denote it by $M_i(a_i, p_{-i})$.
A mixed-strategy Nash equilibrium (MSNE) is a joint mixed strategy $p$ that is a product distribution formed by the individual players' mixed strategies $p_i$ such that, for all players $i$, and any other alternative mixed strategy $p_i'$ for his play, $M_i(p) \geq M_i(p_i', p_{-i})$. Every game in normal form has at least one such equilibrium (Nash, 1951). Thus, every game has an MSNE "solution."
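The MSNE condition can be illustrated on Matching Pennies (standard payoffs; a hedged sketch of ours): under the uniform mixed strategies, every pure deviation yields the same expected payoff, so no unilateral mixed deviation can help either player.

```python
from itertools import product

# Matching Pennies: player 0 wins when the actions match (zero-sum: M_1 = -M_0).
def payoff_0(a):
    return 1.0 if a[0] == a[1] else -1.0

def expected_payoff(i, p0, p1):
    """Expected payoff of player i under product distribution p0 x p1."""
    return sum(p0[a0] * p1[a1] * payoff_0((a0, a1)) * (1 if i == 0 else -1)
               for a0, a1 in product((0, 1), repeat=2))

p0 = p1 = {0: 0.5, 1: 0.5}
eq_value = expected_payoff(0, p0, p1)   # value at the uniform equilibrium
best_dev = max(expected_payoff(0, {0: 1.0, 1: 0.0}, p1),
               expected_payoff(0, {0: 0.0, 1: 1.0}, p1))
print(eq_value, best_dev)  # no pure (hence no mixed) deviation gains
```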
One relaxation of MSNE considers the case where the amount of gain each player can obtain from a unilateral deviation is very small. This concept is particularly useful to study approximate versions of the computational problem. Given $\epsilon > 0$, an $\epsilon$-approximate Nash equilibrium ($\epsilon$-MSNE) is defined as above, except that the expected-gain condition becomes $M_i(p) \geq M_i(p_i', p_{-i}) - \epsilon$.
Several refinements and generalizations of MSNE have been proposed. One of the most interesting generalizations is that of a correlated equilibrium (CE) (Aumann, 1974). In contrast to an MSNE, a CE can be a full joint distribution, and thus can characterize more complex joint-action behavior by the players. Formally, a correlated equilibrium (CE) is a joint probability distribution $p$ over $A$ such that, for all players $i$, and all $a_i, a_i' \in A_i$ with $p(a_i) > 0$,
$$\sum_{a_{-i}} p(a_{-i} \mid a_i) \left[ M_i(a_i, a_{-i}) - M_i(a_i', a_{-i}) \right] \geq 0,$$
where $p(a_i)$ is the (marginal) probability that player $i$ will play $a_i$ according to $p$ and $p(a_{-i} \mid a_i)$ is the conditional given $a_i$. An MSNE is a CE that is a product distribution. An equivalent, unconditional expression of the CE condition above is
$$\sum_{a_{-i}} p(a_i, a_{-i}) \left[ M_i(a_i, a_{-i}) - M_i(a_i', a_{-i}) \right] \geq 0.$$
As was the case for MSNE, we can relax the condition to account for potential gains from small deviations. Given $\epsilon > 0$, adding the term "$-\epsilon$" to the right-hand side of the condition above defines an $\epsilon$-approximate CE. [Footnote 8: Note that approximate CE is usually defined based on this unconditional version of the CE conditions (Hart and Mas-Colell, 2000).]
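Because the CE conditions are linear in $p$, verifying a candidate distribution amounts to checking finitely many inequalities. The sketch below (our own encoding, using one standard payoff convention for the game of Chicken) verifies the classic correlated equilibrium that puts probability 1/3 on each of (C,C), (C,D), (D,C):

```python
from itertools import product

# Chicken with D=0 ("dare") and C=1 ("chicken"); one common payoff convention.
payoff = {(0, 0): (0, 0), (0, 1): (7, 2), (1, 0): (2, 7), (1, 1): (6, 6)}
p = {(1, 1): 1 / 3, (1, 0): 1 / 3, (0, 1): 1 / 3, (0, 0): 0.0}

def is_ce(p, tol=1e-9):
    for i in (0, 1):
        for ai, dev in product((0, 1), repeat=2):
            # Unconditional CE form: sum over a_{-i} of
            # p(ai, a_{-i}) * [M_i(ai, a_{-i}) - M_i(dev, a_{-i})] >= 0.
            gain = 0.0
            for aj in (0, 1):
                a = (ai, aj) if i == 0 else (aj, ai)
                b = (dev, aj) if i == 0 else (aj, dev)
                gain += p[a] * (payoff[a][i] - payoff[b][i])
            if gain < -tol:
                return False
    return True

print(is_ce(p))  # True
```

Searching for a CE, rather than checking one, is the corresponding linear feasibility problem over the same inequalities plus the simplex constraints.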
CE have several conceptual and computational advantages over MSNE. For instance, all players may achieve better expected payoffs in a CE than those achievable in any MSNE; [Footnote 9: The distinction between installing a traffic light at an intersection and leaving the intersection without one is a real-world example of this.] some "natural" forms of play are guaranteed to converge to the (set of) CE (Foster and Vohra, 1997, 1999; Fudenberg and Levine, 1999; Hart and Mas-Colell, 2000, 2003, 2005); and CE is consistent with a Bayesian framework (Aumann, 1987), something not yet possible, and apparently unlikely, for MSNE (Hart and Mansour, 2007).
2.2.3 Brief Overview of Results in Computational Game Theory
There has been an explosion of computational results on different equilibrium concepts on a variety of game representations and settings since the beginning of this century. The following is a brief summary. We refer the reader to a book by Nisan et al. (2007) for a (partial) introduction to this research area.
The problem for two-player zero-sum games, where the sum of the entries of both payoff matrices is zero, and therefore only one matrix is needed to represent the game, can be solved in polynomial time: it is equivalent to linear programming (von Neumann and Morgenstern, 1947; Szép and Forgó, 1985; Karlin, 1959). After being open for over 50 years, the problem of the complexity of computing MSNE in games was finally settled, following a very rapid sequence of results in the last part of 2005 (Goldberg and Papadimitriou, 2005; Daskalakis et al., 2005; Daskalakis and Papadimitriou, 2005; Daskalakis et al., 2009b; Chen and Deng, 2005b): computing MSNE is likely to be hard in the worst case, i.e., PPAD-complete (Papadimitriou, 1994), even in games with only two players (Chen and Deng, 2005a, 2006; Chen et al., 2009; Daskalakis et al., 2009a, b). The result of Fabrikant et al. (2004) suggests that computing PSNE in succinctly representable games is also likely to be intractable in the worst case, i.e., PLS-complete (Johnson et al., 1988). A common statement is that computing MSNE, and in some cases even PSNE, with "special properties" is hard in the worst case (Gilboa and Zemel, 1989; Gottlob et al., 2003; Conitzer and Sandholm, 2008). Computing approximate MSNE is also thought to be hard in the worst case (Chen et al., 2006, 2009). We refer the reader to Ortiz and Irfan (2017), and the references therein, for recent results along this line and a brief survey of the state of the art for this problem.
Most current results for computing exact and approximate PSNE or MSNE in graphical games essentially mirror those for MRFs and constraint networks: polynomial time for bounded-treewidth graphs; intractable in general (Kearns et al., 2001; Gottlob et al., 2003; Daskalakis and Papadimitriou, 2006; Ortiz, 2014). This is unsurprising because they were mostly inspired by analogous versions in probabilistic graphical models and constraint networks in AI, and therefore share similar characteristics. Several heuristics exist for dealing with general graphs (Vickrey and Koller, 2002; Ortiz and Kearns, 2003; Daskalakis and Papadimitriou, 2006).
In contrast, there exist polynomialtime algorithms for computing CE, both for normalform games (where the problem reduces to a simple linear feasibility problem) and even most succinctlyrepresentable games known today (Papadimitriou, 2005; Jiang and LeytonBrown, 2015a), including graphical games.
3 Equilibria and Inference
The line of work presented in this section is partly motivated by the following question: Can we leverage advances in computational game theory for problems in the probabilistic graphical models community? Establishing a strong bilateral connection between both problems may help us answer this question.
The literature on computing equilibria in games has skyrocketed since the beginning of this century. As we discover techniques developed early on within the game theory community, and as new results are generated from the extremely active computational game theory community, we may be able to adapt those techniques for solving games to the inference setting. If we can establish a strong bilateral connection between inference problems and the computation of equilibria, we may be able to relate algorithms in both areas and exchange previously unknown results in each.
3.1 Pure-Strategy Nash Equilibrium and Approximate MAP Inference
Consider an MRF $P$ with respect to a graph $G$ and Gibbs potential $\Phi$ defined by the set of clique potential functions $\{\phi_C : C \in \mathcal{C}\}$. For each node $i$, denote by $\mathcal{C}_i \equiv \{C \in \mathcal{C} : i \in C\}$ the subset of cliques in $\mathcal{C}$ that include $i$. Note that the (inclusive) neighborhood of player $i$ is given by $N[i] = \cup_{C \in \mathcal{C}_i} C$.
Define an MRF-induced GMhG, and more specifically a (hyperedge-symmetric) hypergraphical game (Papadimitriou, 2005; Ortiz, 2015), with the same graph $G$, and for each player $i$, the hypergraph with hyperedge set $\mathcal{C}_i$ and local-clique payoff hypermatrices $M_{i,C}(a_C) \equiv \log \phi_C(a_C)$ for all $C \in \mathcal{C}_i$. A few observations about the game are in order.
Property 1.
The representation size of the MRF-induced game is the same as that of the MRF: exponential not in the largest neighborhood size, but only in the size of the largest clique in $G$.
Property 2.
The MRF-induced game is a graphical potential game (Ortiz, 2015) with graph $G$ and (Gibbs) potential function $\Phi$: i.e., for all $i$, $a_i, a_i' \in A_i$, and $a_{-i}$,
$$M_i(a_i, a_{-i}) - M_i(a_i', a_{-i}) = \Phi(a_i, a_{-i}) - \Phi(a_i', a_{-i}).$$
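This potential property can be verified exhaustively on a toy MRF (the clique potentials are our own arbitrary choices): with payoffs $M_i(a) = \sum_{C \ni i} \log \phi_C(a_C)$, every unilateral payoff change equals the corresponding change in the Gibbs potential.

```python
from itertools import product
from math import log, isclose

# Toy MRF over three binary variables with cliques {0,1} and {1,2}.
phi = {
    (0, 1): lambda aC: 2.0 if aC[0] == aC[1] else 0.5,
    (1, 2): lambda aC: 3.0 if aC[0] == aC[1] else 1.0,
}

def Phi(a):
    return sum(log(f(tuple(a[j] for j in C))) for C, f in phi.items())

def M(i, a):  # player i's payoff: only the cliques containing i
    return sum(log(f(tuple(a[j] for j in C)))
               for C, f in phi.items() if i in C)

ok = all(isclose(M(i, a) - M(i, a[:i] + (b,) + a[i + 1:]),
                 Phi(a) - Phi(a[:i] + (b,) + a[i + 1:]))
         for a in product([0, 1], repeat=3)
         for i in range(3) for b in [0, 1])
print(ok)  # the game is an exact potential game with potential Phi
```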
Remark 1.
Through the connection established by the last property, it is easy to see that sequential best-response dynamics is guaranteed to converge to a PSNE of the game in finite time, regardless of the initial play. [Footnote 10: Recall that best-response dynamics refers to a process in which, at each time step, each player $i$ observes the actions $a_{-i}$ of the others and takes an action that maximizes its payoff given that the others played $a_{-i}$.] In this case, those dynamics would essentially be implementing an axis-parallel coordinate maximization over the space of assignments of the MRF, which is guaranteed to converge to a local maximum (or critical point) of the MRF. In fact, we can conclude that a joint action $a$ is a PSNE of the game if and only if $a$ is a local maximum or a critical point of the MRF $P$. Thus, the MRF-induced game, like all potential games (Monderer and Shapley, 1996b), always has a PSNE. [Footnote 11: This result should not be surprising given that other researchers have established a one-to-one relationship between the complexity class PLS (Johnson et al., 1988), which characterizes local search problems, of which finding local maxima of the MRF is an instance, and (ordinal) potential games (Fabrikant et al., 2004).]
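The equivalence between best-response dynamics and coordinate ascent on the potential can be sketched on the same kind of toy model (clique potentials are our own choices): each player's best response is exactly a coordinate-wise maximization of $\Phi$, and the process halts at a PSNE, i.e., a coordinate-wise local maximum.

```python
from math import log

# Toy MRF-induced game over three binary variables, cliques {0,1} and {1,2}.
phi = {
    (0, 1): lambda aC: 2.0 if aC[0] == aC[1] else 0.5,
    (1, 2): lambda aC: 3.0 if aC[0] == aC[1] else 1.0,
}

def Phi(a):
    return sum(log(f(tuple(a[j] for j in C))) for C, f in phi.items())

def best_response_dynamics(a):
    a = list(a)
    changed = True
    while changed:
        changed = False
        for i in range(len(a)):
            # Player i's best response == coordinate maximization of Phi.
            best = max([0, 1], key=lambda b: Phi(tuple(a[:i] + [b] + a[i + 1:])))
            if best != a[i]:
                a[i] = best
                changed = True
    return tuple(a)

a_star = best_response_dynamics((0, 1, 0))
is_local_max = all(Phi(a_star) >= Phi(a_star[:i] + (1 - a_star[i],) + a_star[i + 1:])
                   for i in range(3))
print(a_star, is_local_max)  # a PSNE, hence a local maximum of Phi
```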
Similarly, for any potential game, one can define a game-induced MRF using the potential function of the game whose set of local maxima (and critical points) corresponds exactly to the set of PSNE of the potential game. Through this connection we can show that solving the local-MAP problem in MRFs is PLS-complete in general (Fabrikant et al., 2004). [Footnote 12: A direct proof of this result follows from Papadimitriou et al. (1990), and in particular the result for Hopfield neural networks (Hopfield, 1982). A Hopfield neural network can be seen as an MRF, and more specifically an Ising model, when the weights on the edges are symmetric. Similarly, any Hopfield neural network can be seen as a polymatrix game (Miller and Zucker, 1992); when the weights are symmetric, the network can be seen as a potential game (in particular, it is an instance of a party affiliation game (Fabrikant et al., 2004)). Indeed, a stable configuration of an arbitrary Hopfield neural network is equivalent to a PSNE of a corresponding polymatrix game. (See Papadimitriou et al., 1990, and Miller and Zucker, 1992, for the relevant references.)]

One question that comes to mind is whether one can say anything about the properties of the globally optimal assignment of the game-induced MRF and the payoffs it supports for the players, or whether it can be characterized by stronger notions of equilibria. For example, are strong NE, in which no coalition of players can jointly deviate in a way that improves the payoffs of all of its members, joint MAP assignments of the MRF? Or more generally, what characteristics can we assign to the MAP assignments of the game-induced MRF?
In short, we can use algorithms for PSNE as heuristics to compute locally optimal MAP assignments of $P$, and vice versa. [Footnote 13: Note that algorithms for PSNE can in principle find critical points of $P$.] In either case, algorithms such as the max-product version of belief propagation (BP) can only provide such local-optimum/critical-point convergence guarantees in general.
Remark 2.
Daskalakis et al. (2007) extended results in game theory characterizing the number of PSNE in normalform games (see Stanford, 1995; Rinott and Scarsini, 2000, and the references therein) to graphical games, but now taking into consideration the network structure of the game. Information about the number of PSNE in games can provide additional insight on the structure of MRFs.
For example, one of the results of Daskalakis et al. (2007) states that for graphs respecting certain expansion properties as the number of nodes/players increases, the number of PSNE of the graphical game has a limiting distribution that is Poisson with expected value 1. Also according to Daskalakis et al. (2007), a similar behavior occurs for games with graphs generated according to the Erdős-Rényi model with sufficiently high average degree (i.e., reasonably high connectivity). Thus, either the set of MRF-induced games has significantly low measure relative to the set of all possible randomly generated games (something that seems likely), or the number of local maxima (and critical points) of the MRF will have a similar distribution, and thus that number is expected to be low. The latter would suggest that local algorithms such as the max-product algorithm may be less likely to get stuck in local maxima (or critical points) of the MRF.
In addition, there have been several results stating that PSNE are unlikely to exist in many graphs and that, when they do exist, there are not that many of them (Daskalakis et al., 2007). [Footnote 14: In particular, the number of PSNE has a Poisson limiting distribution with parameter 1.] MRF-induced games would in that sense represent a very rich class of non-randomly generated graphical games for which the results above do not hold.

3.2 Mixed-Strategy Equilibria and Belief Inference
Going beyond PSNE and MAP estimation, this subsection begins to establish a stronger, and potentially more useful connection between probabilistic inference and more general concepts of equilibria in games.
Let $S \subset V$ be a subset of the players (i.e., nodes in the graph) and denote by $p(a_S)$ the (marginal) probability distribution of $p$ over the possible joint actions $a_S$ of the players in $S$. Consider the condition for correlated equilibria (CE), which for the MRF-induced game we can express as, for all $i$ and $a_i, a_i' \in A_i$,
$$\sum_{a_{-i}} p(a_i, a_{-i}) \sum_{C \in \mathcal{C}_i} \left[ \log \phi_C(a_C) - \log \phi_C(a_i', a_{C - \{i\}}) \right] \geq 0.$$
Commuting the sums and simplifying, we get the following equivalent condition:
$$\sum_{C \in \mathcal{C}_i} \sum_{a_C} p(a_C) \left[ \log \phi_C(a_C) - \log \phi_C(a_i', a_{C - \{i\}}) \right] \geq 0. \quad (1)$$
This simplification is important because it highlights that, modulo expected-payoff equivalence, we only need distributions over the original cliques, not the induced neighborhoods/Markov blankets, to represent CE in this class of games, in contrast to Kakade et al. (2003); thus, we are able to keep the representation size of the CE the same as that of the game.
As an alternative, we can use the fact that the MRF-induced game is a potential game and, via some definitions and algebraic manipulation, obtain the following sequence of equivalent conditions, which hold for all $i$, $a_i \in A_i$ with $p(a_i) > 0$, and $a_i' \in A_i$:
$$\sum_{a_{-i}} p(a_{-i} \mid a_i) \left[ \Phi(a_i, a_{-i}) - \Phi(a_i', a_{-i}) \right] \geq 0.$$
Rewriting the last expression, we get the following equivalent condition: for all $i$, $a_i \in A_i$ with $p(a_i) > 0$, and $a_i' \in A_i$,
$$\sum_{a_{-i}} p(a_i, a_{-i}) \left[ \Phi(a_i, a_{-i}) - \Phi(a_i', a_{-i}) \right] \geq 0. \quad (2)$$
The following are some additional remarks on the implications of the last condition. [Footnote 15: In what follows, we refer to concepts from information theory, such as (Shannon) entropy, cross entropy, and relative entropy (also known as Kullback-Leibler divergence). We refer the reader to Cover and Thomas (2006) for a textbook introduction to those concepts.]

Remark 3.
First, it is useful to introduce the following notation. For any distribution , let be the cross entropy between probability distributions and , with respect to . ^{16}That is, (a lower bound on) the average number of bits required to transmit "messages/events" generated according to but encoded using a scheme based on . Denote by the marginal distribution of play over the joint-actions of all players except player . Denote by the joint distribution defined as for all .
Then, condition (2) implies the following sequence of conditions, which hold for all .
As an anonymous reviewer pointed out, the condition is actually that of a coarse CE (CCE) (Hannan, 1957; Moulin and Vial, 1978), a superset of CE, which allows us to apply several simple methods for computing such equilibria, as discussed later in this section. Hence, any CE of the MRF-induced game is a kind of approximate local optimum (or critical point) of an approximation of the MRF based on a special type of cross-entropy minimization.
The following property summarizes this remark.
Property 3.
For any MRF , any correlated equilibrium of the game induced by satisfies .
Remark 4.
Let us introduce some additional notation. For any joint distribution of play , let be its entropy. Similarly, for any player , for any marginal/individual distribution of play , let be its (marginal) entropy. For any distribution and , let be the Kullback-Leibler divergence between and , with respect to . Denote by the conditional entropy of the individual play of player given the joint play of all the players except , with respect to .
Then, we can express condition (2) as the following equivalent conditions, which hold for all .
Hence, any CE of an MRF-induced game is a kind of approximate local optimum (or critical point) of a special type of variational approximation of the MRF. The following property summarizes this remark.
Property 4.
For any MRF , any correlated equilibrium of the game induced by satisfies .
Note that the last property implies that the approximation satisfies the local condition .
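The information-theoretic quantities used throughout these remarks can be computed directly for small discrete distributions. The following minimal sketch is our own illustrative code (the function names are not from the text); it implements entropy, cross entropy, and KL divergence in nats and checks two of their basic properties:

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_x p(x) log p(x), in nats."""
    return -sum(px * math.log(px) for px in p if px > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log q(x): average code length for events
    drawn from p but encoded with a scheme optimal for q."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)

def kl_divergence(p, q):
    """KL(p || q) = H(p, q) - H(p) >= 0, with equality iff p == q."""
    return cross_entropy(p, q) - entropy(p)

p = [0.5, 0.5]
q = [0.9, 0.1]
assert abs(kl_divergence(p, p)) < 1e-12  # KL(p || p) = 0
assert kl_divergence(p, q) > 0           # strictly positive when p != q
```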
Before continuing to explore connections to CE, it is instructive to first consider MSNE.
3.2.1 Mixed-strategy Nash Equilibria and Mean-Field Approximations
In the special case of MSNE, the joint mixed strategy is a product distribution. Denote by the (marginal) joint distribution of play over all the players except , and denote by the probability distribution defined such that the probability of is .
In this special case, the equilibrium conditions imply the following conditions, which hold for all : for all such that ,
Denoting by , the last condition implies that
The last condition is equivalent to
which, in turn, we can express as
The last expression is also equivalent to
Hence, an NE of the game is almost a locally optimal mean-field approximation, except for the extra entropic term. In summary, for MSNE we have the following condition, which is tighter than that for arbitrary CE.
Property 5.
For any MRF , any MSNE of the game induced by satisfies , for all .
Note that the last property implies that the mean-field approximation satisfies the local condition for all .
One possible way to address the issue of the extra entropic term is to consider instead the MRF-induced infinite game, where each player has the (continuous) utility function ^{17}In an infinite game, the sets of actions or pure strategies are uncountable. Existence of equilibria holds under reasonable conditions (i.e., each set of actions is a nonempty compact convex subset of a Euclidean space, and each player's utility is continuous and quasiconcave in the player's own action), all of which are satisfied by the MRF-induced infinite game considered here. (See Fudenberg and Tirole, 1991, for more information.)
and wants to maximize it over its mixed strategy given the other players' mixed strategies, for all .
Property 6.
The MRF-induced infinite game defined above is an infinite Gibbs potential game with the same graph and the following potential over the set of individual (product) mixed strategies
where is the normalizing constant for . From this we can derive that the individual player mixed strategies are a "pure-strategy" equilibrium of the infinite game if and only if
Or, in other words, if is a PSNE of the infinite game, then is also a local optimum (or critical point) of the mean-field approximation of .
Remark 5.
The local payoff function defined above for the infinite game also has connections to the game-theory literature on learning in games (Fudenberg and Levine, 1999). This area studies properties of processes by which players "learn" how to play in (usually repeated) games, especially properties related to the convergence of the learning (or playing) dynamics to equilibria. In particular, the local payoff function is similar to that used by logistic fictitious play, a special version of a "learning" process called smooth fictitious play. The difference is that the last entropy term involving the individual player's mixed strategy has a regularization-type factor such that players play strict best response as . In addition, logistic fictitious play is an instance of a learning process that, if followed by a player, achieves so-called approximate universal consistency (i.e., roughly, in the limit of infinite play, the average of the payoffs obtained by the player will be close to the best obtainable overall during repeated play, regardless of how the other players behave), also known as Hannan consistency (Hannan, 1957), for appropriate values of depending on the desired approximation level.
Indeed, it is not hard to see that the best-response mixed strategy of player to the mixed strategies of its neighbors is
Hence, running sequential best-response dynamics in the MRF-induced infinite game is equivalent to finding a variational mean-field approximation via recursive updating of the first-derivative conditions. ^{18}In particular, the process is called a Cournot adjustment with lock-in in the literature on learning in games (Fudenberg and Levine, 1999). The process will then be equivalent to minimizing the function by axis-parallel updates. The resulting sequence of distributions/mixed strategies monotonically decreases the value of and is guaranteed to converge to a local optimum or a critical point of . Hence, the corresponding learning process is guaranteed to converge to a PSNE of the infinite game, which is in turn an approximate MSNE of the original game. But this is not surprising in retrospect, given the last property (Property 6). That property essentially states a broader property of all potential games: they are isomorphic to so-called games with identical interests (Monderer and Shapley, 1996b), which are games where every player has exactly the same payoff function.
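To illustrate the equivalence just described, the sketch below runs sequential (axis-parallel) mean-field updates on a small, hypothetical Ising model; each update is the logistic best response of one player to the current mixed strategies of its neighbors. The tanh fixed-point form m_i = tanh(b_i + sum_j W_ij m_j) assumes the standard {-1,+1} Ising parameterization, and all names and parameter values are our own illustrative choices:

```python
import math

def mean_field_ising(b, W, iters=200):
    """Sequential mean-field updates for an Ising model with spins in
    {-1,+1}: m_i <- tanh(b_i + sum_j W[i][j] * m_j). Each update is the
    (logistic) best response of player i given its neighbors' strategies."""
    n = len(b)
    m = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            field = b[i] + sum(W[i][j] * m[j] for j in range(n) if j != i)
            m[i] = math.tanh(field)
    return m  # m_i approximates E[x_i] under the product distribution

# Hypothetical 3-node chain with positive (ferromagnetic) couplings.
b = [0.5, 0.0, 0.0]
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
m = mean_field_ising(b, W)
print(m)  # all means positive: the positive bias on node 0 propagates
```

The axis-parallel sweeps monotonically improve the mean-field objective and converge to a fixed point of the tanh conditions, mirroring the Cournot-adjustment view in the footnote above.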
Remark 6.
The previous discussion suggests that we could use appropriately modified versions of algorithms for MSNE, such as NashProp (Ortiz and Kearns, 2003), as heuristics to obtain a mean-field approximation of the true marginals.
Going in the opposite direction, the discussion above also suggests that, by treating any (graphical) potential game as an MRF, for any fixed , logistic fictitious play in any potential game converges to an approximate MSNE of the potential game. Indeed, there has been recent work in this direction, which explores the connection between learning in games and mean-field approximations in machine learning (Rezek et al., 2008). That work proposes new algorithms based on fictitious play for simple mean-field approximation applied to statistical (Bayesian) estimation.
The game-induced MRF is a temperature Gibbs measure. As we take , we get the limiting (zero-temperature) Gibbs measure, which is a probability distribution over the set of global maxima of the potential function of the game, with probability zero everywhere else (i.e., the support of the limiting distribution is the set of joint-actions that maximize the potential function). The support of the limiting Gibbs measure is thus a subset of the "globally optimal" PSNE of the potential game. But there might be other equilibria corresponding to local optima (or critical points) of the potential function.
Are there other connections between the Nash equilibria of the game and the support of the limiting distribution?
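The temperature limit discussed above is easy to illustrate numerically. The sketch below (our own illustrative code, with a toy potential of our choosing) computes Gibbs measures at decreasing temperatures and shows the probability mass concentrating on the potential's maximizer:

```python
import math

def gibbs(potential, states, tau):
    """Gibbs measure at temperature tau: P(x) proportional to
    exp(potential(x) / tau). The max is subtracted for numerical
    stability before exponentiating."""
    mx = max(potential(x) for x in states)
    w = [math.exp((potential(x) - mx) / tau) for x in states]
    z = sum(w)
    return [wi / z for wi in w]

# Toy potential with a unique global maximum at state 2.
phi = lambda x: [1.0, 2.0, 3.0, 2.5][x]
for tau in [1.0, 0.1, 0.01]:
    p = gibbs(phi, range(4), tau)
    print(tau, [round(pi, 3) for pi in p])
# As tau -> 0, the distribution concentrates on argmax phi (state 2).
```

At high temperature the measure is nearly uniform; at low temperature essentially all mass sits on the maximizing state, matching the support claim above.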
3.2.2 Correlated Equilibria and Higher-order Variational Approximations
Kakade et al. (2003) designed polynomial-time algorithms based on linear programming for computing CE in standard graphical games with tree graphs. The approach and polynomial-time results extend to graphical games with bounded-treewidth graphs and graphical polymatrix games with tree graphs. Ortiz et al. (2007) (see also Ortiz et al., 2006) proposed the principle of maximum entropy (MaxEnt) for equilibrium selection of CE in graphical games. They studied several properties of the MaxEnt CE, designed a monotonically increasing algorithm to compute it, and discussed a learning-dynamics view of the algorithm.
Kamisetty et al. (2011) employed advances in approximate inference methods to propose approximation algorithms to compute CE. In all of those cases, the general approach is to use ideas from probabilistic graphical models to design algorithms to compute CE. The focus of this paper is the opposite direction: employing ideas from game theory to design algorithms for belief inference in probabilistic graphical models.

Property 4 suggests that we can use the CE of the MRF-induced game as a heuristic approximation to higher-order variational approximations. In fact, one could argue that in the context of inference, doing so is more desirable because, in principle, it can lead to better approximations that capture more aspects of the joint distribution than a simple mean-field approximation would alone. For example, mean-field approximations are likely to be poor if the MRF is multimodal. Motivated by this fact, Jaakkola and Jordan (1997) suggest using mixtures of product distributions to improve the simple variational mean-field approximation.
3.2.3 Some Computational Implications
But, consider the algorithms of Papadimitriou (2005) or Jiang and Leyton-Brown (2015a) (see also Papadimitriou and Roughgarden, 2008, and Jiang and Leyton-Brown, 2011), which we can use to compute a CE of the MRF-induced game in polynomial time. Such a CE will be, by construction, also a (polynomially-sized) mixture of product distributions. (In the case of Jiang and Leyton-Brown's algorithm, it will be a mixture over a subset of the joint-action space, which is equivalent to a probability mass function over a polynomially-sized subset of the joint-action space; said differently, a mixture of products of indicator functions, each product corresponding to a particular outcome of the joint-action space.) Hence, the algorithms of Papadimitriou and of Jiang and Leyton-Brown both provide a means to obtain a heuristic estimate of a local optimum (or critical point) of such a mixture in polynomial time. The result would not be exactly the same as that obtained by Jaakkola and Jordan (1997) in general, because of the extra entropic term mentioned in the discussion earlier. Can we find alternative versions of the payoff matrices, and/or alter Papadimitriou's algorithm, so that the resulting CE provides an exact answer to the approximate inference problem that uses mixtures of product distributions? Regardless, at the very least one could use the resulting CE to initialize the technique of Jaakkola and Jordan (1997) without specifying an a priori number of mixture components.
Having said that, both Papadimitriou's and Jiang and Leyton-Brown's algorithms make a polynomial number of calls to the ellipsoid algorithm, or more specifically, its "oracle," to obtain each of the product distributions whose mixture will form the output CE. It is known that the ellipsoid algorithm is slow in practice. Papadimitriou (2005), Papadimitriou and Roughgarden (2008), and Jiang and Leyton-Brown (2015a) leave open the design of more practical algorithms based on interior-point methods.
Finally, this connection also suggests that we can (in principle) use any learning algorithm that guarantees convergence to the set of CE (as described in the section on game-theory preliminaries, where the concept was introduced) as a heuristic for approximate inference. Several so-called "no-regret" learning algorithms satisfy those conditions. Indeed, we use two simple variants of such algorithms in our experiments. Viewed that way, such learning algorithms would be similar in spirit to stochastic simulation algorithms with a kind of "adaptivity" reminiscent of the work on adaptive importance sampling (see, e.g., Cheng and Druzdzel, 2000; Ortiz and Kaelbling, 2000; Ortiz, 2002, and the references therein). Establishing a stronger connection between learning in games, CE, and probabilistic inference seems like a promising direction for future research. In fact, as previously mentioned (at the end of Remark 5), there has already been some recent work in this direction, but specifically for MSNE and mean-field approximations (Rezek et al., 2008).
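As a minimal illustration of the kind of no-regret learning procedure mentioned above, the following sketch implements external-regret matching, in the spirit of Hart and Mas-Colell (2000), for a two-player game; the empirical distribution of joint play then approximates a coarse CE. This is our own illustrative code (it is not the variant used in the experiments), and the matching-pennies example and all parameter values are hypothetical choices:

```python
import random

def regret_matching(payoff, n_actions, T=5000, seed=0):
    """External-regret matching for a two-player game: each round, play
    an action with probability proportional to its positive cumulative
    regret. The empirical joint distribution approaches the CCE set."""
    rng = random.Random(seed)
    regret = [[0.0] * n_actions for _ in range(2)]
    counts = {}
    for _ in range(T):
        acts = []
        for i in range(2):
            pos = [max(r, 0.0) for r in regret[i]]
            s = sum(pos)
            probs = [p / s for p in pos] if s > 0 else [1.0 / n_actions] * n_actions
            acts.append(rng.choices(range(n_actions), probs)[0])
        a = tuple(acts)
        counts[a] = counts.get(a, 0) + 1
        for i in range(2):  # update regrets against every fixed deviation
            got = payoff(i, a)
            for b in range(n_actions):
                dev = (b, a[1]) if i == 0 else (a[0], b)
                regret[i][b] += payoff(i, dev) - got
    mu = {a: c / T for a, c in counts.items()}
    avg_regret = max(max(r / T for r in regret[i]) for i in range(2))
    return mu, avg_regret

# Matching pennies: player 0 wants to match, player 1 wants to mismatch.
mp = lambda i, a: (1.0 if a[0] == a[1] else -1.0) * (1 if i == 0 else -1)
mu, reg = regret_matching(mp, 2)
print(reg)  # average external regret vanishes as T grows
```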
Later in this paper, we present the results of an experimental evaluation of the performance of a simple no-regret learning algorithm from computational game theory (Fudenberg and Levine, 1999; Blum and Mansour, 2007; Hart and Mas-Colell, 2000) in the context of probabilistic inference. These are iterative algorithms, like many other approximate inference methods such as mean-field and other variational approximations, but closer in spirit to sampling/simulation-based methods such as the Gibbs sampler and other similar MCMC methods. Indeed, the running time per iteration of those algorithms is roughly the same as that of sampling-based methods. We delay the details until the Experiments section (Section 4).
3.3 Other Previous and Related Work
Earlier work on the so-called "relaxation labeling" problem in AI and computer vision (Rosenfeld et al., 1976; Miller and Zucker, 1991) has established connections to polymatrix games (Janovskaja, 1968) (see also Hummel and Zucker, 1983, although the connection had yet to be recognized at that time). That work also establishes connections to inference in Hopfield networks, dynamical systems, and polymatrix games (Miller and Zucker, 1991; Zucker, 2001). A reduction of MAP to PSNE in what we call here a GMhG was introduced by Yu and Berthod (1995) in the same context (see also Berthod et al., 1996), although they concentrate on pairwise potentials, which reduce to polymatrix games in this context. Because, in addition, the ultimate goal in MAP inference is to obtain a globally optimal configuration, Yu and Berthod (1995) proposed a Metropolis-Hastings-style algorithm in an attempt to avoid local minima. Their algorithm is similar to simulated annealing algorithms used for solving satisfiability problems, and to other local methods such as WalkSAT (Selman et al., 1996) (see, e.g., Russell and Norvig, 2003, for more information). The algorithm can also be seen as a kind of learning-in-games scheme (Fudenberg and Levine, 1999) based on best response with random exploration (or "trembling-hand" best response). That is, at every round, some best response is taken with some probability; otherwise the previous response is replayed. Zucker (2001) presents a modern account of that work. The connection to potential games, and all of their well-known properties (e.g., convergence of best-response dynamics), does not seem to have been recognized within that literature. Also, none of that work makes connections to higher-order (i.e., beyond mean-field) inference approximation techniques or the game-theoretic notion of CE.
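The best-response-with-random-exploration scheme just described can be sketched as follows. This is illustrative code only: the agreement game on a ring, the exploration probability, and all names are our own hypothetical choices, not those of Yu and Berthod (1995):

```python
import random

def noisy_best_response(potential, n, n_actions, p_br=0.8, T=500, seed=1):
    """Best response with inertia ("trembling-hand" flavor): each round a
    random player best-responds with probability p_br and otherwise
    replays its previous action. In a potential game, every best
    response weakly increases the common potential."""
    rng = random.Random(seed)
    a = [rng.randrange(n_actions) for _ in range(n)]
    for _ in range(T):
        i = rng.randrange(n)
        if rng.random() < p_br:
            a[i] = max(range(n_actions),
                       key=lambda b: potential(a[:i] + [b] + a[i + 1:]))
    return a

# Toy potential: agreement game on a 4-node ring (count matching neighbors).
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
phi = lambda a: sum(a[u] == a[v] for u, v in edges)
a = noisy_best_response(phi, 4, 2)
print(a, phi(a))  # converges to a consensus profile, a PSNE of the game
```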
3.4 Approximate Fictitious Play in a Two-player Potential Game for Belief Inference in Ising Models
This section presents a game-theoretic fictitious-play approach to the estimation of node-marginal probabilities in MRFs. The approach this time is more global in terms of how we use the whole joint distribution for the estimation of individual marginal probabilities. The inspiration for the approach presented here follows from the work of Wainwright et al. (2005). The section concentrates on Ising models, an important special MRF instance from statistical physics with its own interesting history.
Definition 2.
An Ising model wrt an undirected graph $G = (V, E)$ is an MRF wrt $G$ such that

$P(x) \propto \exp\Big( \sum_{(i,j) \in E} w_{ij} x_i x_j + \sum_{i \in V} b_i x_i \Big), \quad x \in \{-1, +1\}^{|V|},$

where $(b, w)$ is the set of node biases $b_i$'s and edge weights $w_{ij}$'s, which are the parameters defining the joint distribution over $x$.
It is fair to say that interest in more general classes of MRFs originates from the special class of Ising models. It is also fair to say that, because of the relative simplicity and importance of Ising models for problems in statistical physics, as well as in other ML and AI application areas such as computer vision and NLP, Ising models have become the most common platform on which to empirically study approximation algorithms for arbitrary MRFs. In short, simplicity of presentation and empirical evaluation guide the focus on Ising models in this section: generalizations to arbitrary MRFs are straightforward but cumbersome to present. Hence, in this manuscript, we omit the details of such generalizations.
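For tiny models, the Ising joint distribution can be computed exactly by enumeration, which is useful as ground truth when evaluating approximation algorithms such as the one below. The following minimal sketch is our own illustrative code, assuming the standard {-1,+1} parameterization with node biases b and edge weights W (notation may differ in detail from the definition above):

```python
import itertools
import math

def ising_marginals(b, W):
    """Exact node marginals P(x_i = +1) of a small Ising model over
    x in {-1,+1}^n by brute-force enumeration (exponential in n)."""
    n = len(b)
    weight = {}
    for x in itertools.product([-1, 1], repeat=n):
        e = sum(b[i] * x[i] for i in range(n))
        # count each edge once via the upper triangle of W
        e += sum(W[i][j] * x[i] * x[j]
                 for i in range(n) for j in range(i + 1, n))
        weight[x] = math.exp(e)
    Z = sum(weight.values())  # the partition function
    return [sum(w for x, w in weight.items() if x[i] == 1) / Z
            for i in range(n)]

# Hypothetical 3-node chain: positive bias on node 0, positive couplings.
b = [0.5, 0.0, 0.0]
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
print(ising_marginals(b, W))  # all marginals exceed 1/2
```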
As an outline, the current section begins with an algorithmic instantiation of the iterative approach. The exact instantiation depends on whether we are using CE or MSNE as the solution concept. The section then follows with an informal discussion of the gametheoretic foundations of the general framework behind the approach, and a discussion of immediate implications to computational properties and potential convergence.
Denote by the set of all spanning trees of the connected (undirected) graph that are maximal with respect to (i.e., the set does not contain any spanning forests). For a spanning tree , we denote by the set of edges of . To simplify the presentation of the algorithm, let
and
Initialize , and for each , . At each iteration
For each Ising-model random-variable index , set
as the estimate of the exact Ising-model marginal probability .
The running time of the algorithm is dominated by the computation of the maximum spanning tree (Step 1), which is . All other steps take .
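For concreteness, the maximum-spanning-tree step can be implemented with any standard routine, e.g., a Kruskal-style greedy algorithm whose running time is dominated by sorting the edges. The following minimal sketch is our own illustrative code, not the paper's implementation:

```python
def max_spanning_tree(n, edges):
    """Kruskal's algorithm for a maximum spanning tree: sort edges by
    decreasing weight and greedily add those joining distinct components
    (union-find with path halving). O(|E| log |E|), dominated by the sort."""
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for w, u, v in sorted(edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
    return tree

# Triangle with one light edge: the lightest edge is dropped.
edges = [(3.0, 0, 1), (2.0, 1, 2), (1.0, 0, 2)]
print(max_spanning_tree(3, edges))  # [(0, 1), (1, 2)]
```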
Within the literature on probabilistic graphical models, Hamze and de Freitas (2004) propose an MCMC approach based on sampling non-overlapping trees. While our approach has a sampling flavor, its exact connection to MCMC is unclear at best. Also, the spanning trees that our algorithm generates may overlap.
The following discussion connects the algorithm above to an approximate version of fictitious play from the literature on learning in games. For the most part, we omit discussion of approximate variational inference in this manuscript, except to say that TRW message-passing (Wainwright et al., 2005) is the inspiration behind the algorithm proposed above.
The game implicit in the heuristic algorithm above is a two-player potential game between a "joint-assignment" (JA) player and a "spanning-tree" (ST) player. The potential function is . The payoff functions and of the JA player and the ST player, respectively, are identical and equal to the potential function : formally, . Note that the payoff function of the ST player is strategically equivalent to the function