Finding Mixed Strategy Nash Equilibrium for Continuous Games through Deep Learning

10/26/2019, by Zehao Dou, et al.

Nash equilibrium has long been a desired solution concept in multi-player games, especially for those on continuous strategy spaces, which have attracted rapidly growing interest due to advances in applications such as generative adversarial networks. Although several deep learning based approaches are designed to obtain pure strategy Nash equilibria, assuming that such an equilibrium exists is rather demanding. In this paper, we present a new method to approximate mixed strategy Nash equilibria in multi-player continuous games; such equilibria always exist and include the pure ones as a special case. We remedy the pure strategy weakness by adopting the pushforward measure technique to represent a mixed strategy in continuous spaces. This allows us to generalize the Gradient-based Nikaido-Isoda (GNI) function to measure the distance between the players' joint strategy profile and a Nash equilibrium. Applying the gradient descent algorithm, our approach is shown to converge to a stationary Nash equilibrium under a convexity assumption on the payoff functions, the same popular setting as in previous studies. In numerical experiments, our method consistently and significantly outperforms recent works on approximating Nash equilibria for quadratic games, general blotto games, and GAMUT games.


1 Introduction

Nash equilibrium (Nash, 1950) is one of the most important solution concepts in game scenarios with multiple rational participants. It plays an important role in the theoretical analysis of games and guides rational decision-making in multi-agent systems. With the recent success of machine learning in games, there is growing research interest in applying machine learning techniques to unsolved game-theoretic problems, for example, the computation of Nash equilibria for multi-player games. In this paper, we focus on games with continuous action spaces, which include the famous application of Generative Adversarial Networks (GANs) (Goodfellow et al., 2014), as well as many important game types such as the colonel blotto game (Gross and Wagner, 1950) and Cournot competition (Varian, 1996). We develop a solution that significantly improves the status quo.

There have been several successful approaches to computing Nash equilibria for multi-player (mostly 2-player) continuous games (Raghunathan et al., 2019; Balduzzi et al., 2018). These works seek Nash equilibria in pure strategies, in which each player takes a specific action to achieve its best payoff given the other players' actions. A major concern with this solution concept is its possible non-existence. As a result, convergence to a Nash equilibrium for these approaches was proven under the assumption that a pure strategy Nash equilibrium exists, which can hardly be checked in practice, and their applicability is limited to specific types of games. In contrast, mixed strategy Nash equilibria are known to always exist under mild conditions. Moreover, any pure strategy Nash equilibrium is also a mixed strategy Nash equilibrium, which makes the latter a much more desirable solution concept.

However, a key challenge that obstructs the computation of mixed strategy Nash equilibria, especially for continuous games, is how to design an efficient representation of a mixed strategy. To be precise, a pure strategy can be represented by a single variable chosen from some region, but a mixed strategy of player $i$ is a probability distribution over that player's action space, a (subspace of) real space $\mathbb{R}^{d_i}$. More generally, an exact representation of a mixed strategy usually requires many variables in a continuous space. In addition, the corresponding probability distribution may not have a closed-form density function.

To address this challenge, we introduce the pushforward measure technique, a common tool in measure theory for transferring a measure to another measure space (Bogachev, 2007). Specific to a continuous game, the probability distribution corresponding to a mixed strategy is obtained from a multi-dimensional uniform distribution via a mapping parameterized by neural networks.

With this pushforward representation, we generalize the Gradient-based Nikaido-Isoda (GNI) function, defined in (Raghunathan et al., 2019), to handle mixed strategy Nash equilibria. The original GNI function can be viewed as a measure of the distance between a joint strategy profile and a Nash equilibrium in terms of the players' payoff functions. With proper generalization and modification, we develop its mixed strategy version as a proper measure of closeness to a Nash equilibrium. We prove that this distance becomes zero if and only if a stationary mixed Nash equilibrium is reached. We then apply the gradient descent algorithm to the generalized GNI function, which converges to a stationary mixed Nash equilibrium under convexity assumptions on the payoff functions.

Finally, we compare our method with baseline algorithms in numerical experiments. Our approach exhibits effective convergence on all randomly generated quadratic games, general blotto games, and GAMUT games, outperforming the baselines.

2 Background and Problem Description

Nash equilibrium computation for discrete action spaces has been widely studied in the literature, the best-known method being the Lemke–Howson algorithm (Lemke and Howson, 1964) for solving bimatrix games. In theoretical computer science, the state-of-the-art work of Tsaknakis and Spirakis (2007) provides a polynomial-time approximation algorithm. Surprisingly, an empirical study (Fearnley et al., 2015) shows that it performs well against practical game-solving methods for bimatrix games.

Games with continuous action spaces, however, are widely used in practice, yet few methods are known for general Nash equilibrium computation in this setting. Several recent efforts to develop computational methods for multi-player (mostly 2-player) continuous games (Raghunathan et al., 2019; Balduzzi et al., 2018) have been restricted to pure strategies.

Game-theoretic approaches have found useful applications in machine learning, such as the optimization of GAN training (Daskalakis et al., 2017; Gidel et al., 2018) and adjustments to the gradient descent method (Balduzzi et al., 2018). However, they are limited to pure strategy Nash equilibria.

This is the first work to study mixed strategy Nash equilibrium computation for continuous games. Our work is motivated by the use of the Nikaido-Isoda (NI) function for loss minimization (Uryas'ev and Rubinstein, 1994; Raghunathan et al., 2019). We establish a theoretical formulation that characterizes a mixed strategy Nash equilibrium on a continuous action space as a minimizer of a Nikaido-Isoda function based on functional variations.

2.1 Continuous Game Nash Equilibrium

An $N$-player continuous game is formulated as the following set of coupled minimization problems:

$$\min_{x_i \in X_i} \; u_i(x_i, x_{-i}), \qquad i = 1, \dots, N. \quad (1)$$

Here $N$ denotes the number of players, and $x_i \in X_i \subseteq \mathbb{R}^{d_i}$ the strategy of the $i$-th player, where $d_i$ is the dimension of his action space. Let $d = \sum_{i=1}^{N} d_i$; then $x = (x_1, \dots, x_N)$ denotes the joint pure strategy among all players, while $x_{-i}$ denotes the joint pure strategy among the players except $i$. $u_i : \mathbb{R}^d \to \mathbb{R}$ denotes the utility function (cost) of the $i$-th player. A joint strategy solving (1) simultaneously for all players is called a pure strategy Nash equilibrium.

2.2 Nikaido-Isoda (NI) Function

In (Nikaidô and Isoda, 1955), the Nikaido-Isoda (NI) function is introduced as:

$$\psi(x) = \sum_{i=1}^{N} \Big[ u_i(x_i, x_{-i}) - \inf_{y_i \in X_i} u_i(y_i, x_{-i}) \Big]. \quad (2)$$

From Equation (2), we know $\psi(x) \ge 0$ for all $x$, and $0$ is the global minimum of the NI function, which can only be achieved at a Nash equilibrium (NE). Therefore, a common algorithm for computing NE points is to minimize the NI function above. However, handling the global infimum is a major difficulty. On the one hand, a global infimum cannot be computed in finite time. On the other hand, the inner infimum can be unbounded below in some games, for example two-player bilinear games, where $u_1(x_1, x_2) = x_1^\top A x_2 = -u_2(x_1, x_2)$. These shortcomings of the NI function motivated Raghunathan et al. (2019) to introduce the following Gradient-based Nikaido-Isoda (GNI) function.
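To make the unboundedness concrete, the worked step for the bilinear case (spelled out by us) is:

```latex
% In the zero-sum bilinear game u_1(x_1,x_2) = x_1^T A x_2 = -u_2(x_1,x_2),
% the inner infimum of the NI function diverges whenever A x_2 \neq 0:
\[
  \inf_{y_1} u_1(y_1, x_2) \;=\; \inf_{y_1} \, y_1^{\top} A x_2 \;=\; -\infty ,
\]
% so the corresponding term of \psi(x) is infinite, and \psi can neither be
% evaluated nor minimized in such games.
```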

2.3 Gradient-based Nikaido-Isoda (GNI) Function

If we use a local infimum in the NI function instead of the global infimum, the time complexity and the unbounded infimum are no longer shortcomings. To be precise, given the local radius (step size) $\eta > 0$, the local infimum can be approximated by a step in the steepest descent direction, and we obtain the following GNI function:

$$V(x) = \sum_{i=1}^{N} \Big[ u_i(x_i, x_{-i}) - u_i\big(x_i - \eta \nabla_{x_i} u_i(x), \; x_{-i}\big) \Big].$$

By minimizing $V(x)$, a stationary Nash point $x$, where $\nabla_{x_i} u_i(x) = 0$ for $i = 1, \dots, N$, can be approximated efficiently. Furthermore, if all the utility functions are convex, then the stationary Nash points (SNP) obtained are actually Nash equilibria (NE).
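For illustration, here is a minimal autograd sketch of this pure-strategy GNI computation; the toy bilinear payoffs, the `gni` helper name, and the use of PyTorch are our own choices, not the paper's implementation:

```python
import torch

def gni(costs, xs, eta=0.1):
    """V(x) = sum_i [u_i(x) - u_i(x_i - eta * grad_i u_i(x), x_{-i})]."""
    V = torch.zeros(())
    for i, u in enumerate(costs):
        c = u(*xs)
        (g,) = torch.autograd.grad(c, xs[i], create_graph=True)
        shifted = list(xs)
        shifted[i] = xs[i] - eta * g  # player i's local steepest-descent step
        V = V + c - u(*shifted)
    return V

# Toy zero-sum bilinear game: u1(x1, x2) = x1^T A x2 = -u2(x1, x2).
A = torch.randn(3, 3)
x1 = torch.randn(3, requires_grad=True)
x2 = torch.randn(3, requires_grad=True)
u1 = lambda a, b: a @ A @ b
u2 = lambda a, b: -(a @ A @ b)
print(gni([u1, u2], [x1, x2]).item())
```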

3 MC-GNI: Gradient-based Nikaido-Isoda Function of Mixed Strategies on Continuous Games

In this section, we introduce our novel Gradient-based Nikaido-Isoda function of mixed strategies on continuous games (MC-GNI), which is used to obtain an approximate solution to the following optimization problem, the mixed strategy analogue of (1):

$$\min_{\mu_i \in \mathcal{P}(X_i)} \; \mathbb{E}_{x_i \sim \mu_i, \, x_{-i} \sim \mu_{-i}} \big[ u_i(x_i, x_{-i}) \big], \qquad i = 1, \dots, N, \quad (3)$$

where $\mathcal{P}(X_i)$ denotes the set of probability distributions over $X_i$ and $\mu_{-i}$ is the product of the other players' strategy distributions.

Before solving this optimization problem, there is a more fundamental question: how should we represent (or parametrize) a distribution $\mu_i$? The simplest way is to parametrize its density function. However, not every distribution has a density function (e.g., the Dirac distribution), and sampling from a distribution given only its density is inconvenient. Therefore, we adopt another way: representing a distribution by a pushforward measure.

Given a distribution $\mu$ on a space $Z$ and a mapping $g: Z \to X$, data drawn from $\mu$ are transported into a new distribution $\nu$ on $X$ (constituted by the images $g(z)$, $z \sim \mu$). Technically speaking, $\nu$ is called the pushforward measure of $\mu$ by the mapping $g$, denoted by $\nu = g_{\#}\mu$.

Here, for $i = 1, \dots, N$, we equip each distribution $\mu_i$ with a corresponding pushforward function $g_i : [0,1]^{m_i} \to \mathbb{R}^{d_i}$, and we have:

$$\mu_i = (g_i)_{\#} \, \mathcal{U}_i,$$

where $\mathcal{U}_i$ stands for the uniform distribution on $[0,1]^{m_i}$. Each time we want to sample from the distribution $\mu_i$, we only need to sample several $z_i$ from the distribution $\mathcal{U}_i$ and calculate $g_i(z_i)$; these $g_i(z_i)$ then form a sample set from the distribution $\mu_i$. The optimization problem (3) becomes:

$$\min_{g_i} \; J_i(g_1, \dots, g_N) := \mathbb{E}_{z_1 \sim \mathcal{U}_1, \dots, z_N \sim \mathcal{U}_N} \big[ u_i\big(g_1(z_1), \dots, g_N(z_N)\big) \big], \qquad i = 1, \dots, N. \quad (4)$$
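In code, the pushforward representation and its sampling procedure look roughly as follows; the layer sizes, the `Pushforward` class name, and the PyTorch framework are illustrative assumptions rather than the paper's exact setup:

```python
import torch
import torch.nn as nn

class Pushforward(nn.Module):
    """g_i: [0,1]^m -> R^{d_i}; the action distribution is the pushforward
    measure (g_i)_# U of the uniform distribution U on [0,1]^m."""
    def __init__(self, noise_dim, action_dim):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )

    def forward(self, z):
        return self.net(z)

    def sample(self, n):
        z = torch.rand(n, self.noise_dim)  # z ~ U([0,1]^m), one row per draw
        return self(z)                     # actions x = g(z)

g1 = Pushforward(noise_dim=10, action_dim=5)
samples = g1.sample(1000)  # 1000 i.i.d. draws from the mixed strategy mu_1
```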

To solve the optimization problem above, we consider the following Gradient-based Nikaido-Isoda function of mixed strategies on continuous games (MC-GNI), generalized from the GNI function introduced above; we call this function the local regret:

$$V(g_1, \dots, g_N) = \sum_{i=1}^{N} \Big[ J_i(g_i, g_{-i}) - J_i\big(g_i - \eta \, \delta_{g_i} J_i, \; g_{-i}\big) \Big]. \quad (5)$$

Here, $\delta_{g_i} J_i$ stands for the first-order variation of the functional $J_i$ with respect to the element function $g_i$, and $J_i$ is the expected cost defined in (4).

By minimizing the functional $V$, we can approximately obtain stationary Nash points (SNP), and even Nash equilibria if all the utility functions are convex. We prove these claims in the next section.

In practice, we further parametrize these pushforward functions as $g_i = g_{\theta_i}$ with parameters $\theta_i$, so that we can efficiently calculate derivatives instead of variations. For simplicity, we denote $J_i(g_{\theta_1}, \dots, g_{\theta_N})$ as $J_i(\theta)$ with $\theta = (\theta_1, \dots, \theta_N)$. To obtain better expressibility, we use neural networks as the architecture for these pushforward functions. The MC-GNI function then becomes:

$$V(\theta) = \sum_{i=1}^{N} \Big[ J_i(\theta_i, \theta_{-i}) - J_i\big(\theta_i - \eta \nabla_{\theta_i} J_i(\theta), \; \theta_{-i}\big) \Big].$$

Finally, the MC-GNI function can be minimized by applying gradient descent to the function parameters $\theta$, the convergence of which is proved in the next section.
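A minimal Monte Carlo sketch of this parameterized objective is given below, reusing the `Pushforward` nets sketched earlier; the batch size, the `torch.func.functional_call` API (PyTorch >= 2.0), and the replacement of the functional variation by a parameter-space gradient step are our reading of the text, not a verified reproduction of the authors' code:

```python
import torch
from torch.func import functional_call  # PyTorch >= 2.0

def local_regret(costs, nets, eta=0.1, batch=256):
    """Monte Carlo estimate of V(theta) in Equation (5), with the variation
    replaced by a gradient step in parameter space."""
    zs = [torch.rand(batch, g.noise_dim) for g in nets]  # shared noise batch
    params = [dict(g.named_parameters()) for g in nets]

    def J(i, ps):
        xs = [functional_call(g, p, (z,)) for g, p, z in zip(nets, ps, zs)]
        return costs[i](*xs).mean()

    V = torch.zeros(())
    for i in range(len(nets)):
        J_i = J(i, params)
        grads = torch.autograd.grad(J_i, list(params[i].values()),
                                    create_graph=True)
        # Evaluate J_i after a steepest-descent step on player i's parameters.
        shifted = {k: p - eta * dp
                   for (k, p), dp in zip(params[i].items(), grads)}
        V = V + J_i - J(i, params[:i] + [shifted] + params[i + 1:])
    return V

# One step of the outer gradient descent on all players' parameters:
# nets = [Pushforward(10, 5), Pushforward(10, 5)]   # as sketched above
# opt = torch.optim.SGD((p for g in nets for p in g.parameters()),
#                       lr=1e-3, momentum=0.9)
# opt.zero_grad(); local_regret(costs, nets).backward(); opt.step()
```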

4 Theoretical Analysis of MC-GNI

4.1 The Necessary and Sufficient Condition for Stationary Nash Points

A mixed strategy $g = (g_1, \dots, g_N)$ of an $N$-player continuous game is a stationary Nash point (SNP) if and only if for $i = 1, \dots, N$, the first-order variation

$$\delta J_i(g_i; h_i) = 0 \quad (6)$$

holds in each direction $h_i$. Here:

$$J_i(g_1, \dots, g_N) = \mathbb{E}_{z_1 \sim \mathcal{U}_1, \dots, z_N \sim \mathcal{U}_N} \big[ u_i\big(g_1(z_1), \dots, g_N(z_N)\big) \big]$$

is the expectation of the $i$-th player's utility in the form of an $N$-variable functional. Now, we compute the variation above and deduce the necessary and sufficient condition for an SNP:

$$\delta J_i(g_i; h_i) = \mathbb{E}_{z} \big[ \nabla_{x_i} u_i\big(g_1(z_1), \dots, g_N(z_N)\big)^\top h_i(z_i) \big] = \int_{[0,1]^{m_i}} D_i(z_i)^\top h_i(z_i) \, dz_i, \quad (7)$$

where:

$$D_i(z_i) = \mathbb{E}_{z_{-i}} \big[ \nabla_{x_i} u_i\big(g_1(z_1), \dots, g_N(z_N)\big) \big].$$

For an SNP, Equation (6) holds in each direction $h_i$, i.e., $D_i(z_i) = 0$ for almost every $z_i$. Therefore, we have

Theorem 1.

$g = (g_1, \dots, g_N)$ is a stationary Nash point (SNP) for an $N$-player continuous game if and only if:

$$D_i(z_i) = \mathbb{E}_{z_{-i}} \big[ \nabla_{x_i} u_i\big(g_1(z_1), \dots, g_N(z_N)\big) \big] = 0 \quad \text{for almost every } z_i$$

holds for all $i = 1, \dots, N$.

From Equation (7), we also know that:

$$\delta_{g_i} J_i = D_i.$$

In other words, the steepest descent direction for player $i$ is:

$$-D_i(z_i) = -\mathbb{E}_{z_{-i}} \big[ \nabla_{x_i} u_i\big(g_1(z_1), \dots, g_N(z_N)\big) \big].$$

We now show the relationship between stationary Nash points and Nash equilibria.

Theorem 2.

Denote by $\mathcal{S}$ and $\mathcal{N}$ the sets of SNPs and NEs of a particular $N$-player continuous game. Obviously, $\mathcal{N} \subseteq \mathcal{S}$. If all utility functions are convex, we have:

$$\mathcal{S} = \mathcal{N}.$$

Proof.

Suppose $g$ is an SNP; we prove it is an NE when all the $u_i$ are convex. By convexity and the SNP condition, for $i = 1, \dots, N$ and any other pushforward function $g_i'$:

$$J_i(g_i', g_{-i}) \;\ge\; J_i(g_i, g_{-i}) + \delta J_i(g_i; \, g_i' - g_i) \;=\; J_i(g_i, g_{-i}), \quad (8)$$

which leads to our conclusion that $g$ is a global Nash equilibrium. ∎

Next, we show the relationship between the zeros of MC-GNI function and SNPs of the -player continuous game.

Lemma 1.

Assume $f$ is a twice differentiable function whose gradient is $L$-Lipschitz continuous. Then for any $x, y$, we have:

$$\big| f(y) - f(x) - \nabla f(x)^\top (y - x) \big| \;\le\; \frac{L}{2} \, \|y - x\|^2.$$

Proof.

By the condition on $f$, the following equation holds:

$$f(y) - f(x) - \nabla f(x)^\top (y - x) = \int_0^1 \big( \nabla f(x + t(y - x)) - \nabla f(x) \big)^\top (y - x) \, dt, \quad (9)$$

and the integrand is bounded in absolute value by $L t \|y - x\|^2$, which integrates to the bound above. ∎

With this lemma, we can show that each global minimum of $V$ is also an SNP.

Theorem 3.

Suppose each utility function $u_i$ is twice differentiable and its gradient is $L$-Lipschitz continuous. Then:

$$\eta \Big(1 - \frac{L\eta}{2}\Big) \sum_{i=1}^{N} \|D_i\|^2 \;\le\; V(g) \;\le\; \eta \Big(1 + \frac{L\eta}{2}\Big) \sum_{i=1}^{N} \|D_i\|^2$$

holds when $\eta < 2/L$. Here, $\|\cdot\|$ is a functional norm, which means:

$$\|D_i\|^2 = \int_{[0,1]^{m_i}} \|D_i(z_i)\|_2^2 \, dz_i.$$

Proof.

Each summand of $V(g)$ is the decrease of $J_i$ along a steepest-descent step of length $\eta$:

$$V(g) = \sum_{i=1}^{N} \big[ J_i(g_i, g_{-i}) - J_i(g_i - \eta D_i, \, g_{-i}) \big]. \quad (10)$$

Then, according to Lemma 1:

$$J_i(g_i, g_{-i}) - J_i(g_i - \eta D_i, \, g_{-i}) \;\ge\; \eta \|D_i\|^2 - \frac{L \eta^2}{2} \|D_i\|^2 \;=\; \eta \Big(1 - \frac{L\eta}{2}\Big) \|D_i\|^2, \quad (11)$$

and the other side of the inequality is similar. ∎

The theorem above tells us that $V(g)$ is always non-negative as long as $\eta < 2/L$. And its global minima, in other words its zeros, are surely SNPs, because for $\eta < 2/L$:

$$V(g) = 0 \;\Longrightarrow\; \sum_{i=1}^{N} \|D_i\|^2 = 0 \;\Longrightarrow\; D_i = 0 \text{ almost everywhere, for all } i.$$

Finally, we analyze the stability of SNPs. In the following theorem, we show that the second-order variation of the functional $V$ is a positive semidefinite operator at SNPs, which confirms their stability.

Theorem 4.

The second-order variation $\delta^2 V$ is a positive semidefinite operator for $\eta < 2/L$ and $g \in \mathcal{S}$.

Proof.

The first- and second-order variations of $V$ at an SNP $g$ satisfy:

$$\delta V(g; h) = 0, \qquad \delta^2 V(g)[h, h] = \eta \sum_{i=1}^{N} \big\langle G_i h_i, \; M_i \, G_i h_i \big\rangle, \quad (12)$$

where $H_i = \mathbb{E}_{z_{-i}}\big[\nabla_{x_i}^2 u_i(g(z))\big]$, $G_i$ is the linear operator mapping the perturbation $h_i$ to the induced first-order change of $D_i$, and

$$M_i = 2I - \eta H_i \quad (13)$$

is a matrix. Given $\eta < 2/L$, then $\|\eta H_i\| < 2$, so $M_i \succeq 0$, which is positive semidefinite. Therefore:

$$\delta^2 V(g) = \eta \sum_{i=1}^{N} G_i^{*} \, M_i \, G_i$$

is also positive semidefinite. ∎

4.2 Convergence Analysis

In this section, we analyze the convergence of gradient descent on the MC-GNI objective:

$$\theta^{(k+1)} = \theta^{(k)} - \alpha \, \nabla_\theta V(\theta^{(k)}).$$

According to the definition of the functional $V$, it can be rewritten in the following form:

$$V(\theta) = \sum_{i=1}^{N} \big[ J_i(\theta) - J_i\big(T_i(\theta), \theta_{-i}\big) \big],$$

where $T_i(\theta) = \theta_i - \eta \nabla_{\theta_i} J_i(\theta)$.

Theorem 5.

Suppose $\nabla_\theta V$ is $L_V$-Lipschitz continuous. Through gradient descent, the sequence $\{\theta^{(k)}\}$ converges sublinearly to a stationary Nash point (SNP) if $\alpha < 2/L_V$.

Proof.

According to Lemma 1, we have:

$$V(\theta^{(k+1)}) \;\le\; V(\theta^{(k)}) - \alpha \Big(1 - \frac{L_V \alpha}{2}\Big) \big\| \nabla_\theta V(\theta^{(k)}) \big\|^2. \quad (14)$$

Let $k = 0, 1, \dots, K-1$ in (14) and add the inequalities up; we have:

$$\alpha \Big(1 - \frac{L_V \alpha}{2}\Big) \sum_{k=0}^{K-1} \big\| \nabla_\theta V(\theta^{(k)}) \big\|^2 \;\le\; V(\theta^{(0)}) - V(\theta^{(K)}).$$

Since $V(\theta^{(K)}) \ge 0$ by Theorem 3, we have

$$\min_{0 \le k < K} \big\| \nabla_\theta V(\theta^{(k)}) \big\|^2 \;\le\; \frac{V(\theta^{(0)})}{\alpha (1 - L_V \alpha / 2) \, K} = O(1/K), \quad (15)$$

which completes our proof. ∎

5 Experiments

To evaluate the practical performance of our approach, we apply it to three types of games, two-player quadratic games, general blotto games, and GAMUT games, the most popular games for evaluation of Nash equilibrium algorithms. In all the experiments, we set the local radius and we use gradient descent as our optimization method with step size and momentum . The network architecture we use for the pushforward functions

is a 6-layer fully connected neural network with the size of each layer as: 20, 40, 160, 160, 40, 20. The size of its output layer is the dimension of each player’s action space. From forward to backward, the activation function we use is:

, ReLU,

.
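A sketch of this architecture follows, assuming ReLU activations throughout; `make_pushforward` is our own helper name:

```python
import torch.nn as nn

def make_pushforward(noise_dim, action_dim):
    """6 fully connected layers of sizes 20, 40, 160, 160, 40, 20, followed
    by an output layer matching the player's action dimension."""
    sizes = [noise_dim, 20, 40, 160, 160, 40, 20]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        layers += [nn.Linear(fan_in, fan_out), nn.ReLU()]
    layers.append(nn.Linear(sizes[-1], action_dim))  # output layer
    return nn.Sequential(*layers)
```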

We mainly compare our approach with two recent studies: gradient descent on the GNI function (Raghunathan et al., 2019) (gradGNI for short) and the Symplectic Gradient Adjustment algorithm (Balduzzi et al., 2018) (SGA for short), as they outperformed other existing algorithms applicable to continuous game settings. For all these methods, we either follow the standard hyper-parameters mentioned in the original papers or use those resulting in the best convergence.

5.1 Two-player Quadratic Game

The two-player quadratic game is defined by the players' payoff (cost) functions ($i = 1, 2$):

$$u_i(x_1, x_2) = \frac{1}{2} x_i^\top A_i x_i + x_i^\top B_i x_{-i} + c_i^\top x_i, \quad (16)$$

where $x_i \in \mathbb{R}^d$, $A_i, B_i \in \mathbb{R}^{d \times d}$, and $c_i \in \mathbb{R}^d$. In our experiments, we vary the dimension $d$. For each choice of $d$, we randomly generate 100 instances of the matrices $A_i, B_i$ and vectors $c_i$ for $i = 1, 2$. Each entry of each matrix and each vector follows a uniform distribution independently.
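For concreteness, a sketch of generating one random instance under the payoff form (16); the exact sampling range is unspecified above, so the uniform-on-[0,1] entries, the `random_quadratic_game` name, and the batched cost signature are our assumptions:

```python
import torch

def random_quadratic_game(d, seed=0):
    """One random instance of the two-player quadratic game in (16)."""
    gen = torch.Generator().manual_seed(seed)
    rand = lambda *shape: torch.rand(*shape, generator=gen)
    A = [rand(d, d) for _ in range(2)]  # entries i.i.d. uniform on [0, 1]
    B = [rand(d, d) for _ in range(2)]
    c = [rand(d) for _ in range(2)]

    def cost(i):
        def u(x1, x2):  # batched inputs of shape (batch, d)
            xi, xo = (x1, x2) if i == 0 else (x2, x1)
            quad = 0.5 * torch.einsum('bi,ij,bj->b', xi, A[i], xi)
            cross = torch.einsum('bi,ij,bj->b', xi, B[i], xo)
            return quad + cross + xi @ c[i]  # shape: (batch,)
        return u

    return cost(0), cost(1)

u1, u2 = random_quadratic_game(d=10)  # usable as `costs` in local_regret
```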

We show the convergence process of all algorithms for one game instance in Fig. 1(a) as an example. As we can see, our approach effectively converges to a stationary Nash equilibrium point. While the gradGNI approach also converges on this instance, its result has a larger local regret; in other words, it obtains a worse approximation of a Nash equilibrium. This reflects the essential difference between pure and mixed strategies: the MC-GNI approach searches for an equilibrium in the mixed strategy space, which includes the pure strategy space that gradGNI searches in. The SGA approach, on the other hand, diverges on this game instance. We further average the final local regret after 2000 iterations over all 100 instances, summarized in Tab. 1. All algorithms behave consistently as the dimension of the action space increases, and MC-GNI outperforms the others regardless of the randomness of the game structures.

5.2 General Blotto Game

We next consider the general blotto game, which differs from the previous games in that further constraints apply to each player's action space.

In a blotto game, two players (sometimes known as two colonels) have resource budgets $B_1$ and $B_2$, respectively; w.l.o.g. we normalize the budgets. There are $n$ battlefields in total. In each battlefield $j$, when the two players allocate resources $x_1^j$ and $x_2^j$ to it, the payoff of player $i$ is:

(17)

where $-i$ denotes the player other than player $i$. Each player's payoff is the sum of the payoffs across the individual battlefields. For each player $i$, a feasible pure strategy must also satisfy the budget constraint $\sum_j x_i^j \le B_i$. Here we adopt the generalized blotto game proposed by (Golman and Page, 2009), whose payoff functions are continuous; the payoff functions in the vanilla blotto game (Gross and Wagner, 1950) are discontinuous, on which our method as well as the baselines fails. In our experiments, we vary the number of battlefields $n$. For each $n$, we randomly generate 100 instances of the budgets, drawn independently from a uniform distribution.
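The text does not spell out how the budget constraint is enforced in the pushforward network; one natural choice, sketched below as our own assumption, is a budget-scaled softmax output layer (`BlottoPushforward` is a hypothetical name):

```python
import torch
import torch.nn as nn

class BlottoPushforward(nn.Module):
    """Pushforward net whose output always satisfies the budget constraint
    sum_j x^j <= B, via a budget-scaled softmax (our own design choice)."""
    def __init__(self, noise_dim, n_battlefields, budget):
        super().__init__()
        self.noise_dim = noise_dim
        self.budget = budget
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 64), nn.ReLU(),
            nn.Linear(64, n_battlefields),
        )

    def forward(self, z):
        # softmax maps onto the simplex; scaling by the budget yields
        # allocations with sum_j x^j = B, which is feasible
        return self.budget * torch.softmax(self.net(z), dim=-1)
```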

We show the convergence process of all algorithms for one game instance in Fig. 1(b) as an example. All algorithms converge on this game, and both gradGNI and SGA converge faster and more smoothly than our MC-GNI. However, as in the quadratic game, their final results have larger local regret. This again reflects the fact that the mixed strategy is the better solution concept, especially in blotto games. We further average the final local regret after 2000 iterations over all 100 instances, summarized in Tab. 1. All algorithms behave consistently as the dimension of the action space increases, and MC-GNI outperforms the others regardless of the randomness of the game structures.

5.3 GAMUT Games

Finally, we apply our method to game instances generated by GAMUT, a comprehensive suite of game generators designed for testing game-theoretic algorithms (Nudelman et al., 2004). GAMUT includes a group of random distributions, from each of which the payoff of each player for each pure strategy profile can be drawn independently. To be precise, we extend the quadratic game to a multi-player version and generate 100 game instances with 4 players. For each instance, one of the distributions from the GAMUT set is selected, and each entry of each matrix is sampled from it independently.

We show the convergence process of all algorithms for one game instance in Fig. 1(c). Both MC-GNI and SGA converge, but SGA reaches a much worse final result than our MC-GNI, and this time gradGNI diverges. Furthermore, we average the final local regret after 2000 iterations over all 100 instances, shown in Table 1.

Across these different games, our MC-GNI converges and performs better than the two baselines in all three settings, which demonstrates the effectiveness and efficiency of the MC-GNI model. As the first algorithm to compute mixed strategy Nash equilibria of games with continuous action spaces, we believe the technique introduced here will enable new research at the intersection of algorithmic game theory and deep learning.

Figure 1: Local regret of various games: (a) 2-player quadratic; (b) 2-player blotto; (c) 4-player GAMUT.
Game           | MC-GNI (our model) | gradGNI  | SGA
Quadratic ( )  | ( )e-3             | ( )e-1   |
Quadratic ( )  | ( )e-3             | ( )e-1   |
Quadratic ( )  | ( )e-3             | ( )e-1   |
Blotto ( )     | ( )e-6             | ( )e-5   | ( )e-5
Blotto ( )     | ( )e-6             | ( )e-5   | ( )e-5
Blotto ( )     | ( )e-6             | ( )e-6   | ( )e-6
GAMUT ( )      | ( )e-3             | ( )e-1   | ( )e-1
GAMUT ( )      | ( )e-3             | ( )e-1   | ( )e-1
GAMUT ( )      | ( )e-2             | ( )e-1   | ( )e-1
Table 1: Comparison results (average final local regret after 2000 iterations over 100 game instances).

References

  • D. Balduzzi, S. Racaniere, J. Martens, J. Foerster, K. Tuyls, and T. Graepel (2018) The mechanics of n-player differentiable games. arXiv preprint arXiv:1802.05642. Cited by: §1, §2, §2, §5.
  • V. I. Bogachev (2007) Measure theory. Vol. 1, Springer Science & Business Media. Cited by: §1.
  • C. Daskalakis, A. Ilyas, V. Syrgkanis, and H. Zeng (2017) Training GANs with optimism. arXiv preprint arXiv:1711.00141. Cited by: §2.
  • J. Fearnley, T. P. Igwe, and R. Savani (2015) An empirical study of finding approximate equilibria in bimatrix games. In International Symposium on Experimental Algorithms, pp. 339–351. Cited by: §2.
  • G. Gidel, H. Berard, G. Vignoud, P. Vincent, and S. Lacoste-Julien (2018) A variational inequality perspective on generative adversarial networks. arXiv preprint arXiv:1802.10551. Cited by: §2.
  • R. Golman and S. E. Page (2009) General Blotto: games of allocative strategic mismatch. Public Choice 138 (3-4), pp. 279–299. Cited by: §5.2.
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680. Cited by: §1.
  • O. Gross and R. Wagner (1950) A continuous colonel blotto game. Technical report, RAND Project Air Force, Santa Monica, CA. Cited by: §1, §5.2.
  • C. E. Lemke and J. T. Howson (1964) Equilibrium points of bimatrix games. Journal of the Society for Industrial and Applied Mathematics 12 (2), pp. 413–423. Cited by: §2.
  • J. F. Nash (1950) Equilibrium points in n-person games. Proceedings of the National Academy of Sciences 36 (1), pp. 48–49. Cited by: §1.
  • H. Nikaidô and K. Isoda (1955) Note on non-cooperative convex games. Pacific Journal of Mathematics 5 (Suppl. 1), pp. 807–815. Cited by: §2.2.
  • E. Nudelman, J. Wortman, Y. Shoham, and K. Leyton-Brown (2004) Run the GAMUT: a comprehensive approach to evaluating game-theoretic algorithms. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems-Volume 2, pp. 880–887. Cited by: §5.3.
  • H. R. Varian (1996) Intermediate Microeconomics: A Modern Approach. W. W. Norton. Cited by: §1.
  • A. U. Raghunathan, A. Cherian, and D. K. Jha (2019) Game theoretic optimization via gradient-based nikaido-isoda function. arXiv preprint arXiv:1905.05927. Cited by: §1, §1, §2.2, §2, §2, §5.
  • H. Tsaknakis and P. G. Spirakis (2007) An optimization approach for approximate Nash equilibria. In International Workshop on Web and Internet Economics, pp. 42–56. Cited by: §2.
  • S. Uryas’ ev and R. Y. Rubinstein (1994) On relaxation algorithms in computation of noncooperative equilibria. IEEE Transactions on Automatic Control 39 (6), pp. 1263–1267. Cited by: §2.