1 Introduction
As machine learning algorithms are increasingly deployed in real-world settings, understanding how they interact, and the dynamics that can arise from their interactions, is becoming crucial. In recent years, there has been a resurgence in research efforts on multi-agent learning and learning in games. Indeed, game-theoretic tools are even being used to robustify and improve the performance of machine learning algorithms (see, e.g., [21]). Despite this activity, there is still a lack of understanding of the dynamics and limiting behaviors of machine learning algorithms in competitive settings, and in games in particular. Concurrently, there has been no shortage of papers on the convergence of gradient descent and its avoidance of saddle points (see, e.g., [19, 28, 45, 17]). Due to their versatility and ease of implementation, gradient descent and gradient-descent-based algorithms are extremely popular in a variety of machine learning and algorithmic decision-making problems. The advantages of gradient-based methods have led to them being widely adopted in multi-agent and adversarial learning problems [15, 24, 25, 21, 12, 34]. However, a thorough understanding of their convergence and limiting behaviors is still lacking.
Inspired by recent works that focus on the single-agent case, such as [30, 39], and long-standing work in dynamical systems theory, including the convergence of stochastic approximation [3, 4, 7, 8] and urn processes [6, 40], we investigate the convergence of competitive gradient-based learning to Nash equilibria and other limiting behaviors. Specifically, we are interested in settings where two or more agents compete in potentially uncertain environments. Each agent optimizes their individual objective, which depends on the decisions of all the other agents and possibly an external environmental signal. This scenario is most naturally modeled as a game.
It is common in these settings to consider agents which adopt a learning algorithm for determining a strategy (policy) that governs their decisions. Many different types of learning algorithms have been proposed in the literature, several of which have their origins in the single-agent case. We consider the case where the agents adopt a gradient-based learning algorithm, one of the more common approaches in a number of different domains. In fact, in support of the latter point, we show that a wide variety of learning algorithms from different fields fit into the gradient-based learning framework we analyze.
We remark that there is a fundamental difference between the dynamics analyzed in much of the single-agent gradient-based learning and optimization literature and the ones we analyze in the competitive multi-agent case. As we show in the following sections, the combined dynamics of gradient-based learning schemes in games do not necessarily correspond to a gradient flow. This may seem a subtle point, but it turns out to be extremely important. Gradient flows are a very narrow class of flows admitting nice convergence guarantees—e.g., almost sure convergence to local minimizers—due to the fact that they preclude flows with the worst geometries [41]. In particular, they do not exhibit non-equilibrium limiting behavior such as periodic orbits. Gradient-based learning in games, on the other hand, does not preclude such behavior. This makes the analysis more challenging.
Given the prominence of gradient-based learning schemes in multi-agent reinforcement learning, online optimization, and other machine learning contexts where game-theoretic ideas are being employed, it is important to understand and be able to interpret the limiting behavior of these coupled algorithms. Recent works have noted that limit cycles emerge in gradient-based learning algorithms. For instance, in [12], it is demonstrated that limit cycles abound in gradient descent for training generative adversarial networks (GANs). Other very recent works have explored the existence of cycles in adversarial learning when the problem is considered in a game-theoretic context [35], thereby highlighting the importance of understanding limiting behaviors of competitive learning algorithms other than equilibria. Dynamical systems exhibiting periodic orbits and other limiting behaviors have long been studied, and we borrow tools from dynamical systems theory in order to characterize the limiting behavior of competitive gradient-based learning.
1.1 Contributions
The high-level contributions of this paper are twofold. We first provide guarantees that competitive gradient-based learning algorithms (both when the agents have oracle access to their gradients and in the stochastic gradient setting) avoid linearly unstable critical points (i.e., strict saddle points) of the dynamics. This has positive implications for zero-sum games, where such saddle points are generically not local Nash equilibria. For gradient-based learning in potential games and general-sum games, however, this is a strongly negative result. Indeed, it implies that gradient-based learning algorithms will almost surely avoid a subset of the Nash equilibria of the game. This is a particularly interesting observation for potential games, which are ostensibly the nicest games one could hope to face when employing gradient-based learning, as they admit a transformation of coordinates under which the agents can be viewed as optimizing a single objective function. Unlike in single-agent gradient descent, however, saddle points can be critical points of interest—i.e., Nash equilibria.
Secondly, by viewing gradient-based learning in games through the lens of dynamical systems theory, we highlight many of the problems plaguing such algorithms in practice. Specifically, we show that the dynamics formed from the gradients of the individual agents' costs are not gradient flows, and thus competitive gradient-based learning may converge to periodic orbits. Further, we highlight the existence of non-Nash locally asymptotically stable equilibria of these dynamics. Such equilibria arise in both zero-sum and general-sum games and can be seen as artifacts of the choice of algorithm; their relevance to the underlying game is unknown. Hence, some care needs to be taken regarding the implementation and the interpretation of the limit points of gradient-based learning algorithms in competitive settings. We provide examples in Appendix A demonstrating various limiting behaviors of gradient-based learning algorithms in competitive settings.
Concretely, the contributions of this paper are summarized as follows:


A characterization of the limiting behaviors of competitive gradient-based learning when all players either have access to their gradient or to an unbiased estimate of their gradient. This is done by leveraging dynamical systems theory and the theory of stochastic approximation.

A new class of games, namely Morse-Smale games, for which the dynamics correspond to a Morse-Smale vector field. There are a couple of points to make here on the significance of this class of games and the results we have for them. First, it is well-known that Morse-Smale vector fields are generic [26, 38]—that is, almost all smooth vector fields are Morse-Smale. Morse-Smale vector fields (on compact manifolds) also have a finite number of critical points; hence, our results imply that almost all games on compact smooth manifolds admit gradient-like flows with a finite number of critical points (candidate Nash equilibria). Further, linearly unstable cycles (which include some Nash equilibria) are almost surely avoided under the gradient-like flow.
A general framework for modeling competitive gradient-based learning that applies to a broad swath of learning algorithms. Gradient-based learning is a favored approach in multi-agent reinforcement learning, multi-armed bandits, adversarial learning, and multi-agent online learning and optimization. Such algorithms are increasingly being employed in these domains without a solid understanding of how to formally interpret the results. We provide a general framework for analyzing competitive gradient-based learning and a taxonomy of limiting behaviors that allows us to apply our theoretical results to commonly used learning algorithms.

Illustrative examples highlighting the frequency of non-Nash critical points, and Nash equilibria that are saddle points of the gradient dynamics.
1.2 Organization
The remainder of this paper is organized as follows. In Section 2, we introduce our framework for analyzing competitive gradient-based learning algorithms, as well as some mathematical and game-theoretic preliminaries. In Section 3, we provide a brief taxonomy by drawing connections between the limiting behavior of gradient-based learning algorithms and game-theoretic and dynamical systems notions of equilibria. In Section 4, we present our main theoretical results for competitive gradient-based learning in both the deterministic (where agents have oracle access to their gradients at each iteration) and stochastic (where agents have an unbiased estimator of their gradient at each iteration) settings. In the latter case, we include a high-level overview of the categories of commonly used learning algorithms that fit into the framework we consider. We present empirical evidence showing that gradient-play will avoid Nash equilibria in a potentially large subset of linear quadratic (LQ) games in Section 5. We conclude with a discussion of the results and provide comments on future directions in Section 6. In Appendix A, we specialize our results to a number of very popular multi-agent learning algorithms and provide several illustrative examples that highlight the different kinds of limiting behavior that gradient-based learning admits.
2 Preliminaries
Consider agents, indexed by . Each agent has their own decision variable , where is their finite-dimensional strategy space of dimension . Define to be the finite-dimensional joint strategy space with dimension . Each agent is endowed with a cost function , such that , where we use this notation to make explicit the dependence on the action of agent and the actions of all agents excluding agent . Each agent seeks to minimize their own cost, but only has control over their own decision variable . In this competitive setting, agents' costs are not necessarily aligned with one another.
Given the game , our focus is on settings in which agents employ gradient-based learning algorithms in the search for an equilibrium. In particular, agents are assumed to update their strategies simultaneously according to a gradient-based learning algorithm of the form
(1) 
where agents either have oracle access to the gradient of their cost with respect to their own choice variable—i.e. where denotes the derivative of with respect to —or they have an unbiased estimator for their gradient—i.e. where
is a zero mean, finite variance stochastic process. We refer to the former setting as
deterministic gradient-based learning and we refer to the latter setting as stochastic gradient-based learning. Assuming that agents employ an algorithm such as (1), our goal is to analyze the stationary behavior of these coupled algorithms, leveraging the following game-theoretic notion of a Nash equilibrium. A strategy is a local Nash equilibrium for the game if for each there exists an open set such that and for all . If the above inequalities are strict, then we say is a strict local Nash equilibrium. If for each , then is a global Nash equilibrium.
Another important and useful characterization of Nash equilibria leverages first- and second-order conditions on player cost functions [42, 43]. A point is said to be a differential Nash equilibrium for the game if and for each . Define
(2) 
to be the vector of player derivatives of their own cost functions with respect to their own choice variables. A point is said to be a critical point for the game if . Note that and are necessary conditions for a point to be a local Nash equilibrium [43]. Hence, all local Nash equilibria must be critical points.
Differential Nash equilibria need not be isolated, as the simple illustrative example in [42] shows. However, for a differential Nash equilibrium , if
is nondegenerate—i.e. —then is an isolated strict local Nash equilibrium.
3 Links Between Dynamical Systems and Game-Theoretic Notions of Equilibria
Before continuing, we remark that nondegenerate differential Nash equilibria are structurally stable [43]. Structural stability ensures that equilibria persist under small perturbations. We define stability of nondegenerate differential Nash equilibria as follows [43]. A differential Nash equilibrium is stable if the spectrum of is in the open right-half plane—i.e., . If agents initialize in a neighborhood of a stable differential Nash equilibrium and follow the flow defined by , then they will asymptotically converge to . Specifically, if the spectrum of is strictly in the right-half plane, then the differential Nash equilibrium is locally (exponentially) attracting under the flow of [43, Proposition 2]. This, in turn, implies that a discretized version of , namely
(3) 
converges locally for an appropriately selected step size . Such results motivate the study of the continuous-time dynamical system in order to understand convergence properties of gradient-based learning algorithms of the form (1).
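As a concrete illustration of the discretization (3), the following minimal sketch runs simultaneous gradient updates on a hypothetical two-player quadratic game. The costs and parameters here are assumptions chosen only so that the joint dynamics have a stable critical point at the origin; they are not taken from the paper's examples.

```python
import numpy as np

# Hypothetical two-player quadratic game (for illustration only):
#   f1(x, y) = 0.5*x**2 + x*y   ->  D1 f1 = x + y
#   f2(x, y) = 0.5*y**2 - x*y   ->  D2 f2 = y - x
def omega(z):
    x, y = z
    return np.array([x + y, y - x])

gamma = 0.1                      # shared step size (assumed small enough)
z = np.array([2.0, -1.5])
for _ in range(500):
    z = z - gamma * omega(z)     # simultaneous gradient update, cf. eq. (3)

# The game Jacobian [[1, 1], [-1, 1]] has eigenvalues 1 +/- i, all with
# positive real part, so the origin is a stable differential Nash equilibrium.
final_distance = np.linalg.norm(z)
print(final_distance)
```

Note that even though the joint vector field here is not a gradient flow (its Jacobian is asymmetric), the iterates still spiral into the stable equilibrium for a small enough step size.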
Along this line of thinking, let us draw a few more links between equilibria of and characterizations of local Nash equilibria. To do so, we characterize the critical points of by their properties under the flow of .
A point is a locally asymptotically stable equilibrium of the continuous time dynamics if and for all . Let for
denote the eigenvalues of
at , where —that is, is the eigenvalue with the smallest real part. A point is a saddle point of the dynamics if and is such that . A saddle point such that for and for , with and , is a strict saddle point, or linearly unstable critical point—we use these terms interchangeably—of the continuous-time dynamics . We now present a few preliminary propositions that highlight the links between critical points under the flow and those critical points that have particular game-theoretic relevance.
A nondegenerate differential Nash equilibrium is either a locally asymptotically stable equilibrium or a strict saddle point of . Suppose that is a nondegenerate differential Nash equilibrium. We claim that . Since is a differential Nash equilibrium, for each ; these are the diagonal blocks of . Further, implies that . Since , . Thus, it is not possible for all the eigenvalues to have negative real part. Since is nondegenerate, , so that none of the eigenvalues can have zero real part. Hence, at least one eigenvalue has strictly positive real part.
To finish, we show that the conditions for a nondegenerate differential Nash equilibrium are not sufficient to guarantee that is locally asymptotically stable for the gradient dynamics—that is, not all eigenvalues of need have strictly positive real part. We do this by constructing a class of games with the strict saddle point property. Consider a class of two-player games on such that has the form
(4) 
with . If is a nondegenerate differential Nash equilibrium, then and , which implies that . Choosing such that will guarantee that one eigenvalue of is negative and the other is positive, making a strict saddle point. This shows that nondegenerate differential Nash equilibria can be strict saddle points of the combined gradient dynamics.
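The construction above can be sketched numerically. Since the exact entries of (4) are elided in the text, the block below assumes a generic quadratic two-player game whose Jacobian has positive diagonal entries (so the origin is a differential Nash equilibrium) but a large enough cross-coupling to produce a negative eigenvalue; the specific costs and coefficients are hypothetical.

```python
import numpy as np

# Assumed quadratic two-player game (illustrative; not the paper's exact class):
#   f1(x, y) = 0.5*a*x**2 + b*x*y,   f2(x, y) = 0.5*d*y**2 + c*x*y
# so omega(x, y) = (a*x + b*y, c*x + d*y) with Jacobian J = [[a, b], [c, d]].
a, d = 1.0, 1.0     # positive second derivatives: (0, 0) is a differential Nash
b, c = 3.0, 2.0     # cross terms chosen so that b*c > a*d

J = np.array([[a, b], [c, d]])
eigs = np.sort(np.linalg.eigvals(J).real)
print(eigs)         # one negative, one positive eigenvalue: a strict saddle
```

Whenever the product of the off-diagonal terms exceeds the product of the diagonal terms, the determinant of the Jacobian is negative, which forces one real eigenvalue of each sign.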
Hence, for any game , a nondegenerate differential Nash equilibrium is either a locally asymptotically stable equilibrium or a strict saddle point, but it is never strictly unstable or marginally stable (i.e., with all eigenvalues on the imaginary axis).
Another important point is that not every locally asymptotically stable equilibrium of is a nondegenerate differential Nash equilibrium. Indeed, the following example provides an entire class of games whose corresponding dynamics admit locally asymptotically stable equilibria that are not even local Nash equilibria. Consider the same class of games presented in the proof of Proposition 3—that is, two-player games on with as in (4). Suppose that is a critical point such that or —i.e., is not Nash since it violates the necessary conditions for a local Nash equilibrium. Then, as long as , . Thus, the set of locally asymptotically stable equilibria that are not Nash equilibria may be arbitrarily large.
We now momentarily restrict our attention to two-player zero-sum games. Such games arise when training GANs, in adversarial learning, and in multi-agent reinforcement learning [22, 10, 37].
For an arbitrary two-player zero-sum game on , if is a differential Nash equilibrium, then is both a nondegenerate differential Nash equilibrium and a locally asymptotically stable equilibrium of .
Consider a two-player game on with . For such a game,
Note that . Suppose that is a differential Nash equilibrium and let with and . Then,
where the second line follows since and for a differential Nash equilibrium . Since is arbitrary, this implies that is positive definite and hence clearly nondegenerate. Thus, for two-player zero-sum games, all differential Nash equilibria are both nondegenerate differential Nash equilibria and locally asymptotically stable equilibria of the corresponding dynamics.
The preceding proposition shows that all nondegenerate differential Nash equilibria in two-player zero-sum games are locally asymptotically stable equilibria under the flow of . This has been shown before as a consequence of the results in [42]. However, we again remark that the converse is not true: not every locally asymptotically stable equilibrium in a two-player zero-sum game is a nondegenerate differential Nash equilibrium. Indeed, there may be many locally asymptotically stable equilibria in a zero-sum game that are not local Nash equilibria. The following example highlights this fact. Consider a two-player game with and of the form
where . If and , then has eigenvalues with strictly negative real part. Thus, there exists a continuum of zero-sum games with a large set of locally asymptotically stable equilibria of the corresponding dynamics that are not differential Nash equilibria.
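A numerical instance of this phenomenon can be sketched as follows. The costs and coefficients below are assumptions for illustration: a zero-sum pair with a quadratic cost whose critical point violates the second-order Nash condition for the first player, yet whose game Jacobian still places the equilibrium on the stable side of the dynamics.

```python
import numpy as np

# Assumed two-player zero-sum game f2 = -f1 with quadratic f1 (illustrative):
#   f1(x, y) = 0.5*a*x**2 + b*x*y + 0.5*e*y**2
# omega = (D1 f1, D2 f2) = (a*x + b*y, -(b*x + e*y)), so J = [[a, b], [-b, -e]].
a, b, e = -1.0, 2.0, -3.0   # a < 0: origin fails the second-order Nash condition

J = np.array([[a, b], [-b, -e]])
eigs = np.linalg.eigvals(J)
# Both eigenvalues of J lie in the open right-half plane, so the origin is
# locally asymptotically stable for the flow of -omega, yet it is not Nash.
print(eigs.real)
```

The strong cross-coupling term stabilizes the joint dynamics even though the minimizing player sits at a local maximum of their own cost, which is exactly the kind of non-Nash attractor the example describes.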
We now briefly focus on a particularly nice set of games known as potential games [36]. These are games for which corresponds to a gradient flow under a coordinate transformation—that is, there exists a function such that for each , for all . Note that a necessary and sufficient condition for to be a potential game is that is symmetric [36]—that is, . This gives potential games the desirable property that the only locally asymptotically stable equilibria of the gradient dynamics in potential games are local Nash equilibria.
For an arbitrary potential game, on , if is a locally asymptotically stable equilibrium of then is a nondegenerate differential Nash equilibrium.
The proof follows from the definition of a potential game. Since is a potential game, it admits a potential function such that for all . This, in turn, implies that at a locally asymptotically stable equilibrium of , , where is the Hessian matrix of . Further, must have strictly positive eigenvalues for to be a locally asymptotically stable equilibrium of . Since the Hessian matrix of a function is symmetric, must be positive definite, which through Sylvester's criterion ensures that each of the diagonal blocks of is positive definite. Thus, the existence of a potential function guarantees that the only locally asymptotically stable equilibria of are differential Nash equilibria.
The preceding proposition rules out non-Nash locally asymptotically stable equilibria of the gradient dynamics in potential games. However, the following example shows that the existence of a potential function is not enough to rule out local Nash equilibria that are saddle points of the dynamics.
Consider a two-player potential game with . At a differential Nash equilibrium, has the form:
where , and . If , then has one positive eigenvalue and one negative eigenvalue. Thus, there exists a continuum of potential games with a large set of differential Nash equilibria of the corresponding dynamics that are strict saddle points.
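This can be checked numerically with an assumed potential; the quadratic potential and the coupling coefficient below are hypothetical, chosen so that the symmetric Hessian has positive diagonal entries but is indefinite.

```python
import numpy as np

# Assumed two-player potential game with a shared quadratic potential:
#   phi(x, y) = 0.5*x**2 + 0.5*y**2 + beta*x*y
# Here omega = grad(phi), so J is the symmetric Hessian of phi.
beta = 2.0
J = np.array([[1.0, beta], [beta, 1.0]])

eigs = np.linalg.eigvalsh(J)
# The diagonal entries are positive, so the origin is a differential Nash
# equilibrium, yet for |beta| > 1 the Hessian is indefinite: the equilibrium
# is a strict saddle point of the joint gradient dynamics.
print(eigs)   # eigenvalues -1 and 3
```

So even in the best-behaved class of games, gradient-based learning can face a Nash equilibrium that sits on a strict saddle of the dynamics and is therefore almost surely avoided.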
We finish our mathematical preliminaries with a note on the relationship between nondegenerate differential Nash equilibria and local Nash equilibria. It turns out that nondegenerate differential Nash equilibria are generic among local Nash equilibria. Hence, by proving statements about convergence to nondegenerate differential Nash equilibria, we are able to make statements about convergence to local Nash equilibria for almost all games in a formal mathematical sense. The following theorem first appeared in [44] for the two-player case; while the extension to the -player case is straightforward, we provide the proof in Appendix B for completeness. Nondegenerate differential Nash equilibria are generic among local Nash equilibria: for any smooth boundaryless manifolds , there exists an open-dense subset with such that for all , if is a local Nash equilibrium for , then is a nondegenerate differential Nash equilibrium for .
Genericity implies that local Nash equilibria in an open-dense set of continuous games (in the topology on agent costs) are nondegenerate differential Nash equilibria. Thus, for almost all games, the set of local Nash equilibria coincides exactly with the set of nondegenerate differential Nash equilibria. This also implies that saddle points of the dynamics induced by the flow of that are local Nash equilibria are also generically strict saddle points.
For a game , denote the set of strict saddle points and the set of locally asymptotically stable equilibria of the corresponding dynamics as and , respectively. Similarly, denote the set of local Nash equilibria, differential Nash equilibria, and nondegenerate differential Nash equilibria of as , , and , respectively. Combining the comments on genericity with the observations on stability gives us the following key takeaways:


If is a generic -player general-sum game, then

If is a generic 2-player zero-sum game, then

If is a generic -player potential game, then
The inclusions are strict due to the existence of non-Nash locally asymptotically stable equilibria in both the general-sum and zero-sum cases.
4 Main Theoretical Results
In this section, we provide our main theoretical results and differentiate them from existing work. We also include a high-level overview of well-known algorithms that fit into the class of learning algorithms we consider—and hence, to which our theory applies.
4.1 Deterministic Competitive Gradient-Based Learning
Let us first consider the deterministic setting in which agents have oracle access to their gradients at each time step. This setting encapsulates the case where agents know their own cost functions and observe their own actions as well as their competitors' actions—and hence can compute the gradient of their cost with respect to their own choice variable—as well as the setting where agents do not necessarily know their cost or observe their competitors' actions, but rather some external oracle provides to them at each iteration .
Each agent has their own learning rate (i.e., step size ) so that the joint dynamics of all the players are given by
(5) 
where and, by a slight abuse of notation, is defined to be elementwise multiplication of and where is multiplied by the first components of , is multiplied by the next components, and so on. We make the following assumptions on the cost functions and learning rates .
Assumption 1
For each , with , , and .
Note that the norm is the induced norm. We rewrite the game dynamics in the following form
(6) 
where and elementwise.
Gradient-based optimization schemes always correspond to gradient flows, whereas games are not afforded this luxury—indeed, is not in general symmetric—and hence games are a very interesting class of problems to study. In particular, the vector field defined by is not necessarily the gradient of a function, and thus the dynamics admit limit cycles amongst their periodic orbits. This distinguishes gradient-based optimization from gradient-based learning in a multi-agent or game-theoretic setting. The following example presents a class of games admitting a variety of periodic orbits.
Consider the game on with defined by
(7) 
where agents are minimizers and is a parameter. Then,
Transforming the dynamics to radial coordinates, and , it is easy to see that there is a periodic orbit on the circle of unit radius for any . Moreover, the periodic orbit is a stable limit cycle for and an unstable limit cycle for . When , on the other hand, there are an infinite number of periodic orbits and no limit cycles. Moreover, when , is a stable local Nash equilibrium (and a locally stable critical point of the dynamics).
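The limit-cycle behavior described above can be simulated directly. Since the exact costs of (7) are elided in the text, the sketch below assumes a vector field whose flow has the stated polar form (radial dynamics with an attracting unit circle and unit angular speed); the vector field and parameter value are assumptions matching that description, not the paper's exact game.

```python
import numpy as np

# Assumed vector field: the flow z' = -omega(z) has polar dynamics
#   r' = a*r*(1 - r**2),  theta' = 1,
# so for a > 0 the unit circle is a stable limit cycle.
a = 0.5

def omega(z):
    x, y = z
    r2 = x**2 + y**2
    # Defined so that -omega gives the polar dynamics above.
    return -np.array([a * x * (1 - r2) - y, a * y * (1 - r2) + x])

z = np.array([0.1, 0.0])          # start well inside the unit circle
dt = 0.01
for _ in range(20000):
    z = z - dt * omega(z)         # forward-Euler integration of z' = -omega(z)

radius = np.linalg.norm(z)
print(radius)                     # approaches 1: the trajectory settles on the cycle
```

The trajectory never reaches a critical point: it spirals out from the interior and circulates on the unit circle indefinitely, which is precisely the non-equilibrium limiting behavior a gradient flow could never exhibit.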
Having shown that gradient-based learning can exhibit limit cycles, the question remains: what are the limiting behaviors of gradient-based learning in competitive environments? The following result states that the set of initial conditions leading to linearly unstable equilibria has measure zero. Let and satisfy Assumption 1. Suppose that is open and convex. If , the set of initial conditions from which competitive gradient-based learning converges to linearly unstable critical points (strict saddle points) has measure zero.
First, we note that the above theorem holds for , in particular, since holds trivially in this case. It is also important to note that differential Nash equilibria can be linearly unstable critical points—that is, they can be strict saddle points of the dynamics—and due to Theorem 3, generically, so can local Nash equilibria. The above theorem says that all local Nash equilibria that are linearly unstable critical points for the discretization of are avoided almost surely.
The proof of Theorem 1 relies on the stable manifold theorem [46, Theorem III.7], [47]; we provide its statement as Theorem C in Appendix C. Some parts of the proof follow arguments similar to those used to prove the results in [30, 39], which apply to (single-agent) gradient-based optimization. Due to the different learning rates employed by the agents and the introduction of the differential game form , however, the proof differs.
Proof (of Theorem 1). We claim the mapping is a diffeomorphism. If we can show that is invertible and a local diffeomorphism, then the claim follows. Let us first prove that is invertible.
Consider and suppose so that . The assumption implies that satisfies the Lipschitz condition on . Hence, . Let where —that is, is an diagonal matrix with repeated on the diagonal times. Then, since .
Now, observe that . If is invertible, then the implicit function theorem [31, Theorem C.40] implies that is a local diffeomorphism. Hence, it suffices to show that does not have an eigenvalue equal to . Indeed, letting denote the spectral radius of a matrix , we know in general that for any square matrix and induced operator norm, so that . Of course, the spectral radius is the maximum absolute value of the eigenvalues, so the above implies that all eigenvalues of have absolute value less than .
Since is injective by the preceding argument, its inverse is welldefined and since is a local diffeomorphism in , it follows that is smooth in . Thus, is a diffeomorphism.
Consider all critical points of the game—i.e., . For each , let be the open ball derived from Theorem C and let . Since , Lindelöf's lemma [29]—every open cover has a countable subcover—gives a countable subcover of . That is, for a countable set of critical points with , we have that .
Starting from some point , if gradient-based learning converges to a strict saddle point, then there exists a and an index such that for all . Again applying Theorem C and using that —which we note is obviously true if —we get that .
Using the fact that is invertible, we can iteratively construct the sequence of sets defined by and . Then we have that for all . The set contains all the initial points in such that gradient-based learning converges to a strict saddle.
Since is a strict saddle, has an eigenvalue greater than . This implies that the codimension of is strictly less than (i.e., ). Hence, has Lebesgue measure zero in .
Using again that is a diffeomorphism, it is locally Lipschitz, and locally Lipschitz maps are null-set preserving. Hence, has measure zero for all by induction, so that is a measure-zero set since it is a countable union of measure-zero sets.
For the class of potential games, agents employing a gradient-based learning scheme converge to differential Nash equilibria almost surely. Consider a potential game on open and convex, where each for . Let be a prior measure with support which is absolutely continuous with respect to the Lebesgue measure, and assume exists. Then, under the assumptions of Theorem 1, competitive gradient-based learning converges to nondegenerate differential Nash equilibria almost surely. Moreover, the nondegenerate differential Nash equilibrium to which it converges is generically a local Nash equilibrium. Since the game admits a potential function ,
so that the analysis of the gradient-based learning scheme reduces to analyzing gradient-based optimization of . Moreover, the existence of a potential function also implies that , so that is symmetric. Indeed, writing as the differential form and noting that for the differential operator , we have that
(8) 
Symmetry of implies that all periodic orbits are equilibria—i.e., the dynamics do not possess any limit cycles. By Theorem 1, the set of initial points that converge to linearly unstable critical points has measure zero. Since all the stable critical points of the dynamics are equilibria, with the assumption that exists for all , we have that , where is a nondegenerate differential Nash equilibrium which, by Theorem 3, is generically a local Nash equilibrium.
The interesting point here is that the agents do not need to perform gradient-based learning on to converge to Nash equilibria almost surely. That is, they do not need to know the function ; they simply need to follow the derivative of their own cost with respect to their own choice variable. The potential function is also generically a Morse function (Morse theory states that there is an open dense set—in the topology—of functions, called Morse functions, whose Hessian is nondegenerate at every critical point) and, as such, the number of critical points is finite. Thus, Corollary 1 implies that competitive gradient-based learning converges generically to one of finitely many local Nash equilibria in potential games.
Theorem 1 (and Corollary 1) is important since it shows that gradient-play in multi-agent settings avoids strict saddles almost surely, even in the deterministic setting. For zero-sum games, this is a positive result since it implies that gradient-based learning algorithms will eventually escape strict saddle points of the dynamics. For general-sum and potential games, as we noted in Section 2, strict saddle points can be local Nash equilibria. Therefore, this is a negative result, since a potentially large subset of the local Nash equilibria in a game will be avoided almost surely under gradient-based learning.
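The almost-sure avoidance of strict saddles can be illustrated empirically. The block below linearizes gradient play around an assumed strict-saddle Jacobian (the same kind of positive-diagonal, indefinite matrix discussed earlier; the entries are hypothetical) and checks that trajectories from random initializations do not settle at the saddle.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linearized gradient play near a strict saddle (illustrative J: positive
# diagonal, so the origin is a differential Nash equilibrium, but one
# eigenvalue of J is negative, making the origin linearly unstable).
J = np.array([[1.0, 3.0], [2.0, 1.0]])   # eigenvalues 1 +/- sqrt(6)
gamma = 0.01

trials, escaped = 100, 0
for _ in range(trials):
    z = rng.normal(size=2)               # random initialization
    for _ in range(2000):
        z = z - gamma * (J @ z)          # gradient play, cf. eq. (3)
    if np.linalg.norm(z) > 1.0:          # trajectory left the saddle's vicinity
        escaped += 1

print(escaped, "of", trials, "random initializations avoided the strict saddle")
```

The stable manifold of the saddle has measure zero, so a random initialization essentially never lies on it, and the unstable direction expands the iterates away from the equilibrium.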
Since games generally do not admit gradient flows, other types of limiting behavior, such as limit cycles, can occur in gradient-based learning dynamics. Theorem 1 says nothing about such behavior. In the stochastic setting, we state stronger results on the avoidance of linearly unstable periodic orbits, which include limit cycles.
4.2 Stochastic Competitive Gradient-Based Learning
We now analyze the stochastic case, in which agents are assumed to have an unbiased estimator for their gradient. The results in this section are significant as they allow us to extend the results from the deterministic setting to a setting where each agent builds an estimate of the gradient of their loss at the current set of strategies from potentially noisy observations of the environment. This setup allows us to analyze the limiting behavior of gradient-based learning schemes in multi-agent reinforcement learning, multi-armed bandits, generative adversarial networks, and online optimization. In particular, we extend our results from the previous section to show that, with unbiased estimates of the gradient of their loss function with respect to their own choice variable, agents will almost surely not converge to linearly unstable critical points and cycles. We also construct a new class of games, namely Morse-Smale games, and show that in such games agents using gradient-based learning algorithms will converge either to locally asymptotically stable equilibria—including, but not limited to, stable local Nash equilibria—or to stable limit cycles. This second result implies that gradient-based learning algorithms can converge to non-Nash locally asymptotically stable equilibria, which is again a negative result for gradient-based learning in games.
Each agent i updates their strategy using

(9)  x_{i,k+1} = x_{i,k} − γ_{i,k} ĝ_i(x_k),

where ĝ_i is an unbiased estimator for D_i f_i, the gradient of agent i's loss with respect to their own choice variable, and hence we can write it as

ĝ_i(x_k) = D_i f_i(x_k) + w_{i,k+1}

for some zero-mean, finite-variance stochastic process {w_{i,k}}. Before developing our theoretical results for the stochastic case, let us comment on the different learning algorithms that fit into this framework.
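As a concrete sketch of the update (9), consider two agents descending noisy estimates of their individual gradients. All names, the example losses, and the step-size schedule below are illustrative assumptions, not the paper's notation:

```python
import numpy as np

def stochastic_gradient_play(grads, x0, steps=5000, noise_std=0.1, seed=0):
    """Run update (9): each agent i descends an unbiased, noisy estimate
    of the gradient of its own loss with respect to its own variable.

    grads : list of functions; grads[i](x) returns D_i f_i(x) at the joint
            strategy x (one scalar choice variable per agent here).
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for k in range(steps):
        gamma = 1.0 / (k + 1)  # sum(gamma) = inf, sum(gamma^2) < inf
        # unbiased estimate: true individual gradient plus zero-mean noise
        g_hat = np.array([g(x) for g in grads]) \
            + noise_std * rng.standard_normal(len(x))
        x = x - gamma * g_hat
    return x

# Zero-sum example: f1(x1, x2) = x1**2 + x1*x2 and f2 = -f1, so each agent
# only differentiates with respect to its own coordinate.
grads = [lambda x: 2 * x[0] + x[1],   # D_1 f_1
         lambda x: -x[0]]             # D_2 f_2
x_star = stochastic_gradient_play(grads, x0=[1.0, -1.0])
```

For this example the continuous-time dynamics are linearly stable at the origin, so the iterates settle near the unique (local) Nash equilibrium despite the gradient noise.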
4.2.1 Example Classes of Gradient-Based Learning Algorithms
The stochastic gradient-based learning setting we study is general enough to include a variety of commonly used multi-agent learning algorithms. The classes of algorithms we include are hardly an exhaustive list; indeed, many extensions and altogether different algorithms exist that can be considered members of this class.
In Table 1, we provide the gradient-based update rule for six different example classes of learning problems: (i) gradient play in noncooperative continuous games, (ii) GANs, (iii) multi-agent policy gradient, (iv) individual Q-learning, (v) multi-agent gradient bandits, and (vi) multi-agent experts. We provide a detailed analysis of these different algorithms, including the derivation of the gradient-based update rules along with some interesting numerical examples, in Appendix A. In each of these cases, one can view an agent employing the given algorithm as building an unbiased estimate of their gradient from their observation of the environment.
For example, in policy gradient (see, e.g., [48, Chapter 13]), agents’ costs are defined as functions of a parameter vector θ_i that parameterizes their policy π_{θ_i}. The parameters θ_i are agent i’s choice variable; through the learning scheme, the agent aims to tune this parameter by following the gradient of their loss function in order to converge to an optimal policy. Perhaps surprisingly, it is not necessary for agent i to have access to the other agents’ policies or parameters in order to construct an unbiased estimate of the gradient of their loss with respect to their own choice variable, as long as they observe the sequence of actions of all other agents. These actions are implicitly determined by the other agents’ policies. Hence, in this case, if agent i observes their own reward, action, and state, then this is enough to construct an unbiased estimate of their gradient.
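A minimal sketch of this score-function (REINFORCE-style) estimator for a softmax policy over finitely many actions; the parameterization and all names are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - np.max(theta))
    return z / z.sum()

def policy_gradient_estimate(theta, action, loss):
    """One-sample unbiased estimate of grad_theta E_{a ~ pi_theta}[loss(a)]
    via the score-function identity
        grad E[loss] = E[ loss(a) * grad_theta log pi_theta(a) ].
    Only the agent's own sampled action and realized loss are needed."""
    pi = softmax(theta)
    score = -pi
    score[action] += 1.0  # grad_theta log pi_theta(action) for softmax
    return loss * score

theta = np.array([0.5, -0.2, 0.1])
grad_hat = policy_gradient_estimate(theta, action=2, loss=2.0)
```

Averaged over actions sampled from π_θ, this estimate equals the true gradient of the expected loss, which is exactly the unbiasedness property the framework requires.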
Table 1. Gradient learning rules for the six example problem classes: gradient play, GANs, multi-agent policy gradient, individual Q-learning, multi-agent gradient bandits, and multi-agent experts (derivations are given in Appendix A).
4.2.2 Stochastic Gradient Results
Coming back to the analysis of

(10)  x_{k+1} = x_k − γ_k (g(x_k) + w_{k+1}),   where g(x) = (D_1 f_1(x), …, D_n f_n(x)),

we make the following assumptions.
Assumption 2
The stochastic process {w_k} satisfies E[w_{k+1} | F_k] = 0 and E[‖w_{k+1}‖² | F_k] < ∞ a.s., for all k ≥ 0, where {F_k} is an increasing family of σ-fields (a filtration, or the history generated by the sequence of random variables) given by F_k = σ(x_0, w_1, …, w_k).
Assumption 3
For each i, with g_i = D_i f_i, the map g_i is Lipschitz for some constant L_i > 0, the step sizes satisfy γ_{i,k} > 0 for all k, Σ_k γ_{i,k} = ∞ and Σ_k γ_{i,k}² < ∞, and sup_k ‖x_k‖ < ∞ a.s.
Let X = X_1 × ⋯ × X_n and let ⟨·,·⟩ denote the inner product. Consider a game (f_1, …, f_n) on X. Suppose each agent adopts a gradient-based learning algorithm that satisfies Assumptions 2 and 3. Further, suppose that for each i, there exists a constant b_i > 0 such that E[(⟨w_{i,k+1}, v⟩)⁺ | F_k] ≥ b_i for every unit vector v. Then competitive gradient-based learning converges to linearly unstable critical points of the game on a set of measure zero. The above theorem implies that the stochastic approximation dynamics in (10), describing the competitive gradient-based learning corresponding to a game, avoid critical points of the game corresponding to strict saddles. The proof follows directly from showing under the assumptions that (10) satisfies Theorem C.
We again point out that the set of linearly unstable critical points of the game includes a non-negligible subset of the local Nash equilibria. Thus, the previous theorem says that stochastic gradient-based learning will avoid a non-negligible subset of the local Nash equilibria of the game almost surely. We note that the assumption on the noise in the theorem can be interpreted as a requirement that the noise have a nonzero component in each direction, without which it is possible to converge to unstable points [40].
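A small numerical illustration of the avoidance result (the game, constants, and step sizes are illustrative, not from the paper): in a potential game whose potential φ(x1, x2) = x1² − x2² has a strict saddle at the origin, noisy gradient play started exactly at the saddle is pushed off it and escapes along the unstable x2 direction:

```python
import numpy as np

def noisy_gradient_play(x0, steps=20000, noise_std=0.01, seed=0):
    """Stochastic gradient play in a potential game with potential
    phi(x1, x2) = x1**2 - x2**2.  The origin is a strict saddle of the
    gradient dynamics: x1 is a stable direction, x2 an unstable one."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for k in range(steps):
        gamma = 0.5 / (k + 1)
        g = np.array([2.0 * x[0], -2.0 * x[1]])  # (D1 phi, D2 phi)
        x = x - gamma * (g + noise_std * rng.standard_normal(2))
    return x

runs = [noisy_gradient_play([0.0, 0.0], seed=s) for s in range(5)]
```

Across runs, the stable coordinate x1 stays pinned near zero while the unstable coordinate x2 is amplified away from the saddle (here without bound, since this toy potential is unbounded below), matching the measure-zero convergence statement.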
Under the assumptions of Theorem 3, if (10) converges to a critical point, then that critical point is a locally asymptotically stable equilibrium of the limiting dynamics ẋ = −g(x). In particular, not all points that are attracting under the flow of −g are local Nash equilibria.
As we did with potential games in the preceding section, we can state stronger results for certain nice classes of games. As we have noted, games not admitting potential functions may exhibit limit cycles. Hence, we use the expanded theory in [3, 6] to show that stochastic gradient-based learning algorithms avoid repelling sets.
To do so, we need further assumptions on our underlying space: we need the underlying decision spaces of the agents to be smooth, compact manifolds without boundary, where before we simply required them to be either Euclidean space or some open, convex subset of it. In particular, let X_i be a smooth, compact manifold without boundary for each i, and let X = X_1 × ⋯ × X_n. The stochastic process {x_k} which follows (10) is defined on X; that is, x_k ∈ X for all k. As before, it is natural to compare sample points to solutions of ẋ = −g(x), where we think of (10) as a noisy approximation. As in the previous section, the asymptotic behavior of {x_k} can indeed be described by the asymptotic behavior of the flow generated by −g.
A non-stationary periodic orbit of ẋ = −g(x) is called a cycle. Let γ be a cycle of period τ and denote by φ_t the flow corresponding to −g. For any p ∈ γ, the eigenvalues of D_p φ_τ are the characteristic multipliers of γ; let Λ(γ) denote the set of characteristic multipliers. We say γ is hyperbolic if no element of Λ(γ), apart from the trivial multiplier 1 associated with the direction of the flow, lies on the complex unit circle. Further, if Λ(γ) \ {1} is strictly inside the unit circle, γ is called linearly stable; on the other hand, if Λ(γ) has at least one element outside the unit circle (that is, a corresponding characteristic exponent has real part strictly greater than zero), then γ is called linearly unstable. The latter is the analog of linearly unstable critical points in the context of periodic orbits.
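These definitions can be checked numerically. The sketch below (a standard textbook planar system, chosen here for illustration; it is not an example from the paper) estimates the characteristic multipliers of the unit-circle limit cycle by finite-differencing the time-τ flow map: one multiplier sits at the trivial value 1 along the flow direction, while the other lies well inside the unit circle, so this cycle is linearly stable:

```python
import numpy as np

def F(p):
    # Planar vector field with a stable limit cycle on the unit circle:
    # in polar coordinates, r' = r(1 - r^2), theta' = 1 (period tau = 2*pi).
    x, y = p
    r2 = x * x + y * y
    return np.array([x * (1 - r2) - y, y * (1 - r2) + x])

def flow(p, t, dt=1e-3):
    """Time-t flow map phi_t(p) via classical RK4 integration."""
    p = np.array(p, dtype=float)
    for _ in range(int(round(t / dt))):
        k1 = F(p)
        k2 = F(p + 0.5 * dt * k1)
        k3 = F(p + 0.5 * dt * k2)
        k4 = F(p + dt * k3)
        p = p + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return p

def characteristic_multipliers(p, tau, eps=1e-5):
    """Eigenvalues of D_p phi_tau, estimated by central differences."""
    J = np.zeros((2, 2))
    for j in range(2):
        e = np.zeros(2)
        e[j] = eps
        J[:, j] = (flow(p + e, tau) - flow(p - e, tau)) / (2 * eps)
    return np.linalg.eigvals(J)

mults = characteristic_multipliers(p=np.array([1.0, 0.0]), tau=2 * np.pi)
```

For this field the nontrivial multiplier is approximately e^{−4π}, far inside the unit circle, confirming linear stability of the cycle.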
We denote by {x_k} sample paths of the process (10), and L({x_k}) is the limit set of such a sequence, defined in the usual way as all points p such that x_{k_j} → p for some subsequence {k_j}. It was shown in [3] that, under less restrictive assumptions than Assumptions 2 and 3, L({x_k}) is contained in the chain recurrent set of −g and is a nonempty, compact, and connected set invariant under the flow of −g. Consider a game where each X_i is a smooth, compact manifold without boundary. Suppose each agent adopts a stochastic gradient-based learning algorithm that satisfies Assumptions 2 and 3 and is such that the sample points satisfy x_k ∈ X for all k. Further, suppose that for each i, there exists a constant b_i > 0 such that E[(⟨w_{i,k+1}, v⟩)⁺ | F_k] ≥ b_i for every unit vector v. Then competitive gradient-based learning converges to linearly unstable cycles on a set of measure zero; i.e., P(L({x_k}) = γ) = 0 for any linearly unstable cycle γ, where {x_k} is a sample path. As we noted, periodic orbits are not necessarily excluded from the limiting behavior of gradient-based learning in games. We leave out the proof of Theorem 3 since, other than some algebraic manipulation, it is a direct application of [6, Theorem 2.1] (which we provide in Theorem C in Appendix C).
The above theorem simply states that competitive stochastic gradient-based learning avoids linearly unstable cycles on a set of measure zero. Of course, we can state stronger results for a more restrictive class of games admitting gradient-like vector fields. Specifically, analogous to [6], we can consider Morse-Smale vector fields. We introduce a new class of games, which we call Morse-Smale games, that is a generalization of potential games. This is a very important class of games as Morse-Smale games correspond to Morse-Smale vector fields, which are known to be generic; in the case that the joint strategy space is a compact manifold, this implies Morse-Smale games are open and dense in the set of games.
A game (f_1, …, f_n), where each strategy space X_i is a smooth, compact manifold without boundary, is a Morse-Smale game if the vector field −g is Morse-Smale; that is, the following hold: (i) all periodic orbits (i.e., equilibria and cycles) are hyperbolic and the stable and unstable manifolds of any pair of periodic orbits intersect transversally, (ii) every forward and backward omega limit set is a periodic orbit, and (iii) −g has a global attractor. The conditions in the above definition ensure that there are only finitely many periodic orbits. The simplest example of a Morse-Smale vector field is a gradient flow. However, not all Morse-Smale vector fields are gradient flows, and hence not all Morse-Smale games are potential games. Indeed, one can construct a game whose dynamics −g form a Morse-Smale vector field that is not a gradient vector field [11]; such a game is a Morse-Smale game that is not a potential game.
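To illustrate a Morse-Smale game that is not a potential game, here is a hypothetical two-player construction (the losses below are our own illustrative choice, not the example from [11]): the individual-gradient dynamics have the unit circle as a stable limit cycle, which is impossible for a gradient flow, and deterministic gradient play converges to that cycle rather than to an equilibrium:

```python
import numpy as np

# Hypothetical two-player game (illustrative, not from the paper) with losses
#   f1(x, y) = -(x**2/2 - x**4/4 - x**2*y**2/2 - x*y)
#   f2(x, y) = -(y**2/2 - y**4/4 - x**2*y**2/2 + x*y)
# Differentiating each loss in that player's own variable gives gradient-play
# dynamics whose attractor is the unit circle: a stable limit cycle, so the
# field is Morse-Smale but cannot be a gradient flow.

def g(p):
    x, y = p
    r2 = x * x + y * y
    return np.array([-(x * (1 - r2) - y),   # D1 f1(x, y)
                     -(y * (1 - r2) + x)])  # D2 f2(x, y)

p = np.array([0.1, 0.0])
for _ in range(200_000):       # deterministic gradient play, small step
    p = p - 1e-3 * g(p)
radius = float(np.hypot(p[0], p[1]))
```

After many steps the joint strategy orbits the unit circle (radius ≈ 1) while the dynamics never settle, i.e., the limit set is a linearly stable cycle rather than a Nash equilibrium.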
Essentially, in a neighborhood of a critical point of a Morse-Smale game, the game behavior can be described by a Morse function V such that, near critical points, −g can be written as the gradient descent direction −∇V and, away from critical points, −g points in the same direction as −∇V (so that V decreases along the flow). Specializing to the class of Morse-Smale games, we have stronger convergence guarantees. Consider a Morse-Smale game on a smooth, boundaryless, compact manifold X. Suppose Assumptions 2 and 3 hold and that {x_k} is defined on X. Let P denote the set of periodic orbits of −g in X. Then the limit set L({x_k}) belongs to P almost surely, and L({x_k}) = γ implies γ is linearly stable. Moreover, if the periodic orbit γ with L({x_k}) = γ is an equilibrium, then it is either a non-degenerate differential Nash equilibrium (which is generically a local Nash equilibrium) or a non-Nash locally asymptotically stable equilibrium. The proof of Theorem 3 utilizes [6, Corollary 2.2] (which we provide in Corollary C in Appendix C).
If we further restrict the class of games to potential games, the above theorem implies convergence to Nash equilibria almost surely. Consider a game on a smooth, boundaryless, compact manifold admitting a potential function. Under the assumptions of Theorem 3, competitive stochastic gradient-based learning converges to a non-degenerate differential Nash equilibrium almost surely. Moreover, the differential Nash equilibrium to which it converges is generically a local Nash equilibrium.