# Learning with minimal information in continuous games

We introduce a stochastic learning process called the dampened gradient approximation process. While learning models have almost exclusively focused on finite games, in this paper we design a learning process for games with continuous action sets. It is payoff-based and thus requires from players no sophistication and no knowledge of the game. We show that despite such limited information, players will converge to Nash in large classes of games. In particular, convergence to a Nash equilibrium which is stable is guaranteed in all games with strategic complements as well as in concave games; convergence to Nash often occurs in all locally ordinal potential games; convergence to a stable Nash occurs with positive probability in all games with isolated equilibria.

## 1 Introduction

In this paper we construct a simple stochastic learning rule with the three following properties: (i) it is designed for games with continuous action sets; (ii) it requires no sophistication from the players and (iii) it converges to Nash equilibria in large classes of games. The question of convergence to Nash equilibria by agents playing a game repeatedly has given rise to a large body of literature on learning. One branch of this literature explores whether there are learning rules - deterministic or stochastic - which would converge to Nash equilibria in any game (see e.g. Hart and Mas-Colell (2003), Hart and Mas-Colell (2006), Babichenko (2012), Foster and Young (2006), Germano and Lugosi (2007)). Another branch, to which this paper contributes, focuses on specific learning rules and on the understanding of their asymptotic behavior.

Both branches have almost exclusively addressed the issue of learning in discrete games (i.e. games where the set of strategies is finite). However, many economic variables such as price, effort, time allocation, are non-negative real numbers, and thus are continuous. Typical learning models that have been designed for discrete games cannot be adapted to continuous settings without major complications, since they usually rely on assigning a positive probability to choosing each action, which cannot be done in continuous games. In this paper we introduce a learning rule designed for continuous games, which we call the dampened gradient approximation process (DGAP), and we analyze its behavior in several well-known classes of games.

Learning rules can be more or less demanding in terms of players’ sophistication and of the amount of information required to implement them. The DGAP belongs to the category of so-called payoff-based or completely uncoupled learning rules, meaning that players know nothing about the payoff functions (neither theirs nor those of their opponents), and they know nothing about the other players’ actions, nor about their payoffs. They may not even be aware that they are playing a game. They only observe their own realized payoffs after each iteration of the game and make decisions based on these observations.

Agents aim at maximizing their payoffs by choosing an action. If players knew the gradient of their utility function at every point, a natural learning process in continuous games would be for agents to follow a gradient method (see for instance Arrow and Hurwicz (1960)). However, because players neither know the payoff functions nor observe the others’ actions, they would be unable to compute these gradients.

In DGAP, agents construct an approximation of the gradient at the current action profile, by randomly exploring the effects of increasing or decreasing their actions by small increments. The agents use the information collected from this exploration to choose a new action: if the effect revealed is an increase (resp. decrease) in payoff, then players move in the same (resp. opposite) direction, with an amplitude proportional to the approximated gradient.

Although this procedure resembles a gradient learning process, there are two major differences from a standard gradient method. First, the DGAP is a random process instead of a deterministic dynamical system. Second, in the standard gradient method with non-negative actions, players’ behaviors are discontinuous at the boundary of the strategy space (see Arrow and Hurwicz (1960)). In order to avoid such discontinuity in players’ behavior, we assume that changes in actions are dampened as they approach the boundary. Hence the name of our learning process.

We first prove that this process is well-defined - i.e. players’ actions always remain non-negative (Proposition 2.1). Then we analyze its convergence properties and find that contrary to discrete games, where convergence to Nash of specific learning processes is generally difficult to obtain even for two- or three-player games, convergence is obtained in large classes of games with arbitrary numbers of players. We restrict to strongly single-peaked payoff functions (see Hypothesis 1 for a proper definition), focusing our attention on three classes of games that are of particular interest for economics and have been extensively analyzed in the learning literature: games with strategic complements, a class of games containing all potential games, and all games where the set of Nash equilibria is finite. This last class includes all games with a unique Nash equilibrium, such as strictly concave games and many of the generalized continuous zero-sum games.

The DGAP is a stochastic process, the random part being the direction chosen for the exploration. We analyze its (random) set of accumulation points, called the limit set, by resorting to stochastic approximation theory. This theory tells us that the long-run behavior of the stochastic process is driven by some underlying deterministic dynamical system. We thus start by showing that the deterministic system that underlies our specific stochastic learning process is a dampened gradient system (Proposition 2.1). We also show that all the Nash equilibria of a game are stationary points - otherwise called zeros - of this dynamical system, although other points may also be stationary. However, we prove (Proposition 2.2) that non-Nash stationary points are necessarily unstable (throughout the paper, several notions of stability will be used; they are all recalled in Section 2). This is done in Section 2, where we also detail the DGAP and provide the necessary definitions.

Stochastic approximation theory tells us that the stationary points of the underlying dynamical system are plausible candidates for the limit set of the random process. Yet it does not provide general criteria for excluding some of these candidates so as to obtain more precise predictions. This is actually one of the major difficulties in the field (see for instance Benaïm and Faure (2012)). While the conceptual contribution of this paper lies in providing a natural learning process for games with continuous action sets, our technical contribution lies in providing precise statements on the structure of the limit set of the DGAP. Each result relies on a different mathematical tool. It is remarkable that almost all our results hold with probability one, which is in general very difficult to obtain.

In Section 3, we analyze games with strategic complements. We show (Theorem 3.1) that the DGAP cannot converge to an unstable Nash equilibrium (because the process is stochastic, the notion of convergence we use here is that of almost sure convergence). Furthermore, we prove that the process will almost surely converge to a Nash equilibrium which is stable, except in very specific cases involving the structure of interactions between players: non-convergence might occur under a condition called bipartiteness.

In Section 4, we analyze a class of games that we call locally ordinal potential games. This class contains all the ordinal potential games, which in turn contain all the potential games. We have three results (in Theorems 4.1 and 4.2). First, the limit set of the DGAP is always contained in the set of stationary points of the dynamics. When equilibria are isolated, this implies that the process converges to a Nash equilibrium with probability one, since we prove that the process cannot converge to a non-Nash stationary point in these games. Second, we show that under the condition of non-bipartiteness, the DGAP converges to a Nash equilibrium which is stable when equilibria are isolated. Third, although convergence to unstable stationary points (possibly non-Nash) cannot be ruled out in general (i.e. when equilibria are not isolated), we characterize the set of stable stationary points. We prove that they are local maxima of the locally ordinal potential function, that they are necessarily connected components of Nash equilibria, and that they are necessarily stable equilibria of another, unrelated dynamical system: the Best-Response dynamics.

Finally, in Section 5, we consider all games for which stationary points are isolated. This class includes the vast majority of games studied in economics. We cannot prove precise and general convergence results, since there is no guarantee that the limit set of the process will be included in the set of stationary points. Still, we state two results. First, DGAP will converge to a stable Nash equilibrium with positive probability in all these games. Second, we exclude convergence to what we call undesirable stationary points, i.e. those that are non-Nash, and unstable Nash equilibria.

Also in Section 5, we focus on games with a unique Nash equilibrium that have previously been analyzed, either by Arrow and Hurwicz (1960) or by Rosen (1965). They examined dynamical systems which are either discontinuous or complex gradient systems, obtaining convergence to the unique Nash equilibrium. We obtain the same results - for our process - with convergence to the unique equilibrium with probability one. Thus, another contribution of this paper is to show that we can use a gradient system which is both simple and continuous, and preserve the convergence properties.

Related Literature

As mentioned earlier, the learning literature has essentially focused on finite action games. Many rules have been proposed and studied, both in the non payoff-based and in the payoff-based contexts. In the former, the most widely-explored adaptive process is fictitious play (introduced in Brown (1951)), where players’ average actions are shown to converge for 2-player zero-sum games (Robinson (1951)), for games (Miyazawa (1961) and Berger (2005)), for potential games (Monderer and Shapley (1996)). In all of these papers, the convergence notions differ, which has implications in terms of the scope of each result; for details the reader should refer directly to the papers. Convergence is also obtained for stochastic fictitious play, introduced by Fudenberg and Kreps (1993), in games (Benaïm and Hirsch (1999)), zero-sum and potential games (Hofbauer and Sandholm (2002)) and supermodular games (Benaïm and Faure (2012)). However, it has been shown that fictitious play does not always converge to Nash once there are at least 3 actions per player (Shapley (1964)). Other non payoff-based learning rules include hypothesis testing (Foster and Young (2003)) or calibrated forecasting (Kakade and Foster (2008)). Our contribution differs in both dimensions: we focus on continuous games and on payoff-based procedures.

Many payoff-based learning rules have been explored in the context of discrete games. Such rules include the popular class of reinforcement learning procedures (see Börgers and Sarin (1997) or Erev and Roth (1998) for pioneer work). These procedures have been studied in very specific finite games: games in Posch (1997), -player games with positive payoffs in Börgers and Sarin (1997), Beggs (2005), Hopkins and Posch (2005) or Laslier et al. (2001). On the same topic, see also Leslie and Collins (2005), Cominetti et al. (2010), Bravo and Faure (2015) and Bravo (2016). However, it is not known if these procedures converge to Nash in more general games.

Other payoff-based procedures for discrete games have been proposed, including: Regret-testing (Foster and Young (2006)) which converges in any -person game; Generalized regret-testing (Germano and Lugosi (2007)) which converges in any generic -person game; Experimentation dynamics (Marden et al. (2009)) which converge to Nash in the class of -person weakly acyclic games; Trial and error (Young (2009)) which comes close to Nash equilibrium a large fraction of the time; Aspirational learning (Karandikar et al. (1998)) which may fail to converge even in games.

The literature on continuous games is sparser, and a distinction can also be made between procedures which are demanding in terms of sophistication and knowledge of the players, and procedures which are of the payoff-based type. In the first category, Arrow and Hurwicz (1960) prove that when all players’ payoff functions are strictly concave, the gradient method converges to the unique Nash equilibrium in generalized zero-sum games. Rosen (1965) studies a gradient method in concave n-person games with a unique equilibrium, and shows that this unique equilibrium is globally asymptotically stable for some weighted gradient system, with suitably chosen weights. In a recent paper, Mertikopoulos (2018) studies a gradient-like stochastic learning algorithm where agents receive erroneous information about their gradients in the context of concave games, and shows that whenever this process converges, it does so to a Nash equilibrium. Using a different approach, Perkins and Leslie (2014) adapt stochastic fictitious play to games with continuous action sets, and show that it converges in -player zero-sum games. Our contribution differs from these in analyzing a payoff-based learning process.

The two papers most closely related to ours are those analyzing payoff-based procedures designed for continuous games. Dindos and Mezzetti (2006) consider a stochastic adjustment process called the better-reply. At each step, agents are sequentially picked to play a strategy chosen at random, while the other players do not move. The agent then observes the hypothetical payoff that this action would yield, and decides whether to stick to this new strategy or to go back to the previous one. This process converges to Nash when actions are either substitutes or complements around the equilibrium in games called aggregative games, with quasi-concave utility functions. However, their contribution differs from ours in several respects, the most important being that agents revise their strategies sequentially. The driving force for convergence is that with positive probability, every player will be randomly drawn as many times as necessary to approximate a best response. In our paper, we assume that all players move simultaneously. In that case, it is easy to construct a simple game where simultaneity drives the better-reply adjustment process to cycle.

The second related paper, by Huck et al. (2004), considers another type of payoff-based learning process, called Trial and error - but which has no link with the Young (2009) procedure - in the context of the Cournot oligopoly, where players move simultaneously, as in our paper. Players choose a direction of change and stick to this direction as long as their payoff increases, changing as soon as it decreases. The authors show that the process converges, but it does so to the joint-profit maximizing profile and not to the (unique) Nash equilibrium of the game. While our paper is similar to theirs in spirit, unlike them, we do not focus on a specific game (although, as the authors suggest, the intuition for their result might carry through to other games), we allow for multiple equilibria, including continua of equilibria, and we get convergence to Nash. Our process differs in two respects. First, we consider an exploration stage where the players decide in which direction they will be moving. This exploration stage seems to make our agents less naive. Second, the amplitude of the moves is constant in their paper while in ours, amplitude depends on the variation in payoffs after an exploration stage. This might explain why we obtain convergence to Nash while they do not.

A group of papers coming from a different literature deserves mention here: in the literature on evolutionary dynamics in population games, many dynamical systems have been suggested in the context of infinite populations choosing strategies among a finite set. Recently, several papers have extended these dynamical systems to continuous strategy spaces. The continuous strategy version of the replicator dynamic has been studied by Bomze (1990), Oechssler and Riedel (2001), Oechssler and Riedel (2002), and Cressman (2005), while the Brown-von Neumann-Nash (BNN) dynamic has been extended by Hofbauer et al. (2009). Lahkar and Riedel (2015) extend the logit dynamics, Cheung (2014) adapts the pairwise comparison dynamics and Cheung (2016) works with the class of imitative dynamics (which includes the replicator dynamics). Although this group of papers deals with dynamical systems for continuous games, their contexts are totally different (continuum of players). Moreover, their main goal is to define the extensions of existing dynamics, and to see whether they are well-defined and share the same properties as their discrete-strategy counterparts.

## 2 The model

### 2.1 Definitions and hypothesis

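Let $\mathcal{N} = \{1, \dots, N\}$ be a set of players, each of whom repeatedly chooses an action from $X_i = [0, +\infty)$. An action $x_i \in X_i$ can be thought of as an effort level chosen by individuals, a price set by a firm, a monetary contribution to a public good, etc. Let $X = \prod_{i} X_i$. We denote by $\partial X$ the boundary of $X$, i.e. $\partial X = \{x \in X : \exists i, \ x_i = 0\}$. And we let $\mathrm{Int}(X)$ denote the interior of $X$.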

At each period of time, players observe a payoff that is generated by an underlying repeated game $\mathcal{G}$, where $u = (u_i)_i$ is the vector of payoff functions. Players know nothing about the payoff functions, nor about the set of opponents. In this paper we will examine several classes of underlying games, each class being defined by different properties on the functions $u_i$. However, we will always make the two following standing assumptions:

###### Hypothesis 1

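For any $i$, the payoff map $u_i$ is assumed to be $C^2$ on $X$ and with the property that, for any $x_{-i}$, there exists $\bar{x}_i \geq 0$ such that the map $x_i \mapsto \frac{\partial u_i}{\partial x_i}(x_i, x_{-i})$ is strictly positive for $x_i < \bar{x}_i$ and strictly negative for $x_i > \bar{x}_i$.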

Hypothesis 1 implies that best responses are unique and . This assumption is verified for instance if is strictly concave, and .

In the games we consider, interactions between players can be very general. They can be heterogeneous across players and they can be of any sign. However we assume that externalities are symmetric in sign:

###### Hypothesis 2

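Games are assumed to have symmetric externalities, i.e. for all $x \in X$ and all $i \neq j$,

$$\operatorname{sgn}\left(\frac{\partial u_i}{\partial x_j}(x)\right) = \operatorname{sgn}\left(\frac{\partial u_j}{\partial x_i}(x)\right)$$

where $\operatorname{sgn}(a) = 0$ if $a = 0$.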

Most of the continuous games in the economics literature fall into this class. Note that a game with symmetric externalities does not require them to be of equal intensity. Also, symmetric externalities allow for patterns where exerts a positive externality on individual and a negative externality on individual . Note finally that symmetric externalities do not imply that for .

Some of our results will depend on the pattern of interactions in the game. We capture this pattern by an interaction graph, defined as follows. Let $x$ be an action profile. The interaction graph at profile $x$ is given by the matrix $\mathbf{G}(x)$ where $\mathbf{G}_{ii}(x) = 0$ and, for $i \neq j$, $\mathbf{G}_{ij}(x) = 1$ if $\frac{\partial u_i}{\partial x_j}(x) \neq 0$ and $\mathbf{G}_{ij}(x) = 0$ otherwise. Note that the interaction graph is local, in the sense that it depends on the vector of actions. Thus $\mathbf{G}(x)$ can either be constant on $X$ or change as $x$ changes. Note also that the interaction graph of a game satisfying Hypothesis 2 is symmetric.
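The construction of the interaction graph can be sketched in a few lines of Python. This is only an illustration under our reading of the definition: a link between $i$ and $j$ is present whenever the cross-partial $\partial u_i / \partial x_j$ is non-zero at the profile. The function name `interaction_graph`, the tolerance `tol` and the matrix `J` are hypothetical and not taken from the paper.

```python
import numpy as np

def interaction_graph(cross_partials, tol=1e-12):
    """Return the 0/1 interaction matrix at one action profile.

    cross_partials[i, j] is ASSUMED to hold du_i/dx_j evaluated at the profile.
    """
    J = np.asarray(cross_partials, dtype=float)
    G = (np.abs(J) > tol).astype(int)  # link i-j when j's action affects i's payoff
    np.fill_diagonal(G, 0)             # no self-links
    return G

# Hypothetical 3-player profile: players 0 and 1 interact, player 2 is isolated.
J = np.array([[0.3, 0.5, 0.0],
              [0.2, -0.1, 0.0],
              [0.0, 0.0, 0.7]])
G = interaction_graph(J)
```

In this example the off-diagonal entries of `J` have matching signs, as Hypothesis 2 requires, so the resulting matrix `G` is symmetric.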

We now provide two examples of games satisfying Hypothesis 1 and 2 and describe the interaction graphs.

###### Example 2.1 (Public good game)

Players contribute an effort to a public good. The payoff of player is , where is the marginal cost of effort for , is a measure of substitutability between and ’s efforts, and is a differentiable, strictly increasing concave function. In some contexts, not all players will benefit from one player’s contribution (see Bramoullé and Kranton (2007)). This is why we leave the possibility that for some pairs .

This game satisfies Hypothesis 1 and 2. Further, the interaction graph is constant since it does not depend on the action profile : , while .

###### Example 2.2 (Aggregate demand externalities)

In macro-economics, aggregate-demand-externality models (see for instance Fudenberg and Tirole (1991)) are games satisfying Hypothesis 1 and 2. For instance, search models à la Diamond enter this class, where players exert a search effort , with payoff functions , and where is the cost of searching, is the probability that and end up partners and is the gain when a partner is found. The interaction graph of such a game depends on the profile , since , thus , while .

This game can be generalized to local search models with payoffs , where, as above, and where means that individual exerts no externality on individual . A popular game analyzed in the network literature in a different context is the game introduced in Ballester et al. (2006), which is actually a local aggregate-demand-externality model, where , and . In that case, the interaction graph only partly depends on , since implies for all .

We denote by the set of Nash equilibria. In many economics applications, Nash equilibria would consist of isolated points. Examples 1 and 2 provide examples of games where the set of Nash equilibria is generically finite. However, in what follows we will sometimes deal with a continuum of equilibria. This is the case for instance in Example 1, when (see Bervoets and Faure (2018)). Because we wish to be as general as possible, we will consider connected components of :

###### Definition 2.1

Let be a compact connected subset of and let . We say that is a connected component of if there exists such that .

### 2.2 The Learning Process

We consider a payoff-based learning process in which agents construct a partial approximation of the gradient by exploring the effects of deviating in one direction that they chose at random at every period. This information allows agents to choose a new action depending on what they just learned from the exploration stage. Here we detail what agent does, bearing in mind that every agent simultaneously uses the same rule.

At the beginning of round , agent is playing action and is enjoying the associated payoff . Player then selects his actions and as follows.

Exploration stage - Player plays a new action , chosen at random around his current action . Formally, let be a sequence of i.i.d. random variables such that . At period , is drawn and player plays .

Updating stage - Player observes his new payoff, and computes

$$\Delta u^i_{n+1} := u_i(e^i_{2n+1}, e^{-i}_{2n+1}) - u_i(e^i_{2n}, e^{-i}_{2n}).$$

This quantity provides with an approximation of his payoff function’s gradient. Using this information, player updates his action by playing . Thus, when is positive, player follows the direction that he just explored, while he goes in the opposite direction when is negative.

Period ends. We set and agent gets the payoff . Round starts.
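The two stages of a round can be sketched as a short simulation. This is a minimal illustration, not the authors' specification: the additive exploration `x + step * eps`, the update `x + eps * delta_u * x`, the step `1 / (n + 1)` inside the exploration, and the test game `quadratic_payoffs` are all assumptions chosen so that the average motion is a gradient dampened by the player's own action. Players only ever see the numbers returned by `payoffs`.

```python
import numpy as np

def dgap(payoffs, x0, n_rounds, seed=0):
    """Simulate a payoff-based learning loop in the spirit of the DGAP.

    payoffs(e) must return the vector of realized payoffs at action profile e.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for n in range(n_rounds):
        step = 1.0 / (n + 1)
        eps = rng.choice([-1.0, 1.0], size=x.shape)  # one random direction per player
        u_old = payoffs(x)
        u_new = payoffs(x + step * eps)              # exploration (ASSUMED additive form)
        delta_u = u_new - u_old                      # each player's observed payoff change
        x = x + eps * delta_u * x                    # update (ASSUMED form, dampened by x)
    return x

def quadratic_payoffs(e, delta=0.25):
    """Hypothetical 2-player game with strategic complements."""
    others = e.sum() - e
    return -0.5 * e**2 + e * (1.0 + delta * others)

x_final = dgap(quadratic_payoffs, x0=[0.5, 0.5], n_rounds=5000)
```

In this hypothetical game the first-order conditions $-x_i + 1 + 0.25\,x_j = 0$ give the interior Nash equilibrium $x_1 = x_2 = 4/3$, so under these assumptions the iterates should settle in a neighborhood of that point.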

Let and be the history generated by . Studying the asymptotic behavior of the random sequence amounts to studying the sequence . Hence the focus of this paper is on the convergence of the random process .

The next proposition shows that the process is well-defined, in the sense that it always remains within the admissible region (i.e. the actions stay positive). It also proves that the DGAP is a discrete time stochastic approximation process.

###### Proposition 2.1

The iterative process is such that for all .
It can be written as

$$x_{n+1} = x_n + \frac{1}{n+1}\left(F(x_n) + U_{n+1} + \xi_{n+1}\right), \tag{1}$$

where

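• $F = (F_i)_i$ with $F_i(x) = x_i \frac{\partial u_i}{\partial x_i}(x)$,

• $U_{n+1}$ is a bounded martingale difference (i.e. its conditional expectation given the past history is zero),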

• .

All our proofs are in the appendix.

The iterative process (1) is a discrete time stochastic process with step $\frac{1}{n+1}$. Note that $\sum_n \frac{1}{n+1} = \infty$ and $\lim_n \frac{1}{n+1} = 0$. It is important that the sum diverges, to guarantee that the process does not get “stuck” anywhere, unless agents want to stay where they are. Further, it is important that the terms go to zero, so that the process can “settle” when agents want to. In fact, the term $\frac{1}{n+1}$ can be replaced by any step of the form , where and , without affecting the results. If there were no stochastic term, the process (1) would write

$$x_{n+1} = x_n + \frac{1}{n+1} F(x_n),$$

which corresponds to the well-known Euler method, a numerical procedure for approximating the solutions of the deterministic ordinary differential equation (ODE)

$$\dot{x} = F(x). \tag{2}$$

Although the (stochastic) process (1) differs from the (deterministic) process (2) because of the random noise, the asymptotic behavior of (2) will inform us on the asymptotic behavior of (1). Stochastic approximation theory (see Benaïm (1996) or Benaïm (1999) for instance) tells us that, as periods unfold, the random process gets arbitrarily close to the solution curve of its underlying dynamical system. In other words, given a time horizon $T$ - however large it might be - the process shadows the trajectory of some solution curve between times $t$ and $t + T$ with arbitrary accuracy, provided $t$ is large enough.
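The noise-free recursion with decreasing step can be tried on a toy case. The sketch below is only an illustration: the one-player payoff $u(x) = x - x^2/2$ and the field $F(x) = x\,u'(x)$ are assumptions made for the example, and `euler_decreasing_step` is a hypothetical helper name.

```python
def euler_decreasing_step(F, x0, n_steps):
    """Iterate x_{n+1} = x_n + F(x_n) / (n + 1), the noise-free version of (1)."""
    x = float(x0)
    for n in range(n_steps):
        x = x + F(x) / (n + 1)
    return x

# Hypothetical one-player example: u(x) = x - x**2 / 2, so u'(x) = 1 - x,
# and the dampened field is ASSUMED to be F(x) = x * u'(x).
F = lambda x: x * (1.0 - x)
x_limit = euler_decreasing_step(F, x0=0.5, n_steps=2000)
```

Because the steps sum to infinity, the iterate keeps moving toward the stationary point $x = 1$ of the ODE; because they vanish, it slows down as it gets there.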

###### Remark 1

In the standard gradient method (see for instance Arrow and Hurwicz (1960)), the dynamical system is defined as $\dot{x} = H(x)$ where

$$H_i(x) = \begin{cases} \frac{\partial u_i}{\partial x_i}(x) & \text{unless } x_i = 0 \text{ and } \frac{\partial u_i}{\partial x_i}(x) < 0, \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$

The function $H$ is thus discontinuous, since the process could otherwise leave the admissible space. Conversely, the dynamical system underlying the DGAP is continuous: $F_i(x) = x_i \frac{\partial u_i}{\partial x_i}(x)$. The role played by the multiplicative factor $x_i$ is to dampen the variations of the state variable and ensure that it will never reach the boundary - although it can converge to it. This is not unreasonable behavior: the gradient system assumes that players crash onto the boundary, whereas we assume that the closer they get to the boundary, the smaller their movements become.
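The contrast between the two vector fields can be made concrete with a small sketch. It assumes that the dampened field multiplies each player's own partial derivative by that player's action; the function names and the sample numbers are hypothetical.

```python
import numpy as np

def standard_gradient_field(x, grad):
    """Field (3): follow the gradient, except stop at x_i = 0 when it points outward."""
    x, grad = np.asarray(x, float), np.asarray(grad, float)
    blocked = (x == 0) & (grad < 0)
    return np.where(blocked, 0.0, grad)

def dampened_gradient_field(x, grad):
    """Dampened field: own gradient scaled by own action (ASSUMED form)."""
    return np.asarray(x, float) * np.asarray(grad, float)

# grad[i] stands for du_i/dx_i at the profile x (hypothetical numbers).
x = [0.0, 0.0, 2.0]
grad = [-1.0, 1.0, 0.5]
H = standard_gradient_field(x, grad)
F = dampened_gradient_field(x, grad)

# The standard field jumps at the boundary, the dampened one does not.
jump = standard_gradient_field([1e-9], [-1.0])[0] - standard_gradient_field([0.0], [-1.0])[0]
```

Here the standard field jumps from roughly $-1$ just inside the boundary to $0$ on it, while the dampened field shrinks continuously to $0$ as the action approaches $0$.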

In Rosen (1965), the author studies another gradient method also ensuring that the system never leaves the state space. The system is given by

$$H_i(x) = r_i \frac{\partial u_i}{\partial x_i}(x) + \sum_{j=1}^{k} \lambda_j \frac{\partial h_j}{\partial x_i}(x), \tag{4}$$

where the functions are the constraints defining the convex and compact set where lives (i.e. ), and are appropriately chosen weights guaranteeing that the system will always remain within the set .

### 2.3 Limit sets

The focus of this paper is on the asymptotic behavior of the random process $(x_n)_n$. Hence we are interested in its limit set. In the remainder of the paper, we will always place ourselves on the event $\{\limsup_n \|x_n\| < +\infty\}$, i.e. we will abstract from the possible realizations which take the process to infinity.

###### Definition 2.2 (Limit set of $(x_n)_n$)

Given a realization of the random process, we denote the limit set of $(x_n)_n$ by

$$L((x_n)_n) := \left\{x \in X : \exists \text{ a subsequence } (x_{n_k})_k \text{ such that } x_{n_k} \to x\right\}. \tag{5}$$

Note that the limit set of the learning process is a random object, because the asymptotic behavior of the sequence depends on the realization of the random sequence , drawn at every exploration stage.

Proposition 2.1 allows us to make use of stochastic approximation theory, which provides a characterization of the candidates for $L((x_n)_n)$. In Benaïm (1999), it is established that, on the event where the process remains bounded, the limit set of $(x_n)_n$ is always compact, invariant and attractor-free. This class of sets is called internally chain transitive (ICT). These sets can take very complicated forms, but they conveniently include the zeroes of $F$ and the $\omega$-limit set of any point (if non-empty).

In particular the $\omega$-limit sets of the ODE (2) lie among these candidates. Let $\varphi(x, t)$ denote the flow of $F$, i.e. the position of the solution of (2) with initial condition $x$, at time $t$. Then, the $\omega$-limit set of $x$ is the set of points $z$ such that $\varphi(x, t_k) \to z$ for some sequence $t_k \to +\infty$. Notice that by the regularity assumption on $u$, $F$ satisfies the Cauchy-Lipschitz condition that guarantees that, for all $x$, $\varphi(x, \cdot)$ is well-defined and unique. We consider the restriction of $\varphi$ on $X$, since $X$ is invariant for its flow, and our random process (1) always remains in the positive orthant. However, several difficulties remain: first, there might be other candidates that are not $\omega$-limit sets of the underlying ODE. Moreover, this theory does not provide general criteria to systematically exclude any of these candidates, nor to confirm that they are indeed equal to $L((x_n)_n)$.

The stationary points of the dynamical system (2) are particular $\omega$-limit sets that will be of interest to us, as they contain all the Nash equilibria of the underlying game. The set of stationary points, denoted $Z(F)$, will be called the zeros of $F$: $Z(F) := \{x \in X : F(x) = 0\}$. For convenience, we drop the reference to $F$ and simply write $Z$.

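Observe that $F_i(x) = x_i \frac{\partial u_i}{\partial x_i}(x)$. Thus, $x \in Z$ if and only if, for every $i$, either $x_i = 0$ or $\frac{\partial u_i}{\partial x_i}(x) = 0$, while $x \in NE$ if and only if, for every $i$, either $\frac{\partial u_i}{\partial x_i}(x) = 0$, or $x_i = 0$ and $\frac{\partial u_i}{\partial x_i}(x) \leq 0$. This implies that all the Nash equilibria of the game are included in the set of zeros of $F$. Unfortunately, $Z$ contains more than the set of Nash equilibria. We call the remaining points the other zeros ($OZ$) of the dynamical system: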

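$$OZ = \left\{x : F(x) = 0 \text{ and } \exists i \text{ s.t. } x_i = 0, \ \frac{\partial u_i}{\partial x_i}(x) > 0\right\}.$$

We have the following partition of $Z$:

$$Z = NE \cup OZ. \tag{6}$$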

Note that $NE$ might contain some points in $\partial X$, however $OZ \subset \partial X$.
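The partition (6) can be turned into a small classification routine. The sketch below assumes the dampened form $F_i(x) = x_i\,\partial u_i/\partial x_i(x)$ and relies on Hypothesis 1 to read a zero with no profitable upward move at the boundary as a Nash equilibrium. The helper `classify_zero`, the tolerance and the test game (the hypothetical quadratic game $u_i = -x_i^2/2 + x_i(1 + 0.25\,x_j)$) are illustrative only.

```python
import numpy as np

def classify_zero(x, own_grad, tol=1e-9):
    """Label a profile as 'NE', 'OZ' or 'not a zero' of the dampened field.

    own_grad[i] is ASSUMED to hold du_i/dx_i at x, and the dampened field is
    ASSUMED to be F_i(x) = x_i * own_grad[i].
    """
    x, g = np.asarray(x, float), np.asarray(own_grad, float)
    if np.any(np.abs(x * g) > tol):
        return "not a zero"
    # An "other zero": some player sits at 0 but would gain by moving up.
    if np.any((x <= tol) & (g > tol)):
        return "OZ"
    return "NE"

def own_grad(x, delta=0.25):
    """du_i/dx_i for the hypothetical quadratic 2-player game."""
    x = np.asarray(x, float)
    return -x + 1.0 + delta * (x.sum() - x)

origin = classify_zero([0.0, 0.0], own_grad([0.0, 0.0]))
interior = classify_zero([4 / 3, 4 / 3], own_grad([4 / 3, 4 / 3]))
elsewhere = classify_zero([1.0, 1.0], own_grad([1.0, 1.0]))
```

In this example the origin is a stationary point of the dampened field without being a Nash equilibrium, which is exactly the kind of point collected in $OZ$.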

Convergence or non-convergence of our random process to a given point or set will sometimes depend on the stability of the latter with respect to the deterministic dynamical system . In different sections we use various notions of stability, which we recall here.

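Let $x \in Z$. The point $x$ is asymptotically stable (denoted by $x \in Z^{AS}$) if it uniformly attracts an open neighborhood $U$ of itself: $\lim_{t \to +\infty} \sup_{y \in U} \|\varphi(y, t) - x\| = 0$, where $\varphi$ denotes the flow of $F$. The point $x$ is linearly stable (denoted by $x \in Z^{LS}$) if for any $\mu \in Sp(DF(x))$ - where $DF(x)$ is the Jacobian matrix of $F$ evaluated at $x$ and $Sp(M)$ is the spectrum of matrix $M$ - we have $Re(\mu) < 0$ and $x$ is linearly unstable (denoted by $x \in Z^{LU}$) if there exists $\mu \in Sp(DF(x))$ such that $Re(\mu) > 0$. Note that if $x$ is hyperbolic (that is $Re(\mu) \neq 0$ for any $\mu \in Sp(DF(x))$) then it is either linearly stable or linearly unstable. We denote the set $Z \setminus Z^{LU}$ by $Z^{S}$ and by a slight abuse of language, we will call all points in $Z^{S}$ stable.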
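Linear stability can be checked numerically from the eigenvalues of the Jacobian. The sketch below uses the usual convention: a stationary point is linearly stable when every eigenvalue has negative real part and linearly unstable when at least one has positive real part. The two Jacobians are computed by hand for the hypothetical quadratic game $u_i = -x_i^2/2 + x_i(1 + 0.25\,x_j)$ with an assumed dampened field $F_i(x) = x_i\,\partial u_i/\partial x_i(x)$; none of these names come from the paper.

```python
import numpy as np

def linear_stability(jacobian, tol=1e-9):
    """Classify a stationary point from the Jacobian of the vector field there."""
    real_parts = np.linalg.eigvals(np.asarray(jacobian, float)).real
    if np.all(real_parts < -tol):
        return "linearly stable"
    if np.any(real_parts > tol):
        return "linearly unstable"
    return "non-hyperbolic (neither)"

# Jacobian of the ASSUMED dampened field at the interior equilibrium (4/3, 4/3):
# entries are x_i * d2u_i/dx_i dx_j, because du_i/dx_i vanishes there.
J_interior = (4.0 / 3.0) * np.array([[-1.0, 0.25],
                                     [0.25, -1.0]])
# At the origin only the diagonal terms du_i/dx_i = 1 survive.
J_origin = np.eye(2)

at_interior = linear_stability(J_interior)
at_origin = linear_stability(J_origin)
```

The interior equilibrium has eigenvalues $-1$ and $-5/3$, while the origin has a repeated eigenvalue $+1$.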

We have the following inclusions:

$$Z^{LS} \subset Z^{AS} \subset Z^{S}.$$

###### Proposition 2.2

We have $OZ \subset Z^{LU}$. As a consequence, $Z^{S} \subset NE$.

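To prove this, we take $x$ in $OZ$ and pick an individual $i$ such that $x_i = 0$ and $\frac{\partial u_i}{\partial x_i}(x) > 0$. We then show that $\frac{\partial u_i}{\partial x_i}(x)$ is an eigenvalue of the Jacobian matrix of $F$ at $x$. The direct consequence of Proposition 2.2 is that if the limit set contains stable stationary points, they must be stable Nash equilibria.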
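A short computation suggests why this eigenvalue appears. It is a sketch, assuming the dampened field has components $F_i(x) = x_i \frac{\partial u_i}{\partial x_i}(x)$ and that each $u_i$ is twice continuously differentiable. Differentiating $F_i$ gives

$$\frac{\partial F_i}{\partial x_j}(x) = \mathbf{1}_{\{i = j\}} \, \frac{\partial u_i}{\partial x_i}(x) + x_i \, \frac{\partial^2 u_i}{\partial x_i \partial x_j}(x).$$

At a point with $x_i = 0$ the second term vanishes, so row $i$ of the Jacobian has a single non-zero entry, located on the diagonal:

$$\left(0, \dots, 0, \ \frac{\partial u_i}{\partial x_i}(x), \ 0, \dots, 0\right).$$

The $i$-th basis vector is therefore a left eigenvector of the Jacobian, with eigenvalue $\frac{\partial u_i}{\partial x_i}(x)$. When this derivative is strictly positive, the Jacobian has an eigenvalue with positive real part, which is the defining property of a linearly unstable point.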

In view of Proposition 2.2, we will use the following notations in the remainder: , and .

As mentioned earlier, we will sometimes be dealing with connected components of instead of isolated points. We will thus use the concept of attractor (see Ruelle (1981)):

###### Definition 2.3

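Let $S \subset X$ be invariant for the flow $\varphi$. Then a set $A \subset S$ is an attractor for $\dot{x} = F(x)$ if

• $A$ is compact and invariant;

• there exists an open neighborhood $U$ of $A$ with the following property:

$$\forall \epsilon > 0,\ \exists T > 0 \text{ such that } \forall x \in U,\ \forall t \ge T,\ d(\varphi(x,t), A) < \epsilon.$$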

An attractor for a dynamical system is a set with strong properties: it uniformly attracts a neighborhood of itself.

###### Remark 2

Let $\hat{x}$ be an isolated stationary point of $\dot{x} = F(x)$. Then $\hat{x}$ is asymptotically stable if and only if $\{\hat{x}\}$ is an attractor for $\dot{x} = F(x)$.

We turn to the analysis of several classes of games.

## 3 Strategic complements

###### Definition 3.1

A game is a game with strategic complements if payoff functions are such that $\frac{\partial^2 u_i}{\partial x_i \partial x_j}(x) \ge 0$ for all $i \neq j$.

Games with strategic complements have nice structured sets of Nash equilibria (Vives (1990)), and offer nice convergence properties for specific dynamical systems. However, it can be difficult to obtain convergence to Nash for general learning procedures. There are several reasons for this that we illustrate here through two examples.

First, consider the Best-Response dynamics. Under Hypothesis 1, best-response functions are differentiable and strictly increasing. In that case, Vives (1990) proves in Theorem 5.1 and Remark 5.2 that, except for a specific set of initial conditions, the Best-Response dynamics, whether in discrete or in continuous time, monotonically converges to an equilibrium point. Unfortunately, in our case this set of problematic initial conditions cannot be excluded, in particular because the process is stochastic. It could be that the stochastic process often passes through these points, in which case it is known to possibly converge to very complicated sets (see Hirsch (1999)). In order to study convergence of the DGAP, we thus need to consider all possible trajectories and cannot rely on existing results.

Second, consider the standard reinforcement learning stochastic process, whose mean dynamics are the replicator dynamics. As shown in Posch (1997), the process can converge with positive probability to stationary points that are not only unstable, but also non-Nash. Examples can be constructed with players, each having strategies, supermodular payoff matrices with a unique strict Nash equilibrium, which is, moreover, found by elimination of dominated strategies. Yet even then, the learning process converges with positive probability to any other combination of strategies. This happens because there are some stationary points of the dynamics where the noise generated by the random process is null.

These two examples illustrate how, despite the games’ appealing properties, convergence to Nash is neither guaranteed nor easy to show when it occurs. We show that the DGAP will converge. In order to get our result, we first need to prove that no point in $\partial X$, the boundary of the state space, will be included in the limit set of the process. We start by imposing a simple and natural hypothesis.

###### Hypothesis 3

For any agent $i$,

$$\frac{\partial u_i}{\partial x_i}(0, 0) > 0.$$

Hypothesis 3 guarantees that players want to move away from the origin. Because of strategic complementarities, this also implies that players want to move away from any point of $\partial X$ (since $\frac{\partial u_i}{\partial x_i}(0, x_{-i}) \ge \frac{\partial u_i}{\partial x_i}(0, 0) > 0$). However, despite the fact that all players prefer to move away from the boundary, it is not clear why the stochastic process should remain at a distance from this boundary. The difficulty comes from the following fact: assume players start close to the boundary. Then, at the exploration stage, some decrease their efforts while others increase theirs. Although complementarities imply that the players who decreased their efforts would have been better off if they had instead increased them, they could still end up with a better payoff than before the exploration, and thus continue decreasing at the updating stage, getting closer to $\partial X$.

The following proposition proves that this will not happen in the long run.

###### Proposition 3.1

Under Hypothesis 3, there exists such that almost surely.

From a mathematical point of view, the main difficulty in obtaining Proposition 3.1 is to show that a stochastic approximation algorithm like the one given by (1) is pushed away from an invariant set of the dynamics where the noise term vanishes. In fact, there is almost no general result along these lines in the literature.

The proof of Proposition 3.1 is long and technical, but the idea goes as follows: among the players close to the boundary, the player exerting the least effort will increase his effort on average. Unfortunately, this does not imply that the smallest effort also increases, since another player may have decreased his. We thus construct a stochastic process which is a suitable approximation of the smallest effort over time. We then show that this new process cannot get close to the boundary, and because it is close asymptotically to our process, we are able to conclude.

###### Definition 3.2

The interaction graph $G(x)$ is said to be bipartite at $x$ if the set of players can be partitioned into $N_1$ and $N_2$ such that for any pair of players $i$ and $j$ we have

$$g_{ij}(x) = 1 \implies (i \in N_1 \text{ and } j \in N_2) \text{ or } (i \in N_2 \text{ and } j \in N_1).$$

An interaction graph is non-bipartite on a set $A$ if for all $x \in A$, $G(x)$ is not bipartite.
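The bipartiteness condition can be checked mechanically at a given point. The sketch below is a standard breadth-first two-colouring in Python; it takes the 0/1 matrix of links $g_{ij}(x)$ at a fixed $x$ as given, and the function name and input format are assumptions made for the example.

```python
from collections import deque


def is_bipartite(adj):
    """Check whether the graph with 0/1 link matrix adj is bipartite.

    adj[i][j] == 1 means players i and j are linked at the point considered.
    Links are treated as undirected, and self-links are ignored.
    """
    n = len(adj)
    side = [None] * n  # side[i] is 0 for N1, 1 for N2, None if not yet visited
    for start in range(n):
        if side[start] is not None:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                linked = i != j and (adj[i][j] == 1 or adj[j][i] == 1)
                if not linked:
                    continue
                if side[j] is None:
                    side[j] = 1 - side[i]
                    queue.append(j)
                elif side[j] == side[i]:
                    return False  # two linked players on the same side
    return True
```

For instance, a triangle of three mutually linked players is non-bipartite, while a line of three players is bipartite.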

We are now ready to state the main result of this section.

###### Theorem 3.1

Consider a game of strategic complements and smooth payoff functions, and assume that Hypothesis 3 holds. Then

• The learning process cannot converge to an unstable Nash equilibrium:

$$\forall \tilde{x} \in NE^{LU}, \quad \mathbb{P}\left(\lim_n x_n = \tilde{x}\right) = 0.$$
• If, in addition, the interaction graph is non-bipartite on , the learning process almost surely converges to a stable Nash equilibrium:

$$\mathbb{P}\left(\exists x^* \in NE^{S} : \lim_n x_n = x^*\right) = 1.$$

This result is very tight. Because the hypotheses of the theorem are verified for most common economic models we can think of, this theorem guarantees that the learning process will not only converge to Nash in most cases, it will additionally converge to a stable equilibrium. In cases where the interaction graph is bipartite, we cannot guarantee that the process will not converge to general unstable sets (linearly unstable equilibria are unstable sets, but unstable sets also include much more complex structures). However, we can still exclude convergence to linearly unstable equilibria by the first point of the theorem.

Let us provide some insights on the bipartiteness condition. As in Posch (1997), one potential issue is that the random process could get stuck around stationary points of the underlying dynamics if the random noise is zero at these stationary points. More precisely, a stationary point is unstable if there is some direction along which the system "escapes" the stationary point. But the system has to be able to follow that direction, otherwise it will get stuck. The random process plays precisely this role here: it allows the system to escape, as long as the unstable direction component of the random noise is not zero at that point. At an unstable equilibrium, we can show that the noise is not zero in the unstable direction and this guarantees the non-convergence result of the first point. The non-bipartiteness of the network guarantees that the noise has the property of being uniformly exciting everywhere in , which guarantees that the process can escape in any direction. This yields the second point. When the network is bipartite, this property does not hold and we cannot guarantee that the process will not get stuck in an unstable set.

Note that the bipartiteness condition does not imply that the process will not converge to an element of . However, we provide two examples in the appendix (Examples E.1 and E.2) in which we show that the noise can vanish on bipartite networks in games that have either no strategic complements or no symmetric externalities. In our examples the noise vanishes at unstable equilibria.

## 4 Locally ordinal potential games

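We introduce a class of games that we call the locally ordinal potential games. Recall that a game is a potential game ($PG$) if there is a function $P : X \to \mathbb{R}$ such that for all $i$ and $x_{-i}$, for all $x_i, x_i'$, we have $u_i(x_i, x_{-i}) - u_i(x_i', x_{-i}) = P(x_i, x_{-i}) - P(x_i', x_{-i})$, and an ordinal potential game ($OPG$) if $u_i(x_i, x_{-i}) - u_i(x_i', x_{-i}) > 0 \iff P(x_i, x_{-i}) - P(x_i', x_{-i}) > 0$.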

###### Definition 4.1

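A game is a locally ordinal potential game ($LOPG$) if there is a differentiable function $P$ such that

$$\operatorname{sgn}\left(\frac{\partial u_i}{\partial x_i}(x)\right) = \operatorname{sgn}\left(\frac{\partial P}{\partial x_i}(x)\right).$$

The class of $LOPG$ is large, in the sense that $PG \subset OPG \subset LOPG$ when $P$ is differentiable. It also contains many games of economic interest. For instance, both examples 1 and 2 are locally ordinal potential games.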
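The sign condition of Definition 4.1 can be checked numerically on a grid. The Python sketch below does so for a hypothetical two-player game that is not taken from the paper, $u_i(x) = (1 + x_j)(x_i - x_i^2/2)$ on $x \ge 0$, with candidate function $P(x) = \sum_i (x_i - x_i^2/2)$. The cross partial derivatives of $u_1$ and $u_2$ differ, so $P$ is not an exact potential, yet the signs of the own-action gradients agree with those of $P$. The game, the grid and the function names are assumptions made for the example.

```python
def sgn(v):
    """Sign of a real number: -1, 0 or 1."""
    return (v > 0) - (v < 0)


def du_dxi(i, x):
    """Own-action gradient of the toy payoff u_i(x) = (1 + x_j)(x_i - x_i^2 / 2)."""
    j = 1 - i
    return (1 + x[j]) * (1 - x[i])


def dP_dxi(i, x):
    """Partial derivative of the candidate function P(x) = sum_i (x_i - x_i^2 / 2)."""
    return 1 - x[i]


def is_lopg_on_grid(du, dP, grid):
    """Check sgn(du_i/dx_i) == sgn(dP/dx_i) for both players at every grid point."""
    return all(sgn(du(i, (a, b))) == sgn(dP(i, (a, b)))
               for a in grid for b in grid for i in (0, 1))


# Grid 0, 0.25, ..., 3 on each axis (illustrative choice).
GRID = [k / 4 for k in range(13)]
```

A grid check of this kind can only refute the condition, not prove it; here the agreement of signs also follows directly from $1 + x_j > 0$.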

The generality of our results depends on the structure of the set of stationary points of the game under consideration, and in particular on whether it consists of isolated points or not. For instance, the public good game of example 1 generically has a finite number of isolated zeros, but can have continua of equilibria for certain values of the substitutability parameter (see Bervoets and Faure (2018) for more details).

###### Theorem 4.1

Let the game be an $LOPG$ and let $P$ be sufficiently regular. Then

• If the game has isolated zeros, then

$$\mathbb{P}\left(\exists x^* \in NE : \lim_n x_n = x^*\right) = 1.$$

If, in addition, the interaction graph is non-bipartite on , then

$$\mathbb{P}\left(\exists x^* \in NE^{S} : \lim_n x_n = x^*\right) = 1.$$

For any $LOPG$, the only set to which the stochastic learning process can converge is the set of zeros of $F$. Complex $\omega$-limit sets of the dynamical system, which are non-zeros, can be discarded. We cannot, however, be sure that the process will not reach a set containing other zeros, thus we cannot guarantee convergence to the set of Nash equilibria. When zeros are isolated, however, convergence to Nash is proved by the conjunction of the first point and the fact that the process cannot converge to an isolated other zero. Furthermore, we prove that the DGAP cannot converge to a linearly unstable Nash if the interaction graph is non-bipartite (on this, we provide more details in Section 5).

When zeros are non-isolated, we cannot guarantee that the DGAP will converge to a stable set. We can use Benaïm (1999) to show that on the event , for any attractor of the ODE (2), where is the basin of attraction of . Combining this observation with point of Theorem 4.1, we get the following important implication: if a connected set is an attractor for , then is a connected component of .

However, when focusing on $LOPG$s, more can be said, since we are able to relate attractors of the dynamics to the potential function $P$, and to another dynamical system, extensively used in economics: Best-Response Dynamics (BRD).

###### Definition 4.2

Let $BR(x)$ denote the profile of best responses to $x$. The continuous-time Best-Response dynamics (hereafter, BRD) is defined as:

$$\dot{x} = -x + BR(x) \tag{7}$$
###### Definition 4.3

Let be a smooth map and be a connected component of , we say that is a local maximum of if

• is constant on : ;

• there exists an open neighborhood of such that

We then have

###### Theorem 4.2

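Assume the game is an $LOPG$ and let $\Lambda \subset X$ be a connected set. Then the following statements are equivalent:

• $\Lambda$ is an attractor for $\dot{x} = F(x)$;

• $\Lambda$ is a local maximum of $P$;

• $\Lambda \subset NE$ and $\Lambda$ is an attractor for the best-response dynamics (7).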

This result is positive and informative. First, it tells us that attractors are necessarily included in the set of Nash equilibria. Thus, although the process might converge to other zeros when stationary points are non-isolated, these points are unstable.

Second, Theorem 4.2 provides two methods of finding the attractors: one way is to look for local maxima of the potential function, which is very convenient when the function is known; and the other is to look for attractors for another dynamics, possibly simpler to analyze, the BRD. Note that this second method establishes a relation between two dynamics that are conceptually unrelated. Indeed, the BRD assumes that agents are very sophisticated, as they know their exact payoff function, they observe their opponents’ play and perform potentially complex computations. Solution curves may be very different, but surprisingly, both dynamics share the same set of attractors.
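The coincidence of attractors can be observed numerically on a simple case. The Python sketch below integrates, with an explicit Euler scheme, both the system $\dot{x} = F(x)$ with $F_i(x) = x_i \frac{\partial u_i}{\partial x_i}(x)$ and the BRD (7) for a hypothetical two-player potential game that is not taken from the paper, $u_i(x) = x_i - x_i^2/2 + \delta x_i x_j$ with $\delta = 0.5$, whose best response is $BR_i(x) = \max(0, 1 + \delta x_j)$. The game, the step size, the horizon and the function names are assumptions made for the example.

```python
DELTA = 0.5  # hypothetical complementarity parameter (assumption, not from the paper)


def gradient_field(x):
    """F_i(x) = x_i * du_i/dx_i(x) for the toy game."""
    return [x[0] * (1 - x[0] + DELTA * x[1]),
            x[1] * (1 - x[1] + DELTA * x[0])]


def brd_field(x):
    """Right-hand side of the best-response dynamics, -x + BR(x)."""
    br = [max(0.0, 1 + DELTA * x[1]), max(0.0, 1 + DELTA * x[0])]
    return [br[0] - x[0], br[1] - x[1]]


def euler(field, x0, dt=0.01, steps=5000):
    """Explicit Euler integration of dx/dt = field(x) starting from x0."""
    x = list(x0)
    for _ in range(steps):
        v = field(x)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

Starting both flows from the same interior point, for example `euler(gradient_field, [0.5, 3.0])` and `euler(brd_field, [0.5, 3.0])`, the two solution curves differ along the way but both approach the interior Nash equilibrium $(2, 2)$ of this toy game.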

## 5 Isolated zeros

In the two previous sections we did not assume any specific structure on the set of zeros of the dynamical system. However, in most economic games with continuous action spaces, the set of zeros, and in particular the set of Nash equilibria, would be finite. In that case, zeros are isolated points. For instance, in the public good game of example 1, Bramoullé et al. (2014) show that the game has a finite number of equilibria for almost every value of substitutability between efforts. The same can be said about the games in example 2. In this section, we restrict our attention to these games.

###### Remark 3

If , then

$$\mathbb{P}\left(\lim_n x_n = \hat{x}\right) > 0$$

on the event .

This is just a consequence of the result in Benaïm (1999) mentioned earlier and the fact that $\{\hat{x}\}$ is an attractor. It says that the process can converge to desirable outcomes. We next turn to the hard part, i.e. excluding the convergence to undesirable zeros in every game with isolated zeros.

### 5.1 Non convergence to undesirable zeros

In games with a continuum of equilibria, we cannot exclude the possibility of our learning process getting arbitrarily close to elements of the set of other zeros. More precisely, there is no a priori reason to believe that the learning process will converge (to a point) when zeros of the dynamical system are connected components. If it does not, then the process could come arbitrarily close to a continuum of $NE$ that is connected to a continuum of $OZ$, and oscillate between the two. However, when zeros are isolated this cannot happen and we can discard convergence to other zeros. Further, we can almost always discard convergence to linearly unstable Nash equilibria.

###### Theorem 5.1

Let be a game with isolated zeros and assume that . Then:

• If , then

• If , , and is non-bipartite, then

The proof of point a) is a probabilistic proof. We show that, in a neighborhood of an other zero, the players who are playing $0$ although they have a strictly positive gradient will, in expectation, increase their action level as they approach the boundary. This is of course a contradiction.

### 5.2 Concave games

As mentioned in Remark 1, Arrow and Hurwicz (1960) and Rosen (1965) analyzed similar dynamical systems in concave games. The former investigate a subclass of all games with payoff functions that are concave in players’ own actions and convex in other players’ actions. These games include the well-known class of zero-sum games. The authors then prove global convergence of system (3).

Rosen (1965) deals with concave games, and provides sufficient conditions for the game to have a unique Nash equilibrium when the strategy space is compact and convex: if there are some positive weights such that the weighted sum of the payoff functions is diagonally strictly concave, then the equilibrium of the game is unique. Under that assumption, the author proves that the weighted gradient system (4) globally converges to this unique equilibrium.

We are interested in determining whether the DGAP also converges in these games, but this raises several problems. First, we need to show that our deterministic system (2) has the same good convergence properties as (3) and (4). But this is not enough, since our process is stochastic, unlike theirs. Second, therefore, we need to show that the limit set of the stochastic process (1) is included in the set of stationary points of the dynamical system (2) for these games. Last, the games considered in Arrow and Hurwicz (1960) sometimes have continua of equilibria. For instance, in zero-sum games, the set of equilibria is known to be convex. To avoid this issue, we maintain the concavity condition on the payoff functions but we require that at least one player’s payoff function is strictly concave in own action. Under this assumption, we show that these games satisfy Rosen’s (1965) condition, and thus have a unique Nash equilibrium. We next show that all games satisfying Rosen’s condition have isolated zeros for our dynamical system. With this in hand, we prove that the DGAP converges to the unique Nash equilibrium with probability $1$.

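Suppose that $u_i$ is concave in $x_i$ for every $i$. Following Rosen (1965), given $x \in X$ and $r \in (\mathbb{R}_+^*)^N$, let $g(x, r)$ be given by $g_i(x, r) = r_i \frac{\partial u_i}{\partial x_i}(x)$. (The dynamical system $\dot{x} = g(x, r)$ is a weighted gradient system, and is significantly different from the system (2).) A game is diagonally strictly concave if

$$\exists r \in (\mathbb{R}_+^*)^N \mid \forall x^0 \neq x^1 \in X \text{ we have } \langle x^1 - x^0 \mid g(x^0, r) \rangle + \langle x^0 - x^1 \mid g(x^1, r) \rangle > 0 \tag{8}$$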

Games having this property are denoted by . It is proved (Theorem 2 of Rosen (1965)) that games in have a unique Nash equilibrium when the state space is compact. In our context, where the state space is unbounded, they may have none.
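Condition (8) can be probed numerically. The Python sketch below evaluates the left-hand side of (8) on random pairs of points for a hypothetical two-player game that is not taken from the paper, $u_i(x) = x_i - x_i^2/2 + \delta x_i x_j$ with $\delta = 0.5$ and weights $r = (1, 1)$, for which the expression equals $d_1^2 + d_2^2 - 2\delta d_1 d_2$ with $d = x^1 - x^0$. The game, the sampling box $[0, 5]^2$ and the function names are assumptions made for the example, and a random check of this kind can only refute the condition, not prove it.

```python
import random

DELTA = 0.5  # hypothetical complementarity parameter (assumption, not from the paper)


def g(x, r):
    """Weighted gradient g_i(x, r) = r_i * du_i/dx_i(x) for the toy game."""
    return [r[0] * (1 - x[0] + DELTA * x[1]),
            r[1] * (1 - x[1] + DELTA * x[0])]


def dsc_gap(x0, x1, r):
    """Left-hand side of condition (8): <x1 - x0 | g(x0, r)> + <x0 - x1 | g(x1, r)>."""
    g0, g1 = g(x0, r), g(x1, r)
    first = sum((b - a) * c for a, b, c in zip(x0, x1, g0))
    second = sum((a - b) * c for a, b, c in zip(x0, x1, g1))
    return first + second


def min_gap_on_random_pairs(n_pairs=1000, seed=0, r=(1.0, 1.0)):
    """Smallest value of the left-hand side of (8) over random pairs in [0, 5]^2."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_pairs):
        x0 = [rng.uniform(0, 5), rng.uniform(0, 5)]
        x1 = [rng.uniform(0, 5), rng.uniform(0, 5)]
        gaps.append(dsc_gap(x0, x1, r))
    return min(gaps)
```

A strictly positive minimum over the sampled pairs is consistent with diagonal strict concavity of the toy game, which here also follows from $d_1^2 + d_2^2 - 2\delta d_1 d_2 \ge (1 - \delta)\|d\|^2$.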

Games considered by Arrow and Hurwicz (1960) (which we call concave-convex games, and denote by ) are as follows. Let be a subset of , the set of players, and define . A game is concave-convex if for each , the function is concave in for each and convex in for each , and for some , is strictly concave in for each . If in addition is strictly concave in , then we say that the game is strictly concave-convex.

###### Remark 4

Strictly concave-convex games are diagonally strictly concave. Thus all properties of the latter apply to the former.

For simplicity, in the remainder of this section we will place ourselves in the setting of Rosen (1965), i.e. we assume that the strategy space is a compact set. This guarantees that the Nash equilibrium is unique. When the set is unbounded, the game could have no equilibrium at all and if that happened, the process would go to infinity. Because this introduces unnecessary complexities in the proof, we restrict our attention to compact sets.

The fact that the Nash equilibrium is unique is convenient for the study of dynamics where the Nash equilibria are the only stationary points. However, the system (2) also has other zeros, since . In the following theorem, we show that there is a finite number of other zeros, and thus all the stationary points are isolated. We also state our convergence result.

###### Theorem 5.2

Let . Then,

• is a finite set

• There is a unique Nash equilibrium $\bar{x}$ and

$$\mathbb{P}\left(\lim_n x_n = \bar{x}\right) = 1.$$

The proof of the first point goes as follows: we prove that diagonally strictly concave games are such that, after removing a subset of players playing $0$, the remaining subgame is also diagonally strictly concave. Thus there is at most one Nash equilibrium for any combination of agents playing $0$. The number of such potential combinations is finite, so the result follows.

In order to prove the second point of Theorem 5.2, we show that the zeros of (2) are the only candidates for limit points of our process. We cannot do this in general games with isolated zeros, but in diagonally strictly concave games we can, by decomposing the state space into several subspaces (respectively, the interior of the space and every face) and constructing appropriate Lyapunov functions for each subspace. As a consequence, we prove that every solution of (2) converges to one of the zeros. Since zeros are the only candidates, we get the desired conclusion by using point i) of Theorem 5.1.

## Appendix A Proof of results of Section 2

#### Proof of Proposition 2.1.

We first prove that the process can be written as in equation (1). Second we prove that the process is well-defined, i.e. $x^i_n \ge 0$ for all $i$ and all $n$.

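1- We have, for any $i$ and $n$,

$$e^i_{2n+2} - e^i_{2n} = e^i_{2n}\,\epsilon^i_n\,\Delta u^i_{n+1}.$$

A first order development gives

$$\begin{aligned}
\epsilon^i_n \Delta u^i_{n+1} &= \epsilon^i_n \left( u_i\!\left(e^i_{2n} + \tfrac{1}{n+1}\epsilon^i_n,\ e^{-i}_{2n} + \tfrac{1}{n+1}\epsilon^{-i}_n\right) - u_i\!\left(e^i_{2n}, e^{-i}_{2n}\right) \right) \\
&= \frac{1}{n+1}(\epsilon^i_n)^2 \frac{\partial u_i}{\partial x_i}(e_{2n}) + \frac{1}{n+1}\epsilon^i_n \sum_{j \neq i} \epsilon^j_n \frac{\partial u_i}{\partial x_j}(e_{2n}) + O\!\left(\frac{1}{n^2}\right).
\end{aligned}$$

Because $(\epsilon^i_n)^2 = 1$ and $e_{2n} = x_n$, we have

$$x^i_{n+1} - x^i_n = \frac{1}{n+1} x^i_n \frac{\partial u_i}{\partial x_i}(x_n) + \frac{1}{n+1}\epsilon^i_n x^i_n \sum_{j \neq i} \epsilon^j_n \frac{\partial u_i}{\partial x_j}(x_n) + O\!\left(\frac{1}{n^2}\right).$$

By setting $U^i_{n+1} = \epsilon^i_n x^i_n \sum_{j \neq i} \epsilon^j_n \frac{\partial u_i}{\partial x_j}(x_n)$, we get equation (1). Finally, note that $\mathbb{E}(\epsilon^j_n) = 0$ for all $j$, and that $\epsilon^i_n$ and $\epsilon^j_n$ are independent, so that

$$\mathbb{E}(U_{n+1} \mid \mathcal{F}_n) = 0. \quad \blacksquare$$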

2- Let us now show that the process is well-defined. Notice that Hypothesis 1 implies that is bounded everywhere. For simplicity and without loss of generality, we will assume that . This is just for simplicity, the proof can easily be accommodated otherwise. Let . By assumption, for all . Thus,

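$$\frac{x^i_{n+1}}{x^i_n} \ge \left(1 - \|e_{2n+1} - x_n\|_\infty\right),$$

and $\|e_{2n+1} - x_n\|_\infty \le \frac{1}{n+1}$ for all $n$. As a consequence,

$$\frac{x^i_{n+1}}{x^i_n} \ge \left(1 - \frac{1}{n+1}\right).$$

Thus, $x^i_{n+1} \ge \frac{n}{n+1} x^i_n$ and

$$x^i_n \ge x^i_1 \prod_{k=1}^{n-1}\left(1 - \frac{1}{k+1}\right) = \frac{1}{n} x^i_1 \ge 0.$$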

Note that at the beginning of the process, steps are large. Thus in case is close to , the exploration phase might take players to the negative orthant (). This can only happen because the first steps are large. In order to avoid that, we can either assume that (i.e. players start far enough from the boundary), or that the process begins at step , where is the integer part of (i.e. the first steps are not too large). In any case, this is totally innocuous for what we do and guarantees that .
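The recursion analysed in this proof can be simulated directly. The Python sketch below implements one exploration step $e_{2n+1} = x_n + \frac{1}{n+1}\epsilon_n$ followed by the update $e^i_{2n+2} = e^i_{2n} + e^i_{2n}\epsilon^i_n \Delta u^i_{n+1}$, on a hypothetical two-player game that is not taken from the paper, $u_i(x) = x_i - x_i^2/2 + \delta x_i x_j$ with $\delta = 0.5$, whose interior Nash equilibrium is $(2, 2)$. The game, the starting step `n_start` (chosen so that the first steps are not too large), the number of iterations, the seed and the function names are assumptions made for the example.

```python
import random

DELTA = 0.5  # hypothetical complementarity parameter (assumption, not from the paper)


def u(i, x):
    """Toy payoff u_i(x) = x_i - x_i^2 / 2 + DELTA * x_i * x_j."""
    j = 1 - i
    return x[i] - 0.5 * x[i] ** 2 + DELTA * x[i] * x[j]


def simulate(x0, n_start=10, n_steps=100_000, seed=0):
    """Run the payoff-based recursion and return (final state, lowest action seen)."""
    rng = random.Random(seed)
    x = list(x0)
    lowest = min(x)
    for n in range(n_start, n_start + n_steps):
        # Exploration stage: each player perturbs her action by +/- 1/(n+1).
        eps = [rng.choice((-1, 1)) for _ in x]
        explored = [xi + e / (n + 1) for xi, e in zip(x, eps)]
        # Each player only observes the change in her own payoff.
        delta_u = [u(i, explored) - u(i, x) for i in range(len(x))]
        # Updating stage: multiplicative (dampened) move in the direction explored.
        x = [xi + xi * e * du for xi, e, du in zip(x, eps, delta_u)]
        lowest = min(lowest, min(x))
    return x, lowest
```

Calling `simulate([0.5, 0.5])` returns the final state together with the smallest action level reached along the path, so one can check both that actions stay positive and that the state ends up near the interior equilibrium of this toy game.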

#### Proof of Proposition 2.2.

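Pick $\hat{x} \in OZ$ and assume without loss of generality that $\hat{x}_1 = 0$ with $\frac{\partial u_1}{\partial x_1}(\hat{x}) > 0$. Then

$$\frac{\partial F_1}{\partial x_1}(\hat{x}) = \frac{\partial u_1}{\partial x_1}(\hat{x}), \text{ and } \frac{\partial F_1}{\partial x_j}(\hat{x}) = 0 \text{ for } j \neq 1.$$

Hence $\frac{\partial u_1}{\partial x_1}(\hat{x})$ is a strictly positive eigenvalue of $DF(\hat{x})$, and the associated eigenvector points inwards. Thus, necessarily $\hat{x} \in Z^{LU}$.

Next, $Z^{S} \subset NE$ is a consequence of $Z = NE \cup OZ$, $OZ \subset Z^{LU}$ and $Z^{S} = Z \setminus Z^{LU}$.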

## Appendix B Proof of results of Section 3

### B.1 Proof of Proposition 3.1

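Under Hypothesis 3, for any $i$, there exist $\bar{x}_i > 0$ and $\alpha_i > 0$ such that

$$\frac{\partial u_i}{\partial x_i}(x_i, 0) > \alpha_i > 0, \quad \forall x_i \le \bar{x}_i.$$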

Since the game has strategic complements,

 ∂ui∂xi(xi,x−i)>αi>0,∀xi<¯¯¯xi,∀