Continuous-time Discounted Mirror-Descent Dynamics in Monotone Concave Games

12/07/2019 · by Bolin Gao, et al.

In this paper, we consider concave continuous-kernel games characterized by monotonicity properties and propose discounted mirror-descent-type dynamics. We introduce two classes of dynamics whereby the associated mirror map is constructed from either a strongly convex or a Legendre regularizer. Depending on the properties of the regularizer, we show that these new dynamics can converge asymptotically in concave games with monotone (negative) pseudo-gradient. Furthermore, we show that when the regularizer enjoys strong convexity, the resulting dynamics can converge even in games with hypo-monotone (negative) pseudo-gradient, which corresponds to a shortage of monotonicity.


1 Introduction

One of the earliest works on solving continuous-kernel concave games is that of Rosen [29]. There, continuous-time gradient-type dynamics were shown to converge to the Nash equilibrium in games satisfying a so-called diagonal strict concavity condition, roughly equivalent to the pseudo-gradient being a strictly monotone operator. Recently, research on solving monotone games has seen a surge. Both continuous-time dynamics and discrete-time algorithms have been developed, mostly for games with strictly (strongly) monotone pseudo-gradient. For merely monotone (not strictly monotone) games, no continuous-time dynamics exist. Discrete-time algorithms have been proposed, based on either proximal regularization [28], inexact proximal best-response [17], or Tikhonov-type regularization [19], and have recently been extended to generalized Nash equilibrium seeking, e.g. [20], [21]. All these works are in a discrete-time setting, and the dynamics evolve in the primal space of decision variables (and possibly multipliers). With the exception of [19], these algorithms are applicable only in games with "cheap" (inexpensive) proximal/resolvent evaluations [28].

In this note we propose a family of continuous-time discounted mirror descent dynamics, whereby the dynamics evolve in the space of dual (pseudo-gradient) variables. The mapping from the dual space back to the primal space of decision variables is done via a mirror map, constructed from two general classes of regularizers. Depending on the properties of the regularizer, we show that these dynamics can converge asymptotically in merely monotone, and even hypo-monotone, concave games. To the best of our knowledge, these are the first such dynamics in the literature. Our novel contribution consists in relating the convergence of the dynamics to the properties of the convex conjugate of the regularizer.

Literature review: Mirror descent algorithms have found numerous applications in recent years, e.g., in distributed optimization [2], online learning [4], and variational inequality problems [5]. They fall into the class of so-called primal-dual algorithms; the name mirror descent refers to the two iterative steps: a mapping of the primal variable into a dual space (in the sense of the convex conjugate), followed by a mapping of the dual variable, or some post-processing of it, back into the primal space via a mirror map. The mirror descent algorithm (MDA), introduced by Nemirovski and Yudin [1], was originally proposed as a generalization of projected gradient descent (PGD) for constrained optimization. The authors of [31] showed that MDA achieves a better convergence rate than PGD, which makes it especially suitable for large-scale optimization problems. Other types of algorithms can be seen as equivalent to, or special cases of, MDA, e.g., dual averaging [3] and follow-the-leader [4]. A continuous-time version of MDA, referred to as the mirror descent (MD) dynamics [15, 12], captures many existing continuous-time dynamics as special cases, such as the gradient flow [15, 12], saddle-point dynamics [9] and pseudo-gradient dynamics [10].
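For concreteness, the following is a minimal sketch of one such primal-dual step, assuming a negative-entropy regularizer on the probability simplex, for which the mirror map has the closed-form softmax expression; the function name, step size and payoff gradient below are illustrative only and not from the paper.

```python
import numpy as np

def entropic_mirror_step(x, grad, eta):
    """One mirror-ascent step on the probability simplex.

    Regularizer: negative entropy, whose mirror map is the softmax.
    x    -- current primal point (on the simplex)
    grad -- (partial) gradient of the payoff at x
    eta  -- step size
    """
    z = np.log(x) + eta * grad      # primal -> dual: aggregate the gradient
    e = np.exp(z - z.max())         # dual -> primal via the mirror map ...
    return e / e.sum()              # ... i.e., softmax(z)

x = np.ones(3) / 3                  # start at the uniform strategy
print(entropic_mirror_step(x, np.array([1.0, 0.0, -1.0]), 0.1))
```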

In the context of multi-agent games, mirror descent-like algorithms have been applied to continuous-kernel games [13], finite games [6, 7], and population games. The primal space is taken to be the space of decisions/strategies, and the dual space is the space of payoff vectors (in finite games) or pseudo-gradient vectors (in continuous-kernel games). Zhou et al. [13] introduced the concept of a variationally stable concave game and showed that, under variational stability, the iterates of an online MDA converge to the set of Nash equilibria, provided that the step-size sequence is slowly vanishing and that the mirror map satisfies a Fenchel-coupling conformity condition [13]. Since all concave games with strictly monotone pseudo-gradient are variationally stable, the algorithm converges in all strictly monotone games. However, there are games with a (unique) Nash equilibrium that is not necessarily variationally stable, e.g., zero-sum (monotone) games. While finding the Nash equilibrium of strictly monotone games is an important problem, convergence in such games does not necessarily imply convergence in monotone (but not strictly monotone) games.

Contributions: Motivated by the above, in this work we propose two classes of continuous-time discounted MD dynamics for concave, continuous-kernel games. The discounting is performed on the dual step of the mirror descent, which generates a weighted-aggregation effect similar to the dynamics studied for finite-action games in [7]. Discounting is known to foster convergence and eliminate cycling in games, as shown for monotone or zero-sum games [7, 24]. By exploiting properties of the mirror map in the two classes, as well as the discounting effect, we show that these dynamics converge asymptotically to the perturbed equilibria of concave games with monotone (not necessarily strictly monotone) pseudo-gradient. Under certain conditions, they can even converge in concave games with hypo-monotone pseudo-gradient. To the best of our knowledge, these are the first such results. Our convergence analysis uses a Lyapunov function given by a Bregman divergence. We note that [32] recently identified the Bregman divergence as a natural Lyapunov candidate for a variety of systems, elegantly tying in with existing results on mirror descent dynamics [12]. While the dynamics evolve in the dual space as in [7], here we consider continuous-kernel games rather than finite-action games. Furthermore, compared to [7], we set up a general framework in terms of two classes of regularizers, matched to the geometry of the action set. For either strongly convex or Legendre regularizers, we provide convergence guarantees in monotone (hypo-monotone) games and present several example discounted MD dynamics. In fact, one such example recovers the dynamics in [7] if the action set is specialized to a simplex geometry and the regularizer is taken as a particular entropy example. Another example dynamics can be seen as the continuous-time dual counterpart of the discrete-time Tikhonov (primal) regularization [19]. Compared to the undiscounted MD [13], our discounted MD dynamics can converge in (not strictly) monotone games, and even in hypo-monotone games. A short version will appear in [39], with two example dynamics. Here we propose two general classes, present proofs (omitted from [39]), additional example dynamics and numerical results.

The paper is organized as follows. In Section II, we provide preliminary background. Section III presents the problem setup and introduces a general form of the discounted mirror descent (DMD) dynamics. In Section IV, we construct two classes of DMD and prove their convergence. In Section V, we construct several examples of DMD from each class. We present numerical results in Section VI and conclusions in Section VII.

2 Background

2.1 Convex Sets, Fenchel Duality and Monotone Operators

The following is from [23, 28, 25]. Given a convex set $C \subseteq \mathbb{R}^n$, the interior (relative interior) of the set is denoted as $\operatorname{int}(C)$ ($\operatorname{rint}(C)$). $\operatorname{rint}(C)$ coincides with $\operatorname{int}(C)$ whenever $\operatorname{int}(C)$ is non-empty. The closure of $C$ is denoted as $\overline{C}$, and the relative boundary of $C$ is defined as $\operatorname{rbd}(C) = \overline{C} \setminus \operatorname{rint}(C)$. The indicator function over $C$ is denoted by $\delta_C$, i.e., $\delta_C(x) = 0$ if $x \in C$ and $\delta_C(x) = \infty$ otherwise. The normal cone of $C$ at $x \in C$ is defined as $N_C(x) = \{ v \mid \langle v, y - x \rangle \leq 0, \ \forall y \in C \}$, and $\Pi_C(x) = \operatorname{argmin}_{y \in C} \|y - x\|_2$ is the Euclidean projection of $x$ onto $C$.

Let $\mathbb{R}^n$ be endowed with norm $\|\cdot\|$ and inner product $\langle \cdot, \cdot \rangle$. An extended real-valued function is a function that maps from $\mathbb{R}^n$ to $[-\infty, \infty]$. The (effective) domain of $\psi$ is $\operatorname{dom}(\psi) = \{ x \in \mathbb{R}^n \mid \psi(x) < \infty \}$. A function $\psi$ is proper if it does not attain the value $-\infty$ and there exists at least one $x$ such that $\psi(x) < \infty$; it is closed if its epigraph is closed. A function $\psi$ is supercoercive if $\lim_{\|x\| \to \infty} \psi(x)/\|x\| = \infty$. Let $\partial \psi(x)$ denote the subdifferential of $\psi$ at $x$ and $\nabla \psi(x)$ the gradient of $\psi$ at $x$, if $\psi$ is differentiable. Suppose $\psi$ is closed, convex, proper with $\operatorname{int}(\operatorname{dom}(\psi)) \neq \emptyset$; then $\psi$ is essentially smooth if $\psi$ is differentiable on $\operatorname{int}(\operatorname{dom}(\psi))$ and $\|\nabla \psi(x_k)\| \to \infty$ whenever $\{x_k\}$ is a sequence in $\operatorname{int}(\operatorname{dom}(\psi))$ converging towards a boundary point of $\operatorname{dom}(\psi)$. $\psi$ is essentially strictly convex if $\psi$ is strictly convex on every convex subset of $\operatorname{dom}(\partial \psi)$. A function $\psi$ is Legendre if it is both essentially smooth and essentially strictly convex. Given $\psi$, the function $\psi^*$ defined by $\psi^*(z) = \sup_{x} \{ \langle z, x \rangle - \psi(x) \}$ is called the conjugate function of $\psi$, where $z$ lies in the dual space of $\mathbb{R}^n$, endowed with the dual norm $\|\cdot\|_*$. $\psi^*$ is closed and convex if $\psi$ is proper. By Fenchel's inequality, $\psi(x) + \psi^*(z) \geq \langle z, x \rangle$ for any $x, z$ (with equality if and only if $z \in \partial \psi(x)$ for proper and convex $\psi$, or equivalently $x \in \partial \psi^*(z)$ if in addition $\psi$ is closed [23, Theorem 4.20]). The Bregman divergence of a proper, closed, convex function $\psi$, differentiable over $\operatorname{int}(\operatorname{dom}(\psi))$, is $D_\psi(x, y) = \psi(x) - \psi(y) - \langle \nabla \psi(y), x - y \rangle$. An operator $F$ is monotone if $\langle F(x) - F(y), x - y \rangle \geq 0$, $\forall x, y$. $F$ is $L$-Lipschitz if $\|F(x) - F(y)\| \leq L \|x - y\|$ for some $L > 0$, and is $\beta$-cocoercive if $\langle F(x) - F(y), x - y \rangle \geq \beta \|F(x) - F(y)\|^2$ for some $\beta > 0$.
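As one standard worked instance of these objects (an illustration, assuming the negative entropy restricted to the simplex $\Delta = \{ x \geq 0 : \sum_k x_k = 1 \}$), the conjugate, its gradient and the Bregman divergence all have closed forms,

$$\vartheta(x) = \textstyle\sum_k x_k \log x_k, \qquad \vartheta^*(z) = \log \textstyle\sum_k e^{z_k}, \qquad \nabla \vartheta^*(z) = \operatorname{softmax}(z), \qquad D_\vartheta(x, y) = \textstyle\sum_k x_k \log \tfrac{x_k}{y_k},$$

i.e., the conjugate is the log-sum-exp function and the Bregman divergence is the Kullback-Leibler divergence.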

2.2 N-Player Concave Games

Let $\mathcal{G} = (\mathcal{N}, \{\Omega_i\}_{i \in \mathcal{N}}, \{\mathcal{U}_i\}_{i \in \mathcal{N}})$ be a game, where $\mathcal{N} = \{1, \dots, N\}$ is the set of players and $\Omega_i \subseteq \mathbb{R}^{n_i}$ is the set of player $i$'s strategies (actions). We denote the strategy (action) set of player $i$'s opponents as $\Omega_{-i} = \prod_{j \neq i} \Omega_j$, and the set of all the players' strategies as $\Omega = \prod_{i \in \mathcal{N}} \Omega_i$. We refer to $\mathcal{U}_i : \Omega \to \mathbb{R}$ as player $i$'s real-valued payoff function, where $x = (x_i, x_{-i}) \in \Omega$ is the action profile of all players, $x_i \in \Omega_i$ is the action of player $i$, and $x_{-i} \in \Omega_{-i}$ is the action profile of all players except $i$.

Assumption 1.

For all $i \in \mathcal{N}$,

  • $\Omega_i$ is a non-empty, closed, convex subset of $\mathbb{R}^{n_i}$,

  • $\mathcal{U}_i(x)$ is (jointly) continuous in $x$,

  • $\mathcal{U}_i(x_i, x_{-i})$ is concave and continuously differentiable in each $x_i$, for all $x_{-i} \in \Omega_{-i}$.

Under Assumption 1, we refer to $\mathcal{G}$ as a concave game. Equivalently, in terms of a cost function $J_i = -\mathcal{U}_i$, the game is a convex game. For the rest of the paper, we use the payoff function throughout. Given $x_{-i}$, each agent $i$ aims to find the solution of the following optimization problem,

$$\max_{x_i} \ \mathcal{U}_i(x_i, x_{-i}) \quad \text{subject to} \quad x_i \in \Omega_i. \tag{1}$$

A profile $x^* = (x_i^*, x_{-i}^*) \in \Omega$ is a Nash equilibrium (NE) if,

$$\mathcal{U}_i(x_i^*, x_{-i}^*) \geq \mathcal{U}_i(x_i, x_{-i}^*), \quad \forall x_i \in \Omega_i, \ \forall i \in \mathcal{N}. \tag{2}$$

At a Nash equilibrium, no player can increase his payoff by unilateral deviation. If $\Omega$ is bounded, under Assumption 1, existence of a Nash equilibrium is guaranteed (cf. e.g. [27, Theorem 4.4]). When $\Omega$ is closed but not bounded, existence of a Nash equilibrium is guaranteed under the additional assumption that $\mathcal{U}_i(x_i, x_{-i})$ is coercive in $x_i$, that is, $\mathcal{U}_i(x_i, x_{-i}) \to -\infty$ as $\|x_i\| \to \infty$, for all $x_{-i} \in \Omega_{-i}$ (cf. [27, Corollary 4.2]). A useful characterization of a Nash equilibrium of a concave game is given in terms of the pseudo-gradient, defined as $U(x) = (U_i(x))_{i \in \mathcal{N}}$, where $U_i(x) = \nabla_{x_i} \mathcal{U}_i(x_i, x_{-i})$ is the partial-gradient. By [28, Proposition 1.4.2], $x^*$ is a Nash equilibrium if and only if,

$$\langle x_i - x_i^*,\, U_i(x^*) \rangle \leq 0, \quad \forall x_i \in \Omega_i, \ \forall i \in \mathcal{N}. \tag{3}$$

Equivalently, $x^*$ is a solution of the variational inequality $VI(\Omega, -U)$ [28], or, using the definition of the normal cone,

$$U(x^*) \in N_\Omega(x^*). \tag{4}$$
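As a small numerical illustration of (3)-(4) (a made-up two-player quadratic game on the real line, not one considered in the paper), an interior Nash equilibrium satisfies $U(x^*) = 0$ and can be found by solving a linear system:

```python
import numpy as np

# Hypothetical game: U_1(x1, x2) = -(x1 - 1)**2 - x1*x2,  U_2(x1, x2) = -(x2 + 1)**2 + x1*x2
def pseudo_gradient(x):
    """Stacked partial gradients U(x) = (dU_1/dx1, dU_2/dx2)."""
    x1, x2 = x
    return np.array([-2.0 * (x1 - 1.0) - x2,    # dU_1/dx1
                     -2.0 * (x2 + 1.0) + x1])   # dU_2/dx2

# U is affine here, U(x) = A x + c, so the interior NE solves A x = -c.
A = np.array([[-2.0, -1.0],
              [ 1.0, -2.0]])
c = np.array([2.0, -2.0])
x_star = np.linalg.solve(A, -c)

print(x_star)                   # [ 1.2 -0.4]
print(pseudo_gradient(x_star))  # ~[0. 0.], i.e. U(x*) = 0, consistent with (3)-(4)
```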

Standard assumptions on the pseudo-gradient are as follows.

Assumption 2.

$-U$ is

  • monotone,

  • strictly monotone,

  • $\eta$-strongly monotone, $\langle -U(x) + U(x'),\, x - x' \rangle \geq \eta \|x - x'\|^2$, $\forall x, x' \in \Omega$, for some $\eta > 0$,

  • $\mu$-hypo-monotone, $\langle -U(x) + U(x'),\, x - x' \rangle \geq -\mu \|x - x'\|^2$, $\forall x, x' \in \Omega$, for some $\mu > 0$.

We refer to $\mathcal{G}$ as a monotone game if it satisfies Assumption 2(i).
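For instance, a two-player zero-sum game with bilinear payoffs (a standard example, used here as our illustration of a monotone but not strictly monotone game) has a pseudo-gradient whose monotonicity gap is exactly zero,

$$\mathcal{U}_1(x) = x_1^\top A x_2 = -\mathcal{U}_2(x), \qquad U(x) = \begin{pmatrix} A x_2 \\ -A^\top x_1 \end{pmatrix}, \qquad \langle U(x) - U(x'),\, x - x' \rangle = 0 \quad \forall x, x',$$

so that $-U$ satisfies Assumption 2(i) but neither 2(ii) nor 2(iii).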

3 Problem Setup

We consider a set of players who repeatedly interact in a concave game $\mathcal{G}$. Assume that the game repeats with an infinitesimal time-step between each stage; hence we model it as a continuous-time process, as in [16], [10]. Each player $i$ maps his own partial-gradient $U_i(x)$ into an auxiliary variable $z_i$ via a dynamical system and selects the next action via a so-called mirror map $C_i$. The entire learning process for each player $i$ can be written as a continuous-time dynamical system,

$$\dot{z}_i = F_i(z_i, U_i(x)), \qquad x_i = C_i(z_i), \tag{5}$$

where $x = (x_i, x_{-i})$. We assume that the mirror map $C_i : \mathbb{R}^{n_i} \to \Omega_i$ is given by,

$$C_i(z_i) = \operatorname*{argmax}_{x_i \in \Omega_i} \{ \langle x_i, z_i \rangle - \epsilon\, \vartheta_i(x_i) \}, \qquad \epsilon > 0, \tag{6}$$

where $\vartheta_i : \mathbb{R}^{n_i} \to (-\infty, \infty]$ is assumed to be a closed, proper and (at least) essentially strictly convex function, and $\operatorname{dom}(\vartheta_i) = \Omega_i$ is assumed to be a non-empty, closed and convex set. The function $\vartheta_i$ is often referred to as a regularizer in optimization, learning and game contexts. Different forms of the mirror map can be derived depending on the regularizer. Finally, since the pseudo-gradient is not assumed to be bounded, $\vartheta_i$ should be chosen so that the dual space is unconstrained, i.e., $C_i(z_i)$ is defined for all $z_i \in \mathbb{R}^{n_i}$.

The most important family of algorithms that follows the model of the learning dynamics (5) is that of mirror descent (MD) dynamics,

$$\dot{z}_i = \gamma\, U_i(x), \qquad x_i = C_i(z_i), \tag{7}$$

where $\gamma > 0$ is a rate parameter. This can be interpreted as each player performing an aggregation of its own partial-gradient, $z_i(t) = z_i(0) + \gamma \int_0^t U_i(x(\tau))\, d\tau$, and mapping it to an action via the mirror map $x_i = C_i(z_i)$. The discrete-time analog of (7),

$$z_i^{k+1} = z_i^k + \gamma_k\, U_i(x^k), \qquad x_i^{k+1} = C_i(z_i^{k+1}), \tag{8}$$

with $\gamma_k$ the step-size, is the online mirror descent studied in [13] in a similar concave game setup. In finite games, this algorithm is referred to as Follow-the-Regularized-Leader (FTRL) [24].

Remark 1.

As an example, let $\Omega_i = \mathbb{R}^{n_i}$, $\vartheta_i(x_i) = \tfrac{1}{2}\|x_i\|_2^2$, so, cf. (6), $C_i(z_i) = \tfrac{1}{\epsilon}\, z_i$. The dual (MD) dynamics (7) is,

$$\dot{z}_i = \gamma\, U_i(x), \qquad x_i = \tfrac{1}{\epsilon}\, z_i, \tag{9}$$

which is in turn equivalent to the well-known primal dynamics,

$$\dot{x}_i = \tfrac{\gamma}{\epsilon}\, U_i(x), \tag{10}$$

or the pseudo-gradient dynamics (PSGD), known to converge to the NE when $-U$ is strictly/strongly monotone (e.g. Lemma 2, [22]).

In this paper we propose a related variant of the MD dynamics (7), called the discounted mirror descent (DMD) dynamics, given by,

$$\dot{z}_i = \gamma\, (-z_i + U_i(x)), \qquad x_i = C_i(z_i), \tag{11}$$

where $\gamma > 0$ and $x = (x_j)_{j \in \mathcal{N}}$, $x_j = C_j(z_j)$. Unlike the undiscounted MD (7), in (11) each player performs an exponentially discounted aggregation, $z_i(t) = e^{-\gamma t} z_i(0) + \gamma \int_0^t e^{-\gamma (t - \tau)}\, U_i(x(\tau))\, d\tau$. The DMD dynamics of all players can be written in stacked notation as,

$$\dot{z} = \gamma\, (-z + U(x)), \qquad x = C(z), \tag{12}$$

with $z = (z_i)_{i \in \mathcal{N}}$, $C(z) = (C_i(z_i))_{i \in \mathcal{N}}$.
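As an illustration of (11)-(12), the following is a minimal forward-Euler simulation sketch (our own toy setup: matching pennies on two simplices, entropic regularizers, and hypothetical values of $\gamma$, $\epsilon$ and the step size), in which the discounted dynamics settle at the perturbed equilibrium:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # numerically stable softmax
    return e / e.sum()

# Matching pennies: U_1 = x1^T A x2, U_2 = -x1^T A x2 (monotone, not strictly monotone)
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

gamma, eps, dt = 1.0, 0.1, 0.01      # rate, regularization weight, Euler step (illustrative)
z1, z2 = np.zeros(2), np.zeros(2)    # dual (aggregation) variables

for _ in range(20000):
    x1, x2 = softmax(z1 / eps), softmax(z2 / eps)   # mirror map (6) with entropic regularizer
    u1, u2 = A @ x2, -A.T @ x1                      # partial-gradients U_1(x), U_2(x)
    z1 += dt * gamma * (-z1 + u1)                   # DMD (11): discounted aggregation
    z2 += dt * gamma * (-z2 + u2)

print(softmax(z1 / eps), softmax(z2 / eps))         # ~[0.5 0.5] [0.5 0.5]: perturbed NE
```

Removing the $-z_i$ term above recovers the undiscounted dynamics (7), whose iterates orbit in such zero-sum games instead of settling.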

Our focus in this paper is to construct classes of DMD dynamics (11) for different types of regularizer $\vartheta_i$, cf. (6). We investigate the convergence of these classes of dynamics in monotone (not necessarily strictly monotone) games, based on the properties of the associated mirror map $C_i$, cf. (6). We then construct several examples of DMD dynamics from each class.

4 A General Framework for Designing Discounted Mirror Descent Dynamics

In this section, we consider two general classes of regularizers and study properties of the associated mirror maps (proofs are given in the Appendix). Based on these, we investigate the convergence of DMD (11), under different assumptions on the game’s pseudo-gradient.

4.1 Properties of Induced Mirror Maps

We consider convex regularizers that can be classified as either steep or non-steep according to the following definition.

Definition 1.

A closed, proper, convex regularizer $\vartheta_i$ is said to be steep (or relatively essentially smooth) if,

  • $\operatorname{dom}(\vartheta_i)$ is non-empty and convex,

  • $\vartheta_i$ is differentiable on $\operatorname{rint}(\operatorname{dom}(\vartheta_i))$,

  • $\|\nabla \vartheta_i(x_k)\| \to \infty$ whenever $\{x_k\}$ is a sequence in $\operatorname{rint}(\operatorname{dom}(\vartheta_i))$ converging to a point in $\operatorname{rbd}(\operatorname{dom}(\vartheta_i))$.

$\vartheta_i$ is non-steep if $\|\nabla \vartheta_i(x_k)\|$ remains bounded for any sequence $\{x_k\}$ in $\operatorname{rint}(\operatorname{dom}(\vartheta_i))$ converging to a point in $\operatorname{rbd}(\operatorname{dom}(\vartheta_i))$.
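Two standard instances (our illustration, assuming the regularizers used as examples above): the negative entropy on the simplex is steep, since its gradient blows up at the relative boundary, whereas the squared Euclidean norm restricted to a compact convex set is non-steep,

$$\vartheta_i(x_i) = \textstyle\sum_k x_{i,k} \log x_{i,k} \ \ \text{(steep: } \|\nabla \vartheta_i(x_i)\| = \|(\log x_{i,k} + 1)_k\| \to \infty \text{ as any } x_{i,k} \to 0\text{)}, \qquad \vartheta_i(x_i) = \tfrac{1}{2}\|x_i\|_2^2 \ \ \text{(non-steep: } \nabla \vartheta_i(x_i) = x_i \text{ is bounded on bounded } \Omega_i\text{)}.$$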

Remark 2.

A non-empty, convex domain ensures the non-emptiness of its relative interior [25, Theorem 6.2, p. 45].

Proposition 1.

Let $\vartheta_i$ be a closed, proper, convex regularizer. Then, the following hold: (i) If $\vartheta_i$ is steep, then $\operatorname{dom}(\partial \vartheta_i) = \operatorname{rint}(\operatorname{dom}(\vartheta_i))$ and $\partial \vartheta_i(x_i) = \{\nabla \vartheta_i(x_i)\}$ for all $x_i \in \operatorname{rint}(\operatorname{dom}(\vartheta_i))$.

(ii) If $\vartheta_i$ is non-steep, then $\operatorname{dom}(\partial \vartheta_i) = \operatorname{dom}(\vartheta_i)$ and $\partial \vartheta_i(x_i) = \nabla \vartheta_i(x_i) + N_{\operatorname{dom}(\vartheta_i)}(x_i)$ for all $x_i \in \operatorname{dom}(\vartheta_i)$.

Assumption 3.

The regularizer $\vartheta_i$ is closed, proper, convex, with $\operatorname{dom}(\vartheta_i) = \Omega_i$ non-empty, closed and convex. In addition,

  • $\vartheta_i$ is $\rho$-strongly convex, or

  • $\vartheta_i$ is Legendre and $\operatorname{dom}(\vartheta_i^*) = \mathbb{R}^{n_i}$.

Note that Assumption 3(ii) relaxes strong convexity to essential strict convexity and essential smoothness (steepness). In order to take $\epsilon$ into consideration in the regularization, cf. (6), we consider $\tilde{\vartheta}_i = \epsilon\, \vartheta_i$, which inherits all properties of $\vartheta_i$. We then refer to $C_i$, cf. (6), as the mirror map induced by $\tilde{\vartheta}_i$. Next, we derive properties of $C_i$ for the two classes of regularizers, cf. Assumption 3(i) and 3(ii).

Proposition 2.

Let $\tilde{\vartheta}_i = \epsilon\, \vartheta_i$, $\epsilon > 0$, where $\vartheta_i$ satisfies Assumption 3(i), and let $\tilde{\vartheta}_i^*$ be the convex conjugate of $\tilde{\vartheta}_i$. Then,

  1. $\tilde{\vartheta}_i^*$ is closed, proper, convex and finite-valued over $\mathbb{R}^{n_i}$, i.e., $\operatorname{dom}(\tilde{\vartheta}_i^*) = \mathbb{R}^{n_i}$.

  2. $\tilde{\vartheta}_i^*$ is continuously differentiable on $\mathbb{R}^{n_i}$ and $\nabla \tilde{\vartheta}_i^* = C_i$.

  3. $C_i$ is $\tfrac{1}{\epsilon \rho}$-Lipschitz on $\mathbb{R}^{n_i}$.

  4. $C_i$ is $\epsilon \rho$-cocoercive on $\mathbb{R}^{n_i}$, and in particular, is monotone.

  5. $C_i$ is surjective from $\mathbb{R}^{n_i}$ onto $\operatorname{rint}(\Omega_i)$ whenever $\vartheta_i$ is steep, and onto $\Omega_i$ whenever $\vartheta_i$ is non-steep.

  6. $C_i$ has $\partial \tilde{\vartheta}_i$ as a left-inverse over $\operatorname{rint}(\Omega_i)$ whenever $\vartheta_i$ is steep, and over $\Omega_i$ whenever $\vartheta_i$ is non-steep.

Remark 3.

If $\vartheta_i$ is differentiable over all of $\mathbb{R}^{n_i}$, following [33, Theorem 6.2.4(b), p. 264], Proposition 2 strengthens as follows: (i) $\tilde{\vartheta}_i^*$ is closed, proper, strictly convex and finite-valued over $\mathbb{R}^{n_i}$, (ii) $C_i = \nabla \tilde{\vartheta}_i^*$ is strictly monotone on $\mathbb{R}^{n_i}$, (iii) $C_i$ is bijective from $\mathbb{R}^{n_i}$ to $\Omega_i$, (iv) $C_i$ has $\nabla \tilde{\vartheta}_i$ as a full inverse over $\Omega_i$. For example, $\Omega_i = \mathbb{R}^{n_i}$, $\vartheta_i = \tfrac{1}{2}\|\cdot\|_2^2$ (PSGD) is such a case.

Proposition 3.

Let $\tilde{\vartheta}_i = \epsilon\, \vartheta_i$, $\epsilon > 0$, where $\vartheta_i$ satisfies Assumption 3(ii), and let $\tilde{\vartheta}_i^*$ be the convex conjugate of $\tilde{\vartheta}_i$. Then,

  • $\tilde{\vartheta}_i^*$ is closed, proper, Legendre and finite-valued over $\mathbb{R}^{n_i}$, i.e., $\operatorname{dom}(\tilde{\vartheta}_i^*) = \mathbb{R}^{n_i}$.

  • $C_i = \nabla \tilde{\vartheta}_i^*$ is a homeomorphism with inverse mapping $\nabla \tilde{\vartheta}_i$.

  • $C_i$ is strictly monotone on $\mathbb{R}^{n_i}$.

Proposition 3 follows from the Legendre theorem [25, Thm 26.5, p. 258].

Next, we provide a fixed-point characterization of the mirror map $C_i$, (6), which will be used to relate equilibria of (12) to Nash equilibria of the game $\mathcal{G}$.

Proposition 4.

Let $\tilde{\vartheta}_i = \epsilon\, \vartheta_i$, $\epsilon > 0$, where $\vartheta_i$ satisfies Assumption 3. Then, the mirror map $C_i$ induced by $\tilde{\vartheta}_i$, (6), can be written as the fixed point of the Bregman projection,

$$C_i(z_i) = \operatorname*{argmax}_{x_i \in \Omega_i} \big\{ \langle x_i,\, z_i - \nabla \tilde{\vartheta}_i(C_i(z_i)) \rangle - D_{\tilde{\vartheta}_i}(x_i, C_i(z_i)) \big\}, \tag{13}$$

where $D_{\tilde{\vartheta}_i}$ is the Bregman divergence of $\tilde{\vartheta}_i$.

We show next that any rest point of DMD (11) or (12) is the Nash equilibrium associated with a perturbed payoff. Any equilibrium point $(\bar{z}, \bar{x})$ of the closed-loop system (12) is characterized by,

$$\bar{z} = U(\bar{x}), \qquad \bar{x} = C(\bar{z}), \tag{14}$$

i.e., $\bar{z} = U(C(\bar{z}))$, $\bar{x} = C(\bar{z})$. From (6), by Berge's maximum theorem, $C$ is compact-valued and upper semicontinuous. Since $U$ is jointly continuous, $U \circ C$ is also compact-valued and upper semicontinuous, and by Kakutani's fixed-point theorem, $U \circ C$ admits a fixed point $\bar{z}$.

Proposition 5.

Let $\tilde{\vartheta}_i = \epsilon\, \vartheta_i$, $\epsilon > 0$, where $\vartheta_i$ satisfies Assumption 3, and let $C_i$ be the induced mirror map. Any rest point $\bar{x} = C(\bar{z})$ of DMD (11) is the Nash equilibrium of the game with perturbed payoff,

$$\tilde{\mathcal{U}}_i(x_i, x_{-i}) = \mathcal{U}_i(x_i, x_{-i}) - \epsilon\, \vartheta_i(x_i). \tag{15}$$

As $\epsilon \to 0$, $\bar{x} \to x^*$, where $x^*$ is a Nash equilibrium of $\mathcal{G}$.

Proof.

From the fixed-point characterization of the mirror map (13) (cf. Proposition 4), evaluated at $\bar{z}_i$, one can write $\bar{x}_i = C_i(\bar{z}_i)$ as a minimizer of $x_i \mapsto \tilde{\vartheta}_i(x_i) - \langle x_i, \bar{z}_i \rangle + \delta_{\Omega_i}(x_i)$ over $\mathbb{R}^{n_i}$,

where $\delta_{\Omega_i}$ is the indicator function over $\Omega_i$. By Fermat's condition for unconstrained optimization [30, Prop 27.1, p. 497], $\bar{x}_i$ is a minimizer if and only if,

$$0 \in \partial \big( \tilde{\vartheta}_i + \delta_{\Omega_i} \big)(\bar{x}_i) - \bar{z}_i, \tag{16}$$

or $\bar{z}_i \in \partial \tilde{\vartheta}_i(\bar{x}_i) + N_{\Omega_i}(\bar{x}_i)$, where the subdifferential sum rule of [30] was used. Thus $\bar{z}_i = \epsilon\, v_i + w_i$ for some $v_i \in \partial \vartheta_i(\bar{x}_i)$ and $w_i \in N_{\Omega_i}(\bar{x}_i)$. Substituting this and $\bar{z}_i = U_i(\bar{x})$, cf. (14), yields for any $x_i \in \Omega_i$, $\langle x_i - \bar{x}_i,\, U_i(\bar{x}) - \epsilon\, v_i \rangle = \langle x_i - \bar{x}_i,\, w_i \rangle \leq 0$.

In stacked form, with $v = (v_i)_{i \in \mathcal{N}}$, this is written as

$$\langle x - \bar{x},\, U(\bar{x}) - \epsilon\, v \rangle \leq 0, \quad \forall x \in \Omega, \qquad v \in \partial \vartheta(\bar{x}), \tag{17}$$

or $U(\bar{x}) - \epsilon\, v \in N_\Omega(\bar{x})$. By (4), $\bar{x}$ is a Nash equilibrium for the perturbed payoff (15). As $\epsilon \to 0$, (17) yields (4), hence $\bar{x} \to x^*$. ∎

Remark 4.

If $-U$ is monotone, then the perturbed operator $-U + \epsilon\, \partial \vartheta$ is strictly monotone, hence a unique perturbed NE exists for each $\epsilon > 0$.
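For instance, assuming the Euclidean regularizer $\vartheta_i(x_i) = \tfrac{1}{2}\|x_i\|_2^2$ as one concrete case, the perturbation acts as a Tikhonov-like regularization of the pseudo-gradient, and the negative perturbed operator becomes $\epsilon$-strongly monotone whenever $-U$ is merely monotone,

$$\tilde{U}(x) = U(x) - \epsilon\, x, \qquad \langle -\tilde{U}(x) + \tilde{U}(x'),\, x - x' \rangle = \langle -U(x) + U(x'),\, x - x' \rangle + \epsilon \|x - x'\|^2 \ \geq\ \epsilon \|x - x'\|^2.$$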

4.2 Convergence of DMD under Induced Mirror Maps

Using key properties given by Propositions 2 and 3, for regularizers satisfying either Assumption 3(i) or 3(ii), in Theorems 1 and 2 we show convergence of DMD under the corresponding induced mirror maps in the two cases, respectively.

Theorem 1.

Let $\mathcal{G}$ be a concave game with players' dynamics given by DMD (11). Assume there are a finite number of isolated fixed-points of $U \circ C$, where $C$ is the mirror map induced by $\tilde{\vartheta}_i = \epsilon\, \vartheta_i$, with $\vartheta_i$ satisfying Assumption 3(i). Then, under either Assumption 2(i), (ii), or (iii), with the additional assumption that $\mathcal{U}_i$ is coercive in $x_i$ whenever $\Omega$ is non-compact, for any $\epsilon > 0$, the auxiliary variables $z(t)$ converge to a rest point $\bar{z}$ while players' actions $x(t)$ converge to $\bar{x} = C(\bar{z})$, a perturbed Nash equilibrium of $\mathcal{G}$. Alternatively, under Assumption 2(iv), the same conclusions hold for any $\epsilon > \mu / \rho$.

Proof.

Let $\bar{z}$ be a rest point of (12), $\bar{x} = C(\bar{z})$. Take as Lyapunov function the sum of Bregman divergences of $\tilde{\vartheta}_i^*$, $i \in \mathcal{N}$,

$$V(z) = \sum_{i \in \mathcal{N}} D_{\tilde{\vartheta}_i^*}(z_i, \bar{z}_i) = \sum_{i \in \mathcal{N}} \big( \tilde{\vartheta}_i^*(z_i) - \tilde{\vartheta}_i^*(\bar{z}_i) - \langle \nabla \tilde{\vartheta}_i^*(\bar{z}_i),\, z_i - \bar{z}_i \rangle \big). \tag{18}$$

Since $\tilde{\vartheta}_i^*$ is convex (cf. Proposition 2(i)), it follows that $V$ is positive semidefinite. When $\Omega$ is compact, since $U$ is continuous, $\|U(x)\| \leq K$ on $\Omega$, for some $K > 0$. Then from (11), $\tfrac{d}{dt}\|z_i\|^2 = 2\gamma \langle z_i, -z_i + U_i(x) \rangle \leq 0$ whenever $\|z_i\| \geq K$, and hence $\{ z \mid \|z_i\| \leq \max\{\|z_i(0)\|, K\},\ \forall i \}$ is a nonempty, compact, positively invariant set. Alternatively, when $\Omega$ is non-compact, for any $\epsilon > 0$, $V$ is coercive (cf. [26, Prop. 1.3.9(i)]), and the positively invariant set can be taken as any of its sublevel sets. Along any solution of (11), $\dot{V} = \sum_{i \in \mathcal{N}} \langle \nabla \tilde{\vartheta}_i^*(z_i) - \nabla \tilde{\vartheta}_i^*(\bar{z}_i),\, \dot{z}_i \rangle$. Using Proposition 2(ii),

$$\dot{V} = \gamma \sum_{i \in \mathcal{N}} \langle C_i(z_i) - C_i(\bar{z}_i),\, -z_i + U_i(x) \rangle = \gamma \big( \langle x - \bar{x},\, U(x) - U(\bar{x}) \rangle - \langle C(z) - C(\bar{z}),\, z - \bar{z} \rangle \big), \tag{19}$$

where $x = C(z)$, $\bar{x} = C(\bar{z})$ and $\bar{z} = U(\bar{x})$, cf. (14), were used. Since $\langle x - \bar{x},\, U(x) - U(\bar{x}) \rangle \leq 0$ under Assumption 2(i), 2(ii), or 2(iii), the first term of (19) is non-positive; therefore, $\dot{V} \leq -\gamma \langle C(z) - C(\bar{z}),\, z - \bar{z} \rangle \leq -\gamma \epsilon \rho\, \|C(z) - C(\bar{z})\|^2$, where we used the fact that $C$ is $\epsilon\rho$-cocoercive (cf. Proposition 2(iv)). This implies that $\dot{V} \leq 0$, and $\dot{V} = 0$ only if $C(z) = C(\bar{z})$. We find the largest invariant set contained in $E = \{ z \mid \dot{V}(z) = 0 \}$. On $E$, $x = C(z) = C(\bar{z}) = \bar{x}$, hence, since $\bar{z} = U(\bar{x})$, $\dot{z} = \gamma(\bar{z} - z)$, so that $z(t) \to \bar{z}$ as $t \to \infty$, for any solution remaining in $E$. Thus, no other solution except $\bar{z}$ can stay forever in $E$, and the largest invariant set in $E$ consists only of equilibria. Since by assumption there are a finite number of isolated equilibria, by LaSalle's invariance principle [11], it follows that for any $z(0)$, $z(t)$ converges to one of them, $\bar{z}$. Finally, since $C$ is $\tfrac{1}{\epsilon\rho}$-Lipschitz (cf. Proposition 2(iii)), $\|x - \bar{x}\| \leq \tfrac{1}{\epsilon\rho} \|z - \bar{z}\|$, hence $x(t) \to \bar{x}$ as $t \to \infty$, where, by Proposition 5, $\bar{x}$ is a perturbed Nash equilibrium.

Alternatively, under Assumption 2(iv), following from (19),

$$\dot{V} \leq \gamma\, (\mu - \epsilon \rho)\, \|C(z) - C(\bar{z})\|^2,$$

where we again used the $\epsilon\rho$-cocoercivity of $C$. Assuming that $\epsilon > \mu / \rho$, then $\dot{V} \leq 0$, and convergence follows as before. ∎

Theorem 2.

Let $\mathcal{G}$ be a concave game with players' dynamics given by DMD (11). Assume there are a finite number of isolated fixed-points of $U \circ C$, where