Probably Approximately Correct Nash Equilibrium Learning

03/25/2019 ∙ by Filiberto Fele, et al. ∙ University of Oxford

We consider a multi-agent noncooperative game with agents' objective functions being affected by uncertainty. Following a data driven paradigm, we represent uncertainty by means of scenarios and seek a robust Nash equilibrium solution. We first show how to overcome differentiability issues, arising due to the introduction of scenarios, and compute a Nash equilibrium solution in a decentralized manner. We then treat the Nash equilibrium computation problem within the realm of probably approximately correct (PAC) learning. Building upon recent developments in scenario-based optimization, we accompany the computed Nash equilibrium with a priori and a posteriori probabilistic robustness certificates, providing confidence that the computed equilibrium remains unaffected (in probabilistic terms) when a new uncertainty realization is encountered. For a wide class of games, we also show that the computation of the so called compression set - which is at the core of the scenario approach theory - can be directly obtained as a byproduct of the proposed solution methodology. We demonstrate the efficacy of the proposed approach in an electric vehicle charging control problem.

I Introduction

Game theory has attracted significant attention in the control systems community [1], as it has found numerous applications in smart grid, transportation and IT infrastructures [2, 3, 4, 5, 6]. Nash equilibrium (NE) computation is a central component in this context, allowing one to determine no-regret strategies for noncooperative, selfish agents [7], [8], [9]. As a result, NE has been a popular solution concept for multi-agent distributed and decentralized control architectures, as it naturally lends itself to price-based implementations [10]. Following the seminal work of Wardrop [11], several results have studied such problems, investigating also the connections with social welfare optima [12, 13, 4, 14, 15].

The aforementioned results in the literature consider a deterministic setting where the system conditions (e.g., prices in an incentive scheme) are assumed to be known in advance. Such an assumption is not reflected in realistic applications of contemporary interest [16, 17, 18], where uncertainty might be present. Indeed, since complete knowledge of the game was questioned in [19], uncertainty has been widely addressed in noncooperative games, by adopting stochastic or worst-case approaches. In the first case, both chance-constrained (risk-averse) [20, 21] and expected payoff criteria [22, 23, 24, 25, 26] have been considered. However, the aforementioned methods impose certain assumptions on the underlying probability distribution of the uncertainty. In the second case, results build upon robust control theory [9, 27]; however, to ensure tractability certain assumptions on the geometry of the uncertainty set are imposed (see [28, 16]).

In this paper we consider a multi-agent NE seeking problem with uncertainty affecting agents’ objective functions. We depart from existing paradigms and follow a data driven methodology, where we represent uncertainty by a finite set of scenarios that could either be extracted from historical data, or generated by means of some prediction model (e.g., regression, Markov chains, neural networks) [29]. Adopting a data driven methodology poses the following main challenges: Our problem becomes a game with the maximum taken with respect to a finite number of scenarios, thus giving rise to a non-differentiable objective function. As a consequence, NE are inherently random as they depend on the extracted scenarios. Therefore, our objective is to investigate the sensitivity of the resulting NE in a probabilistic sense, or in other words, accompany the NE with a certificate on the probability that it will remain unaltered if a new uncertainty realization (other than those included in the scenarios/data) is encountered. A similar attempt has recently appeared in the literature [30], where the scenario approach is applied to variational inequalities, widely employed to characterize NE solutions. Our approach overcomes the challenges outlined above and extends the results in [30] by making the following contributions:

1) We treat the NE computation problem in a probably approximately correct (PAC) learning framework [31, 32, 33], and employ the so called scenario approach [34], which up to now has been used solely in an optimization context. Using the recent results in [35] we first provide an a posteriori certificate on the probability that a NE remains unaltered upon a new realization of the uncertainty. We then rely on [36] and provide an a priori probabilistic certificate on the equilibrium sensitivity. It should be noted that the obtained results are distribution-free, and as such the underlying probability distribution of the uncertainty could be unknown and the only requirement is the availability of samples.

Results in [30] provide guarantees on the probability that agents’ objective functions do not deteriorate upon a new uncertainty realization. Here we pose a different learning problem, i.e., probabilistic NE sensitivity analysis (see Section II-C), which allows us to obtain the results in [30] as a byproduct of our analysis. Moreover, the analysis in [30] is based on a non-degeneracy assumption (see Section II for a definition) which, unlike convex optimization programs, is often not satisfied in games; here we relax this assumption and allow for (possibly) degenerate problem instances.

2) We provide an iterative algorithm for decentralized NE computation. To achieve this we circumvent the nondifferentiability of the resulting game that prevents the application of conventional decentralized techniques, by bridging the results in [37] and [5] that involve resorting to an augmented game and incorporating an equilibrium selection mechanism. The proposed scheme enjoys the same convergence properties as state-of-the-art decentralized algorithms for monotone games [5], without however imposing strong monotonicity assumptions on the operator of the underlying variational inequality as in [30]. The latter would effectively imply uniqueness of the associated NE, while in our work we allow for multiple NE (see Section III). Moreover, despite the conceptual similarities with [38], our results allow multiple maximisers in the resulting game.

It should be noted that by means of an epigraph reformulation such a problem can be cast into the class of generalized NE games. However, decentralized solution algorithms for this class of games impose additional assumptions, e.g., affine coupling constraints [6, 4, 12], which do not necessarily fit the format of the resulting epigraphic constraint.

3) Under the additional assumption that the game under consideration admits a unique NE, or for aggregative games with multiple equilibria but a unique aggregate solution, we show that a compression set (an essential concept in the scenario approach theory; see Section II for a definition) can be directly computed by inspection of the solution returned by the proposed algorithm. This feature has significant computational advantages, as it avoids the use of the greedy mechanism presented in [35], which would require running a NE computation iterative algorithm up to numerical convergence multiple times, possibly as many as the number of samples (see Section V).

The rest of the paper unfolds as follows. In Section II we introduce the scenario-based noncooperative game, pose the main problem, and present the main results of the paper. Section III provides a decentralized construction of the NE of the game under study, while Section IV contains the proof of the main results. In Section V we provide for a wide class of games a computationally efficient methodology to determine an upper bound to the cardinality of the compression set. Section VI provides a numerical example on an electric-vehicle charging problem, while Section VII concludes the paper and provides some directions for future work.

II Scenario based multi-agent game

II-A Gaming set-up

Let the set designate a finite population of agents. The decision vector, henceforth referred to as strategy, of each agent is denoted by and should satisfy individual constraints encoded by the set . We denote by the collection of all agents’ strategies, where .

Let be an uncertain vector taking values over some set , endowed with a σ-algebra , where denotes the probability measure defined over . For all subsequent derivations fix any , and let be a finite collection of independently and identically distributed (i.i.d.) scenarios/realizations of the uncertain vector , that we will henceforth designate as an -multisample. In view of our data driven robust considerations, we assume that, for given strategies of the remaining agents, each agent aims at minimizing with respect to the function

(1)

where expresses a deterministic component, different for each agent but still dependent on the strategies of all agents, while encodes the component of the objective function that depends on the uncertain vector. Agents are considered to be selfish, thus interested in minimizing their local objective ; at the same time, they seek to minimize the worst-case (maximum) value can take among a finite set of scenarios. As the latter is common to all agents it entails a certain level of cooperation between them. The electric vehicle charging control problem of Section VI provides a natural interpretation of such a set-up, where electric vehicles are selfish entities each one with a possibly different utility function ; however, they could be participating in the same aggregation plan or belonging to a centrally managed fleet, thus giving rise to a common . Here, the fact that is influenced by uncertainty accounts for price volatility.
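The cost structure described above (an agent-specific deterministic term plus the worst case of a shared uncertain term over the extracted scenarios) can be sketched as follows. This is only an illustration under assumed names: `f_i`, `g`, `scenarios` and the toy functions below are hypothetical and are not taken from the paper.

```python
import numpy as np

def scenario_cost(f_i, g, x, scenarios):
    """Agent cost: deterministic term plus worst case of the shared term.

    f_i, g and scenarios are hypothetical placeholders for the deterministic
    component, the uncertain component and the multisample, respectively.
    """
    worst_case = max(g(x, theta) for theta in scenarios)
    return f_i(x) + worst_case

# Toy two-agent instance (illustrative values only).
x = np.array([1.0, 2.0])              # stacked strategies of both agents
f_1 = lambda x: x[0] ** 2             # deterministic cost of agent 1
g = lambda x, theta: theta * x.sum()  # shared uncertain component
scenarios = [0.5, 1.0, -1.0]          # extracted scenarios

cost = scenario_cost(f_1, g, x, scenarios)
print(cost)
```

The `max` over a finite set of smooth functions is what makes the resulting cost non-differentiable in general, even when `f_i` and `g` are smooth.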

We consider a noncooperative game among the agents, described by the tuple , where is the set of agents/players, , are respectively the strategy set and the cost function for each agent , and is a finite collection of samples. We consider the following solution concept for :

Definition 1 (Nash equilibrium).

Let denote the set of Nash equilibria of , defined as

(2)
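For reference, the standard Nash equilibrium condition can be written as below. The symbols are assumed here for illustration ($x_i$ for the strategy of agent $i$, $x_{-i}$ for the strategies of the other agents, $X_i$ for the local constraint set, $J_i$ for the cost in (1), $\mathcal{N}$ for the set of agents, $\Omega$ for the set of equilibria) and may differ from the notation of the paper.

```latex
\Omega = \Big\{ \bar{x} \in \textstyle\prod_{i \in \mathcal{N}} X_i \;:\;
  \bar{x}_i \in \arg\min_{x_i \in X_i} J_i(x_i, \bar{x}_{-i}),
  \ \forall\, i \in \mathcal{N} \Big\}
```

In words, no agent can lower its own cost by unilaterally deviating from its equilibrium strategy.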

We impose the following standing assumptions:

Assumption 2.

For every fixed , and every , is convex and continuously differentiable, while the local constraint set is nonempty, compact and convex for all . Moreover, the pseudo-gradient is monotone with constant , while is monotone with constant for any fixed , i.e., for any , and any ,

(3)

and .

Assumption 3.

Let be an open convex set such that . For every , the function is twice differentiable on and, for each , is twice differentiable on .

Remark 4.

It should be noted that we only need that is convex for any fixed , without requiring that both and are simultaneously convex. The monotonicity requirements of Assumption 2 are needed in the proof of Lemma 15, where it is shown that an operator underlying a given variational inequality is monotone, and they are standard in the game theoretic literature [37]. Notice also that it is not required that both are non-negative, but only . These conditions are satisfied in the electric vehicle example of Section VI. A sufficient (but stronger) condition for the monotonicity requirements to be satisfied is for to be jointly convex with respect to all , .

II-B Problem statement

As every NE is a random vector due to its dependency on the -multisample, a question that naturally arises is how sensitive a NE is against a new realization of the uncertainty not included in the samples. To provide a rigorous answer to this question we will study the generalization properties of within a probably approximately correct (PAC) learning framework. To this end, let be a NE of the game with samples, and consider a new extraction . Let be a game defined over the scenarios , and denote by the set of the associated NE. Then, for all , let

(4)

denote the probability that a NE of does not remain a NE of , i.e., of the game characterized by the extraction of an additional sample. Note that is in turn a random variable, as its argument depends on the multisample . Within the realm of a PAC learning framework, with a given confidence/probability with respect to the product measure (as the samples are extracted in an i.i.d. fashion), we aim at quantifying .

To achieve such a characterization we provide some basic definitions. Let be a single-valued mapping from the set of -multisamples to the set of equilibria of .

Remark 5.

It should be noted that the definition of the game , the set of NE , the mapping (as well as of other associated quantities introduced in the sequel) depends on via the -multisample employed. Therefore, they should be parameterized by , thus giving rise to a family of games, NE sets and mappings. To simplify notation we do not show this dependency explicitly. Moreover, the dimension of the domain of varies according to and it is to be understood that it is in accordance with the number of samples takes as arguments.

Definition 6 (Support sample [36]).

Fix any i.i.d. -multisample , and let be a NE of . Let be the solution obtained by discarding the sample . We call the latter a support sample if .

Definition 7 (Compression set — adapted from [35]).

Fix any i.i.d. -multisample , and let be a NE of . Consider any subset and let . We call a compression set if .

The notion of compression set has appeared in the literature under different names; its properties are studied in full detail in [35], where it is designated as support subsample. Here we adopt the term compression set as in [31, 33] to avoid confusion with Definition 6.

Let be the collection of all compression sets associated with the -multisample . We refer to the compression cardinality as the cardinality of some compression set . Note that — hence also — is itself a random variable as it depends on the -multisample.
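A compression set in the sense of Definition 7 can be searched for greedily by discarding one sample at a time and keeping the removal whenever the solution is unchanged. The sketch below is an illustration only, not the procedure of [35] verbatim: `solve` is a hypothetical stand-in for the single-valued mapping from samples to a solution, and the toy instance uses `max` as that mapping.

```python
import numpy as np

def greedy_compression(samples, solve, tol=1e-9):
    """Return indices of a subset of samples that reproduces solve(samples).

    solve is a hypothetical single-valued map from a list of samples to a
    solution. The last remaining sample is never discarded in this sketch.
    """
    reference = np.asarray(solve(samples))
    kept = list(range(len(samples)))
    for idx in range(len(samples)):
        trial = [k for k in kept if k != idx]
        if not trial:
            continue
        candidate = np.asarray(solve([samples[k] for k in trial]))
        if np.allclose(candidate, reference, atol=tol):
            kept = trial  # sample idx is not needed to reproduce the solution
    return kept

# Toy instance: the "solution" is the largest sample.
samples = [0.3, 1.7, 0.9, 1.2]
compression = greedy_compression(samples, max)
print(compression)
```

Each pass re-solves the problem on a reduced multisample, so the number of solves grows with the number of samples; this is the cost that a direct characterization of the compression set avoids.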

Definition 8 (Non-degeneracy — adapted from [39]).

For any , with -probability , the NE coincides with the NE returned by the associated mapping when the latter takes as argument only the support samples. The corresponding game is then said to be non-degenerate; in the opposite case it is called degenerate.

Definition 8 highlights the fact that the notion of support samples and of the compression set are not necessarily the same. It follows then directly that support samples, as identified by Definition 6, form a strict subset of any compression set in ; for a deeper discussion on degeneracy in scenario-based contexts, we refer the reader to [40, 39].

II-C Main results

We first show that a single-valued mapping from the set of -multisamples to the set of NE of the game indeed exists, and we can construct it in a decentralized way without imposing (standard) strong requirements on the monotonicity of the game.

Proposition 9.

Under Assumptions 2 and 3, there exists a single-valued decentralized mapping .

This construction, and hence the proof of Proposition 9, is provided in Section III.

II-C1 A posteriori certificate

We provide an a posteriori quantification of an upper bound for . This is summarized in the following theorem.

Theorem 10.

Consider Assumptions 2 and 3. Fix and let be a function satisfying

(5)

Let , where is a random i.i.d. sample from . We then have that

(6)

where is the cardinality of any given compression set of .

Theorem 10 shows that with confidence at least the probability that the NE , computed on the basis of the randomly extracted samples , does not remain an equilibrium of the game when an additional sample is considered, is at most . Note that (6) captures the generalization properties of , where accounts for the “probably” and for the “approximately correct” term used within a PAC learning framework.

The structure of is determined in accordance to [35] and depends on the observed compression cardinality , which in turn depends on the random multiextraction thus giving rise to the a posteriori nature of the result. As a result, the level of conservatism of the obtained certificate depends on ; the smaller the cardinality of the computed compression set, the tighter the bound (see Section V for a detailed elaboration on the computation of ). The proof of Theorem 10 is provided in Section IV-A.
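To give a feel for how the certificate scales with the compression cardinality, the sketch below evaluates one explicit choice of the function. It assumes that condition (5) has the form used in [35], namely ε(M) = 1 and Σ_{k=0}^{M-1} C(M,k) (1 − ε(k))^{M−k} = β, where M is the number of samples and β the confidence parameter; both this form and the symbol names are assumptions made here. The choice below splits β evenly across the M terms of the sum.

```python
from math import comb

def epsilon(k, M, beta):
    """One explicit choice of epsilon(k), under the assumed form of (5).

    k: compression cardinality, M: number of samples, beta: confidence
    parameter (all names are assumptions of this sketch).
    """
    if k >= M:
        return 1.0
    return 1.0 - (beta / (M * comb(M, k))) ** (1.0 / (M - k))

M, beta = 50, 1e-3
# Check the assumed condition: the sum should reproduce beta.
total = sum(comb(M, k) * (1.0 - epsilon(k, M, beta)) ** (M - k) for k in range(M))
print(total)
print([round(epsilon(k, M, beta), 3) for k in (1, 5, 10)])
```

Smaller values of `k` give smaller values of `epsilon`, which is consistent with the remark that a smaller compression set yields a tighter bound.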

In the case of a non-degenerate game (see Definition 8), the bound could be significantly improved by means of the wait-and-judge analysis of [39]: specifically, we can replace the expression for in (5) with the tighter bound of Theorem 1 in [39]. However, it should be noted that non-degeneracy is a condition in general difficult to verify even in convex optimization settings; moreover, here we only assume that for any , is weakly convex, making the non-degeneracy assumption quite restrictive.

II-C2 A priori certificate

We now provide an a priori quantification of an upper-bound of . This is summarized in the following theorem.

Theorem 11.

Consider Assumptions 2 and 3. Fix and let be a function satisfying (5). Let , where is an i.i.d. multisample. We then have that

(7)

The proof of Theorem 11 is provided in Section IV-B. Although similar in form to Theorem 10, the bound on provided by Theorem 11 additionally relies on the developments in [36, 40]: these results are independent of the given multisample and linked instead to the problem structure. In this way, is evaluated on the deterministic quantity , expressing the dimension of the agents’ decision space — plus additional variables, due to the epigraphic reformulation introduced in the proof of Theorem 11 — which is known a priori. If we further assume that for all , for every fixed and , both and are convex, we would only need one epigraphic variable, hence the argument of could be replaced by (see proof of Theorem 11).

As for Theorem 10, if we strengthen the assumptions of Theorem 11 by imposing a non-degeneracy condition (see Definition 8), (5) could be directly replaced by the tighter expression in [39]. We wish to emphasize that it may still be preferable to calculate the cardinality in an a posteriori fashion, as in certain problems the latter might be significantly lower compared to , hence the computed certificate would be less conservative. This is also the case in the electric vehicle charging control problem of Section VI.

The following corollary follows as a direct byproduct of the proofs of our main results (see Section IV).

Corollary 12.

Let and consider . The following hold:

  1. Under the assumptions of Theorem 10, (6) holds with in place of .

  2. Under the assumptions of Theorem 11, (7) holds with in place of .

Corollary 12 shows that, with given confidence, the probability that — and hence also each agent’s objective function — deteriorates with respect to when a new realization of the uncertainty is encountered can be bounded both in an a posteriori and an a priori fashion as in Theorem 10 and Theorem 11, respectively. These statements are established within the proofs of Theorems 10 and 11 (see (22)).

III Decentralized NE computation

In this section we show how to construct , a necessary ingredient in the proof of Proposition 9. In particular, we show that the image of corresponds to the limit of a decentralized algorithm that returns a NE of the game . To achieve this, we characterise the NE of as solutions to a variational inequality (VI). We then exploit existing results in this context that allow us to obtain sufficient conditions for the existence of equilibria, and set the foundations for the design of a decentralized NE computation mechanism [5].

III-A VI analysis

At the core of the use of the VI framework to analyze noncooperative games is the correspondence between the so called VI problem, which for a given domain takes the form

(8a)
(8b)

and the first-order optimality conditions corresponding to a NE (see Definition 1) [41, §1.4.2]. This model naturally hinges on the differentiability of the problem at hand; however, it can be directly observed that, due to the operator in (1), agents’ objective functions defining are in general non-differentiable.

With this in mind, we define the augmented game between agents [37]. In each player , given and , computes

(9)

where follows from the equivalence

(10)

holding for any as is the simplex in  [42, Lemma 6.2.1]. The additional agent (could be thought of as a coordinating authority), given , will act instead as a maximizing player for the uncertain component of , , i.e.,

(11)
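The equivalence behind (10), that the maximum of finitely many values equals the maximum of their convex combinations over the simplex, can be checked numerically. The sketch below is illustrative only; `values` plays the role of the uncertain component evaluated at each scenario for fixed strategies, and all names are hypothetical.

```python
import numpy as np

def simplex_max(values):
    """Maximize y @ values over the unit simplex.

    The maximum is attained at a vertex: all weight on a largest entry.
    """
    y = np.zeros(len(values))
    y[np.argmax(values)] = 1.0
    return y, float(y @ values)

values = np.array([0.2, 1.5, -0.3])   # scenario-wise values (toy data)
y_star, best = simplex_max(values)

# Random points of the simplex never exceed the vertex value.
rng = np.random.default_rng(0)
points = rng.dirichlet(np.ones(len(values)), size=1000)
print(best, float((points @ values).max()))
```

The linear objective in `y` is what restores differentiability: the additional maximizing agent chooses the weights, and every agent then faces a smooth cost.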

Note that, for any -multisample , Assumption 2 entails that involves differentiable objective functions for all agents. Therefore, we can characterize the NE of as solutions of an appropriate VI. To this end, we define the mapping as the pseudo-gradient [41, §1.4.1]

(12)

Letting and we observe that (8b) represents the concatenation of the first-order optimality conditions for the individual problems described by (9) and (11). In the following, we refer to the problem described by (8) with an operator as in (12) as VI.

It turns out that there is a link between the equilibria of and those of — described by the VI — and that is always nonempty, as established by the following proposition.

Proposition 13.

Under Assumption 2, there always exists a solution of VI. Denote such a solution by . We then have that is a NE of .

Proof.

The existence of a solution for the VI is guaranteed by [41, Cor. 2.2.5] under Assumption 2 and the compactness of . Denote such a solution by . A link between the solutions of the VI and those of the augmented game is established by [41, Prop. 1.4.2]: is a solution of if and only if it solves VI. The link with the original game is provided by [37, Thm. 1]: for any NE of the game , is a NE of , which concludes the proof. ∎

III-B Monotonicity of the augmented VI operator

The development of algorithms for the solution of VI problems relies upon the monotonicity of the mapping in (8), which plays a role analogous to convexity in optimization [5].

Definition 14 (Monotonicity).

A mapping , with closed and convex, is

  • monotone on if for all ,

  • strongly monotone on if there exists such that for all .
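For reference, the two standard conditions can be written as follows, with symbols assumed here for illustration ($F$ for the mapping, $\mathcal{K}$ for its closed convex domain, $c > 0$ for the strong monotonicity constant).

```latex
\begin{align*}
\text{monotone:} \quad
  & (u - v)^\top \big( F(u) - F(v) \big) \ge 0,
  && \forall\, u, v \in \mathcal{K}, \\
\text{strongly monotone:} \quad
  & (u - v)^\top \big( F(u) - F(v) \big) \ge c \, \| u - v \|^2,
  && \forall\, u, v \in \mathcal{K}.
\end{align*}
```

For a differentiable mapping, the first condition holds when the symmetric part of the Jacobian is positive semidefinite on the domain, which is the route taken in the proof of Lemma 15.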

The following result is instrumental in our analysis:

Lemma 15.

Consider Assumptions 2 and 3. We then have that in (12) is monotone on .

Proof.

By Assumption 3, is continuously differentiable on its domain. Let and denote the first and the last rows of , respectively, i.e., , and . By definition of the Jacobian we have

(13)

where , , and . Notice that is a matrix with being its -th entry. Due to the differentiability conditions of Assumption 3, the monotonicity requirements of Assumption 2 are equivalent (see p. 90 in [37]) to the existence of constants such that, for all , and for all ,

(14)
(15)

with for the sum of and to be convex, as required by Assumption 2. Summing the above inequalities yields

(16)

which, since , corresponds to , and due to the fact that in (13), implies that for all . The statement then follows directly from [41, Prop. 2.3.2], thus concluding the proof. ∎

A direct consequence of the monotonicity of is that by [5, Thm. 41], VI may admit multiple solutions: this fact together with [41, Prop. 1.4.2] — stating the correspondence between the solutions of the VI and the NE of — implies that the game may admit multiple NE.

III-C Decentralized algorithm for monotone VI and equilibrium selection

There are two main challenges in constructing a single-valued, decentralized mapping that returns a NE for , as required by Proposition 9: first, due to the possible presence of multiple equilibria, a tie-break rule needs to be put in place to single out a particular NE among the possibly many ones, thus ensuring that is single-valued. Second, standard decentralized algorithms for VI are not guaranteed to converge on monotone problems; a tighter condition, namely, strong monotonicity is required on the VI mapping .

To address the first challenge, it should be noted that even if only one NE is returned by some algorithm (not necessarily decentralized), the mapping is not necessarily single-valued, as using different initial conditions in the underlying algorithm a different NE may be returned. To alleviate this and construct a single-valued mapping, we employ the results of [5]. In particular, [5, Algorithm 4] allows us to select the minimum Euclidean norm NE; the choice of the Euclidean norm is not restrictive, and a wide range of strictly convex objective functions could have been used as a selector instead (see [5, Thm. 21]). Therefore, we formulate the following optimization program.

(17a)
(17b)

where is monotone and denotes the Euclidean norm.

To address the second challenge and characterize by means of a decentralized methodology, we can employ the results in [43, 5]. In particular, it is shown that proximal algorithms can be used to retrieve a solution of a monotone VI by solving a particular sequence of strongly monotone problems, derived by regularizing the original problem. Therefore, we consider the regularized game , where and are the designated step size and centre of regularization, respectively. Then, given the tuple , each player solves the following problem

(18)

while the additional agent (player ), given , solves

(19)

with . Note that Assumption 2 still holds for (18)–(19). By taking the pseudo-gradient of the above as in (12), we have from [41, Prop. 1.4.2] that is a NE of if and only if it satisfies the VI corresponding to the regularized game, i.e.,

(20)

The next lemma shows that the regularized game admits a unique NE.

Lemma 16.

Consider Assumptions 2 and 3. Let be as in (12), and fix . We then have that, for any and , the regularized game defined by (18)–(19) admits a unique NE.

Proof.

To establish uniqueness of the NE it suffices to show that is strongly monotone [5, Thm. 41]. Fix any . Let , and define similarly. We have

(21)

where the inequality follows from the fact that , and (see Definition 14) since is monotone due to Lemma 15. By Definition 14, (21) implies that is strongly monotone, thus concluding the proof. ∎
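The effect of the regularization can be illustrated on an affine operator. The sketch assumes the usual proximal form, in which a term τ(u − ū) is added to the operator so that τ times the identity is added to its Jacobian; this form and the names `A` and `tau` are assumptions of the example. For an affine map u ↦ Au, the monotonicity constant is the smallest eigenvalue of the symmetric part of A.

```python
import numpy as np

def monotonicity_constant(A):
    """Smallest eigenvalue of the symmetric part of A.

    For the affine map u -> A @ u this is the largest c such that
    (u - v) @ (A @ u - A @ v) >= c * ||u - v||^2 for all u, v.
    """
    return float(np.linalg.eigvalsh(0.5 * (A + A.T)).min())

# Skew-symmetric operator: monotone, but not strongly monotone.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
tau = 0.5

c_original = monotonicity_constant(A)
c_regularized = monotonicity_constant(A + tau * np.eye(2))
print(c_original, c_regularized)
```

The regularized operator is strongly monotone with constant `tau`, which is what yields uniqueness of the equilibrium of the regularized game.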

Now let denote the solution of the VI. Building on Lemma 16, we aim at determining a NE of by updating the centre of regularization of on the basis of an iterative method in the form , until convergence to the fixed point . The latter corresponds to the (unique under Lemma 16) NE of , which satisfies (17). Algorithm 1 provides the means to establish such a connection; this is formalised in the following proposition.

1:, , ,
2:
3:repeat (outer loop)
4:     
5:     repeat (inner loop)
6:          for  do
7:                                    
8:          end for
9:                                   
10:          
11:     until 
12:     
13:     
14:until 
Algorithm 1 Decentralized NE seeking algorithm

Proposition 17 (Thm. 21 [5]).

Consider Assumptions 2 and 3. Consider also the game defined by (9)–(11) and the regularized augmented game defined by (18)–(19). Let be any sequence satisfying for all , , and . Consider a small enough and let denote the sequence generated by Algorithm 1. For any , there exists such that is bounded, and solution of (17) such that for . Moreover, .

Convergence of Algorithm 1 is guaranteed by [5, Thm. 21]. In particular, it asymptotically converges to a solution of (17), while by Proposition 13, (17b) is equivalent to the game , whose solution set is nonempty and is also contained in due to the second part of Proposition 13. Note that in [5, Thm. 21] an explicit expression for is provided so that steps 3–11 of Algorithm 1 constitute a block contraction [44].
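The two-loop structure (an inner loop solving a strongly monotone regularized problem, an outer loop updating the centre of regularization) can be sketched on a toy monotone VI. This is a plain proximal-point iteration and not Algorithm 1 itself: it is centralized, it omits the equilibrium selection step, and the operator, step sizes and iteration counts are hypothetical choices for a bilinear zero-sum game on a box.

```python
import numpy as np

def proximal_point(F, project, u0, tau=1.0, step=0.5, outer=60, inner=100):
    """Proximal-point sketch for a monotone VI (all parameters are assumed).

    Inner loop: projected iterations on F(u) + tau * (u - center), which is
    strongly monotone. Outer loop: move the centre to the inner solution.
    """
    center = np.asarray(u0, dtype=float)
    for _ in range(outer):
        u = center.copy()
        for _ in range(inner):
            u = project(u - step * (F(u) + tau * (u - center)))
        center = u
    return center

# Toy game: min over x, max over y of x * y, both in [-1, 1].
# Pseudo-gradient F(x, y) = (y, -x) is monotone but not strongly monotone.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
F = lambda u: A @ u
project = lambda u: np.clip(u, -1.0, 1.0)

u_star = proximal_point(F, project, u0=[1.0, 1.0])
print(np.linalg.norm(u_star))
```

In this toy instance the only equilibrium is the origin, so no selection among multiple equilibria is needed; the returned point is numerically close to it.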

Proof of Proposition 9: Algorithm 1, and its analysis in Proposition 17, serves as an implicit construction of a decentralized, single-valued mapping , thus establishing Proposition 9.

Note that the mapping induced by Algorithm 1 formally depends on the initial condition; we do not make this dependency explicit since the same minimum norm NE is returned irrespective of the initial condition. We point out however that for the theoretical developments in the proof of Theorem 11 this dependence becomes relevant. Hence, for the analysis of Section IV-B we will introduce the notation to highlight the dependency of on the initial condition . Notice that which also appears in the initial condition of Algorithm 1 depends on the -multisample which is already an argument of , hence we only include as a subscript.

IV Proofs of a posteriori and a priori certificates

IV-A Proof of Theorem 10

Fix any . Consider , and let be the cardinality of any given compression set of (recall that it depends on the observation of the -multisample). Let , and . For any , consider the set .

Fix and consider defined as in (5). Under Assumptions 2 and 3, is single-valued by Proposition 9. By [35, Thm. 1] we then have that

(22)

if the following consistency condition holds for (see [33] for a definition)

(23)

To show this, notice that for each , by the NE definition (Definition 1), will belong to the set of minimizers of the following epigraphic reformulation of the optimization program involved in , i.e.,

(24a)
(24b)

By (24b) it follows then that (23) is satisfied, thus establishing (22). Note that for the result of [35] to be invoked, (24) is not required to be a convex optimization program, hence the fact that for each , for any , only is assumed to be convex by Assumption 2, and not and individually, is sufficient.

By the definition of and , (22) implies that with confidence at least , , thus establishing the first part of Corollary 12.

We now proceed to demonstrate the claim in (6). Recall that, by (12), (17) and Proposition 13, we can obtain as solution of the following optimization program (note the slight abuse of notation as by we denote both the optimizer and the corresponding decision vector).

(25a)
(25b)

where is a NE of . By definition of in (9), and recalling , (25b) can be equivalently written as