
Personalized incentives as feedback design in generalized Nash equilibrium problems

03/24/2022
by   Filippo Fabiani, et al.
University of Oxford

We investigate both stationary and time-varying, nonmonotone generalized Nash equilibrium problems that exhibit symmetric interactions among the agents, which are known to be potential. As may happen in practical cases, however, we envision a scenario in which the formal expression of the underlying potential function is not available, and we design a semi-decentralized Nash equilibrium seeking algorithm. In the proposed two-layer scheme, a coordinator iteratively integrates the (possibly noisy and sporadic) agents' feedback to learn the pseudo-gradients of the agents, and then designs personalized incentives for them. On their side, the agents receive those personalized incentives, compute a solution to an extended game, and then return feedback measurements to the coordinator. In the stationary setting, our algorithm returns a Nash equilibrium whenever the coordinator is endowed with standard learning policies, while in the time-varying case it returns a Nash equilibrium up to a constant, yet adjustable, error. As a motivating application, we consider the ridehailing service provided by several companies under a mobility-as-a-service orchestration, which is necessary both to handle competition among firms and to avoid traffic congestion; this application is also adopted to run numerical experiments verifying our results.


I Introduction

Noncooperative game theory represents a contemporary and pervasive paradigm for the modelling and optimization of modern multi-agent systems, where agents are typically modelled as rational decision-makers that interact and selfishly compete for shared resources in a stationary environment. Here, the (generalized) Nash equilibrium solution concept [facchinei2007generalized] denotes a desired outcome of the game, which is typically self-learned by the agents through iterative procedures alternating distributed computation and communication steps [salehisadaghiani2016distributed, salehisadaghiani2019distributed, ye2017distributed, gadjov2021exact].

Real-world scenarios, however, are rarely stationary. This fact, along with recent developments in machine learning and online optimization [jadbabaie2015online, Bogunovic2016, shahrampour2017distributed, davis2019stochastic, dixit2019online, simonetto2019personalized, dall2020optimization], has fostered the implementation of online multi-agent learning procedures, thus contributing to the growing interest in games where the population of agents aims at tracking possibly time-varying Nash equilibria online. A recent research direction, indeed, established the convergence of online distributed mirror descent-type algorithms in strictly monotone GNEPs [tampubolon2020coordinated], in aggregative games with estimated information [tampubolon2019convergence], and in price-based congestion control methods for generic noncooperative games [tampubolon2020robust]. Conversely, [mertikopoulos2019learning] focused on the prediction of the long-term outcome of a monotone NEP, later extended to the case of delays in the communication protocol [zhou2018multi]. The convergence of no-regret learning policies with exponential weights in potential games within a (semi-)bandit framework was explored in [cohen2017learning], while [cardoso2019competing] introduced an algorithm with sublinear Nash equilibrium regret under bandit feedback for time-varying matrix games, and [duvocelle2018learning] showed that, in case of a slowly-varying monotone NEP, dynamic regret minimization allows the agents to track the sequence of equilibria.

Unlike the aforementioned literature, in this paper we focus on both static and time-varying, nonmonotone generalized Nash equilibrium problems that exhibit symmetric interactions among the agents. Prominent examples can be found in smart grids and demand-side management [kattuman2004allocating, zhu2011multi, cenedese2019charging], in case one considers a fair electricity market; in congestion games [rosenthal1973class, altman2007evolutionary], in which shared resources incur costs that depend on the number of users occupying them; or in coordination control problems [zhang2010cooperative, fabiani2018distributed], where barrier functions typically enforce distance-based constraints. Such a symmetric structure brings numerous advantages to the GNEP that enjoys it. Among them, the underlying GNEP is known to be potential with an associated potential function [ui2000shapley, la2016potential], which implicitly entails the existence of a GNE. Moreover, such a potential function frequently enables the design of equilibrium seeking algorithms with convergence guarantees (especially in nonconvex settings [heikkinen2006potential, fabiani2019multi, cenedese2019charging]).

However, unless one has a deep knowledge of the main quantities characterizing the symmetric interactions of the GNEP at hand or, conversely, the agents' cost functions are designed to result in a potential game [li2013designing], finding the formal expression of the potential function is known to be a hard task [la2016potential, Ch. 2], [hobbs2007nash]. Moreover, in some cases it may simply be unavailable or unknown. In many in-network operations that require some degree of coordination, indeed, it is highly desirable that the parameters of the agents' cost functions, which reflect local sensitive data, stay private. Specifically, our work is motivated by the following application, thoroughly discussed in §VI as a case study for numerical simulations.

I-A Motivating example: Ridehailing with mobility-as-a-service orchestration

With the growing business related to ridehailing, a MaaS coordination platform appears indispensable to counteract the traffic congestion due to the increasing number of vehicles dispatched on the road, while at the same time facilitating the competition among service providers [pandey2019needs, diao2021impacts]. Specifically, let us consider a scenario with ridehailing firms, such as Lyft, Uber, Didi Chuxing, Via, or Juno, which compete to put the most vehicles (a capped local resource) on the road to attract the most customers. During the day, each company aims at maximizing its profit, which is implicitly related to how many cars it can currently put on the road to meet customer needs, properly scaled and discounted to account for, e.g., refusal rates or the time of the day. To this end, bigger companies can naïvely be induced to dispatch as many cars as they own. However, this may cause traffic congestion, thus reducing the quality of the service provided, and therefore lessening what the company can charge for each ride. In fact, by leveraging their own experience, those big companies may estimate how many cars actually get customers, out of the available ones, as, e.g., a concave function with parameters tuned accordingly. Therefore, assuming the same fare applies per average trip to each customer, the profit function of the -th lead company follows accordingly. On the other hand, the strategies of smaller companies are typically less affected by traffic congestion, since their quality of service is generally worse in the sense that they can dispatch only a small number of cars on the road. In this case, their direct experience may suggest that the number of cars that actually get customers, out of the available ones, can be modelled as a convex function, thus reflecting the fact that the larger the number of deployed cars, the larger the possibility to cover enough space to be attractive; the overall profit then reads accordingly. In addition to the profit, however, the companies also incur costs that have to be minimized and that vary during the day, such as gas consumed or miles travelled. By assuming, for instance, the same cost associated with each vehicle per average trip, the overall cost can be modelled as a linear function of the number of cars. Finally, for competition purposes, companies of the same size would like to roughly offer the same service, so that the mismatch between their deployed fleets needs to be minimized as well. At the same time, it may happen that a certain company aims at offering a better service than another one, and hence coupling constraints arise, while the ridehailing firms as a whole can put on the road a maximum number of cars, for some minimum service lower bound.
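For concreteness, the snippet below sketches the profit model just described with assumed functional forms (a square-root curve for the concave case and a quadratic one for the convex case); the paper's actual symbols, parameters and expressions are not reproduced here.

```python
# A purely illustrative sketch of the profit model described above; the concave and
# convex "effective customers" curves are assumed shapes, not the authors' formulas.
import numpy as np

def profit_large_firm(n_cars, fare, cost_per_car, a):
    """Large firm: effective customers modelled as a concave function of the
    number of deployed cars (here, an assumed square-root shape scaled by a)."""
    effective_customers = a * np.sqrt(n_cars)
    return fare * effective_customers - cost_per_car * n_cars

def profit_small_firm(n_cars, fare, cost_per_car, b):
    """Small firm: effective customers modelled as a convex function of the
    number of deployed cars (here, an assumed quadratic shape scaled by b)."""
    effective_customers = b * n_cars ** 2
    return fare * effective_customers - cost_per_car * n_cars

# Example: a large and a small firm deploying 10_000 and 1_500 cars, respectively.
print(profit_large_firm(10_000, fare=25.0, cost_per_car=8.0, a=50.0),
      profit_small_firm(1_500, fare=25.0, cost_per_car=8.0, b=1e-4))
```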

In the proposed scenario, where the firms exhibit symmetries in the mixed convex-concave cost functions, the MaaS platform aims at coordinating the whole ridehailing service while avoiding traffic congestion. This can be achieved, for instance, by imposing extra fees, incentives or restrictions on the companies, possibly according to their size and turnover. However, note that the parameters affecting the cost function of each firm, which are hence key to driving its strategy, cannot be disclosed to the MaaS platform, since they represent sensitive information. This is in contrast with the incurred cost, which can be estimated directly, as it depends on mileage, fuel consumption and number of deployed cars, all data that are somewhat publicly available (see, e.g., [nyc_ridehailing]). Therefore, a possible strategy for the MaaS platform is to learn those time-varying parameters by leveraging feedback collected from users, e.g., on the price they are charged, and then to design tailored incentives for the coordination.

I-B Main contributions

We design a semi-decentralized scheme that allows the agents to compute (or track within a neighbourhood) a GNE of a nonmonotone GNEP that admits an unknown potential function, in both the static and the time-varying setting (§II). Specifically, in the outer loop of the proposed two-layer algorithm we endow a coordinator with an online learning procedure, aiming at iteratively integrating the (possibly noisy and sporadic) agents' feedback to learn some of their private information, i.e., the pseudo-gradient mappings associated with the agents' cost functions. The goal is to drive the population to a GNE. The reconstructed information is thereby exploited by the coordinator to design parametric personalized incentives for the agents [simonetto2019personalized, ospina2020personalized, notarnicola2020distributed]. On its side, the population of agents receives those personalized incentives, computes a solution, i.e., a v-GNE, to an extended game by means of available algorithms in the inner loop, and then returns feedback measurements to the coordinator.

Unlike the proposed problem setting, we stress that [fabiani2021nash] considered a specific class of stationary nonmonotone GNEPs only, i.e., the quadratic one, where feedback was provided to the coordinator at every iteration. Furthermore, we highlight that GNE seeking in nonmonotone GNEPs is a hard task even in a stationary setting, and the underlying literature is not extensive. Examples of solution algorithms, tailored for static NEPs or generic VIs (hence possibly not amenable to distributed computation), can be found in, e.g., [konnov2006regularization, yin2009nash, yin2011nash, konnov2014penalty, lucidi2020solving]. In addition, our semi-decentralized scheme may also fit in a Stackelberg game framework, in which the leader does not control any decision variable, albeit it aims at minimizing the unknown potential function on the basis of optimistic conjectures on the followers' strategies [kulkarni2015existence, fabiani2020local]. Here, instead, we leverage the symmetry of interactions characterizing the agents taking part in the nonmonotone GNEP to design parametric personalized incentives, which play a crucial role in the convergence of the algorithm, as they bring a twofold benefit: i) enabling the agents to compute a v-GNE in the inner loop by acting as convexification terms for their cost functions; and ii) boosting the convergence and/or lessening the tracking error through a fine tuning of a few parameters (§III). As main results, in the static case we show that the proposed algorithm converges to a GNE by exploiting the asymptotic consistency bounds characterizing typical learning procedures for the coordinator, such as LS or GP regression (§IV). Conversely, in the time-varying setting we show that the fixed point residual, our metric for assessing convergence, asymptotically behaves as a constant, adjustable term, i.e., the proposed semi-decentralized scheme allows the agents to track a GNE within a neighbourhood of adjustable size (§V). We corroborate our findings on a numerical instance of the ridehailing service with MaaS orchestration in §VI. The proofs of the theoretical results are all deferred to the Appendix.

Notation: , and denote the set of natural, real and nonnegative real numbers, respectively. is the space of symmetric matrices. For vectors and , we denote and . With a slight abuse of notation, we also use . is the class of continuously differentiable functions. The mapping is monotone on if for all ; strongly monotone if there exists a constant such that for all ; and hypomonotone if there exists a constant such that for all . If is differentiable, denotes its Jacobian matrix. Throughout the paper, variables with as a subscript do not explicitly depend on time, as opposed to when appears as an argument.
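As a numerical illustration of these monotonicity notions (not part of the paper), the following sketch empirically probes the quantity whose sign separates monotone, strongly monotone and hypomonotone mappings on a sampled box set.

```python
# Not from the paper: a minimal numerical probe of the monotonicity notions above,
# assuming a user-supplied mapping F : R^n -> R^n and a box-shaped set X.
import numpy as np

def monotonicity_probe(F, lo, hi, n_pairs=10_000, seed=0):
    """Estimate the infimum of <F(x)-F(y), x-y> / ||x-y||^2 over random pairs
    drawn from the box [lo, hi]; >= 0 suggests monotone, >= mu > 0 strongly
    monotone, negative but bounded below by -rho hypomonotone."""
    rng = np.random.default_rng(seed)
    n = lo.size
    worst = np.inf
    for _ in range(n_pairs):
        x = rng.uniform(lo, hi, size=n)
        y = rng.uniform(lo, hi, size=n)
        d = x - y
        denom = d @ d
        if denom < 1e-12:
            continue
        worst = min(worst, (F(x) - F(y)) @ d / denom)
    return worst

# Example with a nonmonotone quadratic mapping F(x) = Q x (Q not PSD):
Q = np.array([[1.0, 2.0], [2.0, -0.5]])
print(monotonicity_probe(lambda x: Q @ x, lo=-np.ones(2), hi=np.ones(2)))
```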

II Problem formulation

In this section, we first formally introduce the multi-agent equilibrium problem addressed, and then we discuss some key points that essentially motivate our solution algorithm.

II-A Noncooperative GNEP with symmetric interactions

We consider a noncooperative game , with agents, indexed by the set . Each agent controls its locally constrained variable and, at every discrete time instant , aims at solving the following time-varying optimization problem:

(1)

for some function , , which denotes the private individual cost, whose value at time can be interpreted as the (dis)satisfaction of the -th agent associated to the collective strategy . The collection of optimization problems in (1) amounts to a GNEP, where every is a map stacking coupling, yet locally separable, constraints among the agents. Let us first define the sets and , with , and then let us introduce some standard assumptions.

Standing Assumption 1.

For each , and for all ,

  1. The mapping is of class and has a -Lipschitz continuous gradient;

  2. is a nonempty, compact and convex set, is a convex and of class function.

The feasible set of the time-varying GNEP thus coincides with [facchinei2007generalized, §3.2]. In the proposed time-varying context, we are then interested in designing an equilibrium seeking algorithm for the game , according to the following popular definition of GNE.

Definition 1.

(Generalized Nash equilibrium [facchinei2007generalized]) For all , is a GNE of the game if, for all ,

(2)

A collective vector of strategies is therefore an equilibrium at time if no player can decrease its objective function by unilaterally changing its strategy to any other feasible point. For the remainder, we make the following assumption on the pseudo-gradient mapping , which is formally defined as .

Standing Assumption 2.

For every and , .

Roughly speaking, Standing Assumption 2 establishes that each pair of agents influences each other in an equivalent way. For the mapping , this entails the existence of a differentiable, yet possibly unknown, function such that , for all and [facchinei2007finite, Th. 1.3.1], which coincides with a potential function [facchinei2011decomposition] for and can be characterized as stated next.
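To make this consequence concrete, the hypothetical sketch below recovers a potential numerically via the standard line-integral construction, valid whenever the Jacobian of the pseudo-gradient mapping is symmetric; it illustrates the cited fact and is not a construction used by the authors.

```python
# Illustration (not from the paper): if the pseudo-gradient F has a symmetric
# Jacobian, a potential P with grad P = F can be recovered up to a constant via
# the line integral P(x) - P(x0) = \int_0^1 F(x0 + t (x - x0))^T (x - x0) dt.
import numpy as np

def potential_from_pseudogradient(F, x, x0, n_steps=200):
    """Approximate P(x) - P(x0) with the midpoint rule along the segment [x0, x]."""
    d = x - x0
    ts = (np.arange(n_steps) + 0.5) / n_steps
    return sum(F(x0 + t * d) @ d for t in ts) / n_steps

# Sanity check on F(x) = Q x with Q symmetric, whose exact potential is 0.5 x^T Q x:
Q = np.array([[2.0, 1.0], [1.0, 3.0]])
x = np.array([0.7, -0.4])
print(potential_from_pseudogradient(lambda z: Q @ z, x, np.zeros(2)),
      0.5 * x @ Q @ x)
```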

Lemma 1.

For all , is -Lipschitz continuous, while is -weakly convex, with .

Note that is a smooth function, which in principle may be nonconvex. Let be the set of its (local and global) constrained minimizers, assumed to be nonempty, and be the set of its constrained stationary points, with , for all . We stress that the nonemptiness of guarantees the existence of a GNE for , since any satisfies the relation in (2).

II-B Main challenges and technical considerations

Fig. 1: Personalized incentives as feedback design to steer the population to a point guaranteeing the “minimum” (dis)satisfaction, according to the unknown function , i.e., a GNE of the game .

In the considered formulation, we identify three main critical issues, both technical and practical, that rule out the possibility of computing a GNE for the time-varying GNEP in (1) through standard arguments, thus fully supporting the need for a tailored learning procedure such as the one introduced later in the paper.

First, the time-varying nature of the optimization problems in (1) calls for an answer to the thorny question on whether there exist online learning policies that allow agents to track a Nash equilibrium over time (or to converge to one if the stage games stabilize). Even in the case of a potential game with known potential function, this is a challenging problem [cohen2017learning].

In addition, despite the symmetry of interactions among agents, we note that for all the mapping may not be monotone, a key technical requirement for the most common solution algorithms for GNEP available in the literature, which compute a GNE by relying on the (at least) monotonicity of the pseudo-gradient mapping [salehisadaghiani2016distributed, salehisadaghiani2019distributed, ye2017distributed, gadjov2021exact].

Finally, we stress that Standing Assumption 2, albeit quite mild and practically satisfied in several real-world scenarios [rosenthal1973class, kattuman2004allocating, zhu2011multi, zhang2010cooperative], is key to claim that the underlying GNEP is potential, a fact that typically helps in designing Nash equilibrium seeking algorithms with convergence guarantees (especially in nonconvex/nonmonotone settings, e.g., [heikkinen2006potential, fabiani2019multi, cenedese2019charging, lei2020asynchronous]). However, unless one has a deep knowledge of the GNEP at hand, finding the formal expression of the potential function is known to be a hard task [la2016potential, Ch. 2]. Thus, we assume that we do not have an expression for the potential function that can be exploited directly for the equilibrium seeking algorithm design.

To address these crucial issues, we design personalized feedback functionals in the spirit of [simonetto2019personalized, notarnicola2020distributed, ospina2020personalized, fabiani2021nash], which are used as "control actions" in the two-layer semi-decentralized scheme depicted in Fig. 1. Specifically, our goal is to steer the noncooperative agents to track minimizers of the unknown, time-varying function , i.e., a GNE of the game , according to Definition 1. Any such minimizer can indeed be interpreted as a collective strategy that minimizes the (dis)satisfaction of the population of agents, measured by the unknown function .

III Learning algorithm with personalized incentives

Algorithm 1 Two-layer semi-decentralized scheme. At every iteration:

  • (S0) Learn pseudo-gradients

  • (S1) Design personalized incentives

  • (S2) Compute a GNE of the extended game

  • (S3) Retrieve noisy agents' feedback

    We here describe the main steps of the proposed semi-decentralized learning procedure, also discussing how the design of personalized incentive functionals promises to be key in addressing the challenges introduced in the previous section.

III-A The two-layer algorithm

The proposed approach is summarized in Algorithm 1. Specifically, in the outer loop a central coordinator aims at learning online (i.e., while the algorithm is running) the unknown, time-varying function (or its gradient mapping) by leveraging possibly noisy and sporadic agents' feedback on the private functions (S0). On the basis of the estimate, at item (S1) the coordinator designs personalized incentive functionals, and at item (S2) induces the noncooperative agents to face an extended version of the GNEP in (1), with the incentivized costs in place of the original ones. Under a suitable choice of the personalized incentives, we will show that they act as regularization terms and that they trade off convergence and robustness to the inexact knowledge of the function and its gradient. Specifically, such incentives enable the practical computation of an equilibrium of the extended game at item (S2) through available solution algorithms for GNEPs [salehisadaghiani2016distributed, ye2017distributed, gadjov2021exact].
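The following skeleton (an illustrative sketch, not the authors' implementation) mirrors the outer loop (S0)-(S3) just described; the learner, the incentive rule, the inner v-GNE solver and the feedback model are placeholders to be supplied.

```python
# Schematic rendition of the outer loop of Algorithm 1; every callable argument
# stands in for one of the paper's steps (S0)-(S3) and is assumed, not prescribed.
import numpy as np

def two_layer_scheme(x0, learn_pseudogradient, design_incentives,
                     solve_extended_game, collect_feedback, n_outer=100):
    x = x0
    history = []                                       # feedback held by the coordinator
    for k in range(n_outer):
        F_hat = learn_pseudogradient(history)          # (S0) learn from feedback
        incentives = design_incentives(F_hat, x)       # (S1) personalized incentives
        x = solve_extended_game(incentives, x)         # (S2) inner loop: v-GNE
        history.append(collect_feedback(x))            # (S3) noisy, sporadic feedback
    return x
```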

Note that standard procedures in the literature typically return a v-GNE [cavazzuti2002nash, facchinei2007generalized], which coincides with any solution to the GNEP that is also a solution to the associated VI, i.e., any vector such that, for all ,

    (3)

    where the mapping is formally defined as , and . For these reasons, in referring to the computational step (S2), we tacitly assume that the agents compute a v-GNE of the extended game .

Finally, at item (S3) the agents return feedback measurements and their equilibrium strategies, corrupted by some random variable, to the central coordinator, thus indicating to what extent the current equilibrium (dis)satisfies the entire population of agents.

III-B Personalized incentives design

In view of Standing Assumption 2 and the consequences it brings, e.g., the existence of a potential function, a natural approach to design the personalized incentives is to iteratively learn and point along a descent direction for the unknown function , thus implicitly requiring one to estimate the pseudo-gradients at every iteration. Along the lines of [ospina2020personalized, fabiani2021nash], we assume the central coordinator is endowed with a learning procedure such that, at every outer iteration (Algorithm 1, item (S0)), it integrates the most recent agents' feedback to return an estimate of the pseudo-gradients. A possible personalized incentive functional can hence be designed as

    (4)

    where , for some parameters , , for all . Unlike what one might expect, each requires a positive sign for the gradient step . However, note that this fact is not uncommon – see, e.g., the recent Heavy Anchor method [gadjov2021exact, Eq. (7)]. Moreover, in the next sections we will also discuss how such a choice enables us to boost the convergence of Algorithm 1 (as shown, for example, in [fabiani2021nash, §V] on a numerical instance of a quadratic hypomonotone GNEP) or lessen the tracking error, through a fine tuning of the step-size .
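Since equation (4) is not reproduced in this extract, the sketch below only illustrates the ingredients the text attributes to it (a quadratic regularization gain and a positive-sign step along the estimated pseudo-gradient, scaled by a step-size) in an assumed quadratic-plus-linear form; it should not be read as the authors' exact functional.

```python
# Hypothetical illustration of a personalized incentive for agent i at outer
# iteration k: a quadratic term with gain beta (which convexifies the agent's
# extended cost, cf. Proposition 1) plus a POSITIVE-sign gradient step of size
# alpha on the estimated pseudo-gradient F_hat_i. The exact form (4) may differ.
import numpy as np

def personalized_incentive(F_hat_i, x_i_k, alpha, beta):
    """Return phi_i^k(.) in an assumed, illustrative parametric form."""
    def phi_i(x_i):
        return 0.5 * beta * np.dot(x_i - x_i_k, x_i - x_i_k) \
               + alpha * np.dot(F_hat_i, x_i)
    return phi_i
```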

Thus, once the parametric form in (4) is fixed, we design suitable bounds for and in such a way that the sequence of GNE yields monotonically decreasing values of the unknown function and converges to some point in . As stressed in the previous section, the gain is crucial to enable the computation of a v-GNE at item (S2) in Algorithm 1, as stated next.

    Proposition 1.

    Let for all . Then, with the personalized incentives in (4), the mapping is -strongly monotone, for all .

    Thus, at every , in (S2) the population of agents computes the (unique, see [facchinei2007finite, Th. 2.3.3]) v-GNE associated to the extended version of the GNEP in (1), .
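Since the extended pseudo-gradient mapping is strongly monotone by Proposition 1, the inner step (S2) can in principle be carried out with standard VI methods. Below is a minimal extragradient sketch (the method also used for the experiments in §VI), written for the simplified case of a box-shaped feasible set and an assumed extended mapping F_ext; it is an illustration, not the authors' inner-loop solver.

```python
# Minimal extragradient sketch for the VI defining the v-GNE of the extended game,
# assuming F_ext is (strongly) monotone and local box constraints only.
import numpy as np

def project_box(x, lo, hi):
    return np.clip(x, lo, hi)

def extragradient(F_ext, x0, lo, hi, step=0.1, tol=1e-8, max_iter=10_000):
    x = project_box(x0, lo, hi)
    for _ in range(max_iter):
        y = project_box(x - step * F_ext(x), lo, hi)      # predictor step
        x_new = project_box(x - step * F_ext(y), lo, hi)  # corrector step
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x
```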

    In the remainder of the paper, we will consider both the stationary and time-varying case of online perfect and imperfect reconstruction of the pseudo-gradient mappings, also analyzing the results with different learning strategies . To this end, a key quantity will be the fixed point residual , whose norm “measures” the distance to the points in when the function is fixed in time, as stated next.

    Lemma 2.

    Let be the sequence of v-GNE generated by Algorithm 1 with , assume perfect reconstruction of the mapping , and that for some . Then, is a stationary point for the function , i.e., .

It is fundamental to appropriately choose and to drive the sequence of v-GNE along a descent direction for the unknown , and to ensure convergence. In case of imperfect reconstruction of , or in the time-varying setting, we also adopt the average value of over a certain horizon of length as a metric for the convergence of the sequence generated by Algorithm 1 to the stationary point set.

We remark here that, on the one hand, finding the stationary points is the general goal in nonconvex settings [scutari2017parallel]. On the other hand, since Algorithm 1 generates monotonically decreasing values for the unknown function, the application of simple perturbation techniques (e.g., [escape]) can ensure that the stationary points to which we converge are in practice constrained local minima for , namely points belonging to , and therefore GNE of the GNEP in (1), according to Definition 1.
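The snippet below gives a generic rendition of such a perturbation step (one of many possible variants, not necessarily the one in [escape]): when the iterate is numerically stationary, a small random displacement is applied before resuming the descent scheme, so that saddle points are typically escaped while local minima are preserved.

```python
# Generic illustration (not the authors' procedure) of a perturbation step used to
# avoid settling at saddle points of a nonconvex function.
import numpy as np

def perturb_if_stationary(x, grad_norm, radius=1e-2, tol=1e-6, rng=None):
    """Return a randomly perturbed copy of x when the gradient is numerically zero."""
    rng = rng or np.random.default_rng()
    if grad_norm > tol:
        return x                                   # not stationary yet: keep iterating
    d = rng.standard_normal(x.shape)
    return x + radius * d / np.linalg.norm(d)      # small step in a random direction
```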

    Remark 1.

    The bounds on the parameters and provided in the paper assume the knowledge of the constant of weak convexity of , . However, as long as the coordinator is endowed with a learning policy, one may include this additional condition in the learning process, thus obtaining bounds that depend on , the estimate of .

IV The stationary case

We start by discussing the case in which each cost function in (1) is fixed in time, thus implying that the unknown function is fixed as well. First, we analyze the case of perfect reconstruction of the pseudo-gradient mappings (§IV-A), and then we investigate their inexact estimate (§IV-B). Here, our result will contain a non-vanishing term in case the reconstruction error is non-vanishing; otherwise (§IV-C), we recover the results shown in §IV-A.

IV-A Online perfect reconstruction of the pseudo-gradients

    In case the learning procedure enables for , , by adopting the personalized incentives in (4) at every outer iteration , we have the following result.

    Lemma 3.

    Let and , for all . Then, with the personalized incentives in (4), the vector is a descent direction for , i.e., .

Then, if (resp., ) is large (resp., small) enough, at every iteration of Algorithm 1 the personalized functionals in (4) allow one to point along a descent direction for the unknown (dis)satisfaction function . Next, we establish the convergence of the sequence of v-GNE generated by Algorithm 1.

    Proposition 2.

    Let and , for all . With the personalized incentives in (4), the sequence of v-GNE , generated by Algorithm 1, converges to some point in .

    By introducing , from the first step of the proof of Proposition 2 we have that , which points out that a fine tuning of the term allows us to boost the convergence of Algorithm 1 to some point in (also observed on a numerical example in [fabiani2021nash, §V]). This essentially explains the choice for a positive sign in the gradient step of (4). However, due to the presence of noise in the agents’ feedback , it seems unlikely that the online algorithm is able to return a perfect reconstruction of , at least at the beginning of the procedure in Algorithm 1.

IV-B Inexact estimate of the pseudo-gradients

    At every outer iteration , we assume the coordinator has available agents’ feedback , , and , to estimate the gradients (and hence the mapping ). The value of reflects situations in which the coordinator gathered information before starting the procedure (), or it obtains sporadic feedback from the agents (). Without restriction, we make the following, standard assumption on the reconstructed mapping directly, rather than on each single gradient [dixit2019online, ospina2020personalized, dall2020optimization].

    Assumption 1.

    For all and , , and, for any , there exists and available agents’ feedback such that , for some nonincreasing function such that , for all .

With Assumption 1, the reconstruction error on made by is bounded with high probability by some function of the available agents' feedback. Then, after defining the quantities and , we have the following result.

    Lemma 4.

    Let Assumption 1 hold true for some fixed , and for all . Then, with the personalized incentives in (4), for all we have

    (5)

    with probability , for some .

In case of inexact estimate of the pseudo-gradients, the vector is not guaranteed to be a descent direction for the unknown function . In fact, the error term rules out the possibility that the LHS in (5) is strictly negative, albeit it can be made arbitrarily small through an appropriate choice of the step-size . As in §IV-A, the following bound characterizes the sequence of v-GNE generated by Algorithm 1.

    Theorem 1.

    Let Assumption 1 hold true for some fixed , and , for all . Moreover, let some be fixed, and, for any global minimizer , . Then, with the personalized incentives in (4), the sequence of v-GNE , generated by Algorithm 1, satisfies the following relation with probability

    (6)

    Here, , , and is the number of available agents’ feedback at the -th outer iteration, .

Roughly speaking, Theorem 1 establishes that, with arbitrarily high probability, the average value of the residual over a certain horizon is bounded by the sum of two terms, which depend on the initial distance from a minimum of the unknown function , and on the reconstruction error . Note that the terms in the RHS can be made small by either choosing a small step-size , in order to make close to zero, or tuning the product close to one, thus leading to a large . This latter choice, however, would increase the term involving the sub-optimal constant , thus requiring an accurate trade-off in tuning the gain and the step-size . In the stationary case, to avoid exceedingly aggressive personalized actions, the coordinator may then want to match the lower bound for , while striking a balance in choosing to possibly boost the convergence of Algorithm 1.

For simplicity, let us now assume that is a constant term. From Assumption 1, the reconstruction error is then bounded accordingly, and hence the bound above simplifies. We note that, as the horizon grows, the first term vanishes, and the average of stays in a ball whose radius depends on the number of agents' feedback made available to perform (S0) in Algorithm 1 and, specifically, on the learning strategy . Next, we analyze the bound above under the lens of different learning procedures.

IV-C Specifying the learning strategy

By requiring that the reconstruction error is bounded in probability, Assumption 1 is quite general and holds true under standard assumptions for LS and GP approaches to learning . In particular, we have the following:

• In parametric learning, if is modelled as an affine function of the learning parameters, then setting up an LS approach to minimize the loss between the model and the agents' feedback leads to a convex quadratic program (a sketch of such an estimator is given after this list). Due to the large-sample properties of LS (under standard assumptions), the error term behaves as a normal distribution, for which Assumption 1 holds true (see [notarnicola2020distributed, Lemma A.4]).

• In non-parametric learning, suppose is a sample path of a GP with zero mean and a certain kernel. Due to the large-sample property of such a regressor and under standard assumptions, also in this case Assumption 1 holds true (see [simonetto2019personalized]).
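As a self-contained illustration of the parametric LS case in the first bullet (an assumed affine model fitted to synthetic noisy feedback, not the authors' estimator), consider the following sketch.

```python
# Illustrative only: parametric LS reconstruction of a pseudo-gradient assumed
# affine in the unknown parameters, fitted to noisy feedback pairs (x, F(x)+noise).
import numpy as np

def fit_affine_pseudogradient(X, Y):
    """X: (m, n) strategies at which feedback was given; Y: (m, n) noisy
    pseudo-gradient measurements. Fit F(x) ~= A x + b by least squares."""
    m, n = X.shape
    design = np.hstack([X, np.ones((m, 1))])             # [x, 1] regressors
    theta, *_ = np.linalg.lstsq(design, Y, rcond=None)   # shape (n+1, n)
    A, b = theta[:n].T, theta[n]
    return lambda x: A @ x + b

# Usage sketch with synthetic feedback generated from a true affine mapping:
rng = np.random.default_rng(1)
A_true, b_true = np.array([[3.0, 1.0], [1.0, 2.0]]), np.array([0.5, -1.0])
X = rng.uniform(-1, 1, size=(50, 2))
Y = X @ A_true.T + b_true + 0.05 * rng.standard_normal((50, 2))
F_hat = fit_affine_pseudogradient(X, Y)
print(F_hat(np.array([0.2, -0.3])))
```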

Note that, in general, the number of available feedback samples grows along the iterations. Therefore, for the cases above the reconstruction error vanishes asymptotically, thus recovering the results obtained for the perfect reconstruction case shown in §IV-A.

    V The time-varying case

    We now investigate the GNEP in (1) in case the local cost function of each agent varies in time, thus implying that also the function is non-stationary. Our goal is still to design the parameters defining the personalized incentives to track a time-varying GNE that minimizes the (dis)satisfaction function, i.e., some , both in case of perfect (§V-A) and inexact reconstruction (§V-B) of the pseudo-gradient mapping.

    To start, we make the following typical assumptions in the literature on online optimization [jadbabaie2015online, shahrampour2017distributed, dall2020optimization].

    Assumption 2.

    For all and , it holds that

    1. , for ;

    2. For all , , for ;

    3. Moreover, , for .

Assumptions 2 i) and ii) essentially bound the variation in time of both the unknown function and the pseudo-gradient mappings, while Assumption 2 iii) guarantees the boundedness of the distance between two consecutive minima. Note that, with these standard assumptions in place, a non-vanishing asymptotic error term is inevitable [jadbabaie2015online, mokhtari2016online, NaLi2020, dall2020optimization].

    Lemma 5.

    Let Assumption 2 ii) hold true. For all , , with .

V-A Online perfect reconstruction of the pseudo-gradients

    In case the learning procedure allows for , for all and , we have the following ancillary results.

    Lemma 6.

    Let Assumption 2 ii) hold true, and , for all . Then, with the personalized incentives in , for all we have

    (7)

    As in the stationary case in §IV-B, the vector is not guaranteed to be a descent direction for the unknown mapping in the sense of Lemma 3. In fact, the error , introduced because of the time-varying nature of the pseudo-gradients, excludes that the LHS in (5) is strictly negative. The following bound characterizes the sequence of v-GNE originating from Algorithm 1 in case allows for a perfect reconstruction of the time-varying mapping .

    Theorem 2.

    Let Assumption 2 hold true, and , for all . Moreover, let some be fixed, and, for any global minimizer , . Then, with the personalized incentives in (4), the sequence of v-GNE , generated by Algorithm 1, satisfies the following relation

    (8)

    with , and

Theorem 2 says that the average of the residual over the horizon is bounded by the sum of two terms, which depend on the initial sub-optimality of the computed v-GNE compared to a minimum of the unknown function , and on the several bounds on the variations in time of , of , and of the constrained minima, postulated in Assumption 2 and Lemma 5. In this case, the coordinator may reduce the error in the RHS by properly tuning the product close to one, thus leading to a large , and hence possibly boosting the convergence of Algorithm 1. In fact, if the parameter is fixed in time, the bound above simplifies accordingly, and the resulting inequality ensures that the average residual will always be contained in a ball of constant radius, whose value can be adjusted through and .

    V-B Inexact estimate of the pseudo-gradients

    As in §IV-B, we consider the case in which, due to possibly noisy agents’ feedback, the learning procedure does not allow a perfect reconstruction of each time-varying gradient , . First, we postulate the time-varying counterpart of Assumption 1, and then we provide a preliminary result.

    Assumption 3.

    For all and , , and, for any , there exists and available agents’ feedback such that , for some nonincreasing function such that , for all .

    Lemma 7.

    Let Assumption 2 and 3 hold true for some fixed , and , for all . With the personalized incentives in , for all we have

    (9)

    where , with probability , for some .

Along the same lines as in the stationary case with inexact reconstruction, we now provide the following bound on the sequence of v-GNE generated by Algorithm 1. Note the slight abuse of notation in defining , which is different from the one in Theorem 1.

    Theorem 3.

    Let Assumption 2 and 3 hold true for some fixed , and , for all . Moreover, let some be fixed, and, for any global minimizer , . Then, with the personalized incentives in (4), the sequence of v-GNE , generated by Algorithm 1, satisfies the following relation

    (10)

    with probability , where , and is the number of agents’ feedback at the -th outer iteration.

Also in this case, the average of the residual over the horizon is bounded by the sum of two terms, which depend, among others, on the reconstruction error of the mapping and on its variations in time. We note that the bound in the RHS of (10) can be adjusted through an accurate choice of the gain and the step-size . Specifically, choosing a small step-size reduces the reconstruction error, hidden in the variable , while setting the product close to one induces a large value for (and for as well), thus possibly eliminating the second term under the square root in (10), and the one outside.

For simplicity, let us now suppose that the parameters and of the personalized incentives in (4) are fixed in time, namely is a constant term. From (10), the bound above then simplifies accordingly. Due to the time-varying nature of the problem in question, also in this case the average residual cannot vanish as the horizon grows, albeit the radius of the error ball can be reduced through a fine tuning of and .

    V-C Specifying the time-varying learning strategy

In a time-varying setting, one cannot expect the reconstruction error to vanish in general, since the time variations in the problem are not supposed to be asymptotically vanishing [jadbabaie2015online, dall2020optimization]. Popular learning approaches include LS with forgetting factors [Mateos09giannakis] and time-varying GP regression [Bogunovic2016], for which suitable error bounds are available.
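As one possible instance of the LS-with-forgetting idea (a textbook recursive update, not necessarily the estimator of [Mateos09giannakis]), consider the sketch below, which tracks slowly drifting parameters of a linear measurement model.

```python
# Not from the paper: recursive least squares with forgetting factor lam in (0, 1],
# tracking drifting parameters theta in a scalar measurement model y ~= phi^T theta.
import numpy as np

class ForgettingRLS:
    def __init__(self, dim, lam=0.98, p0=1e3):
        self.theta = np.zeros(dim)      # current parameter estimate
        self.P = p0 * np.eye(dim)       # covariance-like matrix
        self.lam = lam

    def update(self, phi, y):
        """phi: regressor vector, y: noisy scalar measurement."""
        Pphi = self.P @ phi
        gain = Pphi / (self.lam + phi @ Pphi)
        self.theta = self.theta + gain * (y - phi @ self.theta)
        self.P = (self.P - np.outer(gain, Pphi)) / self.lam
        return self.theta
```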

VI Ridehailing with MaaS orchestration

We verify our findings by returning to the motivating example in §I-A, and then running simulations on a numerical instance.

VI-A Problem description

The parameters adopted are retrieved from open data collected in New York City in April 2019 [nyc_ridehailing], which provide information on the following companies: Yellow taxi, Uber, Lyft, Juno and Via. We stress that the decision variable of the ridehailing service, measured as the total number of vehicles deployed on the road, coincides with an integer variable, thus leading to a mixed-integer setting. However, since the fleet dimension of each firm we are considering is in the order of a few thousand vehicles (i.e., Juno and Via), or tens of thousands for the bigger companies (Uber, Lyft, Yellow taxi), we consider the relaxed version associated with the example in §I-A by treating the number of deployed cars as a scalar continuous variable, and then rounding its value [pandey2019needs]. For this reason, we roughly estimate a modest round-off error for any GNE computed at item (S2) in Algorithm 1 through an extragradient-type method [solodov1996modified] (thus partially neglecting the multi-agent nature of the inner loop). Moreover, we decide to split the hours of a day into intervals, enumerated in the set , according to the estimated average travel time of each customer with no shared trips, i.e., about minutes. Then, at every interval, each firm aims at solving the following mutually coupled optimization problem

    (11)

    where denotes the number of cars deployed by company , lower and upper bounded by ,