I Introduction
Markov decision process (MDP) congestion games have been successfully used to model the distribution of selfish decision makers competing for finite resources [5, 6]. In particular, MDP congestion games allow for stochastic dynamics in congestion games by mapping user inputs to probabilistic outcomes. The MDP Wardrop equilibrium, an equilibrium concept analogous to the Wardrop equilibrium of routing games, describes the steady-state population behaviour at which no player can lower its expected state-action cost by further changing its decision strategy.
When a physical process is modelled as an MDP congestion game, the game equilibrium is only an approximation of the true steady state of the physical process, because models cannot predict the physical process with full accuracy. The underlying assumption is that modelling errors cause negligible deviations of the predicted equilibrium from the physical one. However, this assumption fails if the steady-state distribution is sensitive to changes in the modelling parameters. This motivates our study of the sensitivity of MDP congestion game equilibria to state-action costs.
In this paper, we use sensitivity analysis to characterize the occurrence of the stochastic Braess paradox, and relate the paradox to its deterministic counterpart. We also define and derive conditions on the MDP dynamics and state-action costs under which our sensitivity analysis is valid. Finally, we bound the sensitivity of a stochastic MDP congestion game in terms of the sensitivity of its deterministic counterpart.
We also emphasize why we consider the sensitivity of the Wardrop equilibrium to the state-action cost parameters. First, when MDP congestion game models are used to forecast the steady-state behaviour of a physical system, state-action costs are often parameterized by experimental data, which is typically uncertain. When the cost parameter uncertainty is bounded, it is natural to bound the deviation of the true equilibrium from the predicted equilibrium. Second, the sensitivity of the game equilibrium is relevant to the leader in a Stackelberg game, who may use the sensitivity information to derive an optimal action sequence for its own objective [21]. Finally, consider a game designer who has a certain 'budget' for changing the cost function and wishes to alter the existing game equilibrium to maximize an external objective. In such settings, it is valuable to know the direction of change that achieves the most impact with respect to the designer's objective.
We review existing literature on sensitivity, hypergraphs, and MDP congestion games in Section II. In Section III, we introduce MDP congestion games and their hypergraph interpretation, define the game equilibria, and give the KKT characterization of the optimal distribution. Sensitivity results and characterizations of the stochastic Braess paradox are given in Section IV. We analyze the effect of stochasticity on paradox sensitivity in Section V. Finally, simulations demonstrating the stochastic Braess paradox and the sensitivity analysis are shown in Section VI.
II Related Work
MDP congestion games [5, 6] combine features of non-atomic routing games [23, 3, 18], in which decision makers influence each other's edge costs through congestion effects over a network, and stochastic games [20, 12], in which each decision maker solves an MDP. The MDP Wardrop equilibrium for MDP congestion games, akin to the Wardrop equilibrium of routing games, was introduced in [6].
Our analysis resembles sensitivity work on Wardrop equilibria in the traffic assignment literature [22, 19, 17], where the analysis is closely related to the Braess paradox [4]. The occurrence of the Braess paradox is known to be linked to the underlying graph of routing games [13]. The sensitivity of other games to modelling parameters has also been studied [16]. While similar techniques are used, our work differs fundamentally in its MDP network structure and its focus on stochastic effects on the game equilibrium.
In this section, we introduce the MDP congestion game framework from an individual decision maker's perspective and define a variational-inequality-style game equilibrium. From a system-level perspective, we formulate the MDP congestion game as a potential game supported by a hypergraph structure. We denote the set $\{1, \ldots, N\}$ by $[N]$ and the vector of ones in $\mathbb{R}^N$ by $\mathbf{1}_N$.
III-A MDP Congestion Game
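In an archetypal finite MDP problem [2], each decision maker solves an MDP with state space $[S]$ and action space $[A]$, given by
$$\min_{y \in \mathbb{R}^{S \times A}} \sum_{s \in [S]} \sum_{a \in [A]} \ell_{sa}\, y_{sa} \quad \text{s.t.} \quad \sum_{a \in [A]} y_{s'a} = \sum_{s \in [S]} \sum_{a \in [A]} P_{s'sa}\, y_{sa} \;\; \forall s' \in [S], \quad \sum_{s \in [S]} \sum_{a \in [A]} y_{sa} = 1, \quad y \geq 0, \tag{1}$$
where the objective is to minimize the expected average cost over an infinite time horizon with a finite set of actions $[A]$ and a finite set of states $[S]$. The optimization variable $y$ defines the state-action distribution of an individual decision maker, such that $y_{sa}$ denotes a decision maker's probability of taking action $a$ at state $s$.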
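The probability kernel $P \in \mathbb{R}^{S \times S \times A}$ defines the MDP transition dynamics,
$$P_{s'sa} = \Pr(s_{t+1} = s' \mid s_t = s,\; a_t = a),$$
where $P_{s'sa}$ denotes the transition probability from state $s$ to $s'$ when taking action $a$. The kernel is element-wise non-negative and column stochastic, i.e., $\sum_{s' \in [S]} P_{s'sa} = 1$ for all $(s, a)$.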
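To make the indexing convention concrete, the following minimal NumPy sketch stores a kernel as an array indexed `P[s_next, s, a]` (matching $P_{s'sa}$) and checks the two stated properties. The two-state, two-action kernel and the helper names `example_kernel` and `is_valid_kernel` are hypothetical, chosen only for illustration.

```python
import numpy as np


def example_kernel():
    """Hypothetical 2-state, 2-action kernel, indexed as P[s_next, s, a]."""
    P = np.zeros((2, 2, 2))
    P[:, 0, 0] = [1.0, 0.0]  # state 0, action 0: stay at state 0
    P[:, 0, 1] = [0.5, 0.5]  # state 0, action 1: move to state 1 with probability 0.5
    P[:, 1, 0] = [0.0, 1.0]  # state 1, action 0: stay at state 1
    P[:, 1, 1] = [0.5, 0.5]  # state 1, action 1: move to state 0 with probability 0.5
    return P


def is_valid_kernel(P, tol=1e-12):
    """True if P is element-wise non-negative and every column P[:, s, a] sums to one."""
    return bool(np.all(P >= 0) and np.allclose(P.sum(axis=0), 1.0, atol=tol))


P = example_kernel()
print(is_valid_kernel(P))        # True
print(is_valid_kernel(0.5 * P))  # False: every column now sums to 0.5
```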
In a non-atomic MDP congestion game, an infinite number of decision makers each solve an MDP on the same state-action space. The total population distribution is described by $x \in \mathbb{R}^{S \times A}$.
Assumption 1 (Mean Field Assumption).
In the limit where the number of decision makers approaches infinity, the total population becomes a continuous distribution with total mass $M$, where $x_{sa}$ denotes the portion of the population that chooses action $a$ at state $s$.
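The population distribution is related to individual distributions by
$$x = \sum_{i \in \mathcal{I}} m_i\, y^i, \qquad \sum_{i \in \mathcal{I}} m_i = M,$$
where $\mathcal{I}$ is the index set of feasible distributions with respect to MDP (1), and $m_i$ corresponds to the portion of the population that chooses distribution $y^i$.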
Assumption 1 results in the non-atomic nature of MDP congestion games: each decision maker's probability distribution $y^i$ is infinitesimal with respect to the population distribution $x$, such that a change in an individual's strategy does not affect $x$.
In an MDP congestion game, the state-action costs $\ell_{sa}$ are defined as functions of the population distribution, i.e., $\ell_{sa}: \mathbb{R}^{S \times A} \to \mathbb{R}$, $x \mapsto \ell_{sa}(x)$. We denote by $\ell(x) \in \mathbb{R}^{S \times A}$ the vector of state-action costs. The population dependency of $\ell$ reflects congestion effects: the greater the population in a given state-action pair, the greater the cost of taking that state-action for all decision makers. This assumption is consistent with practical networked interactions in traffic and telecommunications [15], where, e.g., the cost of traversing a road increases for each driver when the number of cars on the road increases.
Assumption 2.
The state-action costs $\ell(x)$ are continuously differentiable, and the Jacobian $\nabla_x \ell(x)$ is positive definite.
In an MDP congestion game, all decision makers achieve their optimal expected cost when the population distribution is at MDP Wardrop equilibrium.
Definition 1 (MDP Wardrop Equilibrium [6]).
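We say a population distribution $x$ which satisfies Assumption 1 is an MDP Wardrop equilibrium when each decision maker's probability distribution $y^i$ satisfies
$$\sum_{s \in [S]} \sum_{a \in [A]} y^i_{sa}\, \ell_{sa}(x) \leq \sum_{s \in [S]} \sum_{a \in [A]} y_{sa}\, \ell_{sa}(x) \quad \text{for all } y \text{ feasible for (1)}.$$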
Definition 1 implies that each decision maker's strategy is optimal in the following sense: if any individual decision maker deviates from its current strategy, it cannot achieve a lower expected cost as a result.
III-B Directed Hypergraphs
Similar to stochastic shortest path problems [9, 14], the MDP congestion game is inherently related to hypergraphs [10]. We consider a weighted directed hypergraph $\mathcal{G} = ([S], \mathcal{H})$, where $[S]$ is the set of states considered in the MDP congestion game and $\mathcal{H}$ is the set of hyperarcs. A hyperarc is defined for each state-action pair $(s, a)$, such that the tail is always at $s$, and the head, $\mathcal{H}_{sa}$, is the set of states that can be reached from state $s$ by taking action $a$, i.e.,
$$\mathcal{H}_{sa} = \{\, s' \in [S] \mid P_{s'sa} > 0 \,\}.$$
Each hyperarc is equivalent to a state-action pair. We represent each hyperarc by $(s, \mathcal{H}_{sa})$.
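The elements of the hypergraph incidence matrix $D \in \mathbb{R}^{S \times SA}$ are defined as
$$D_{s', (s,a)} = \begin{cases} P_{s'sa} - 1, & s' = s, \\ P_{s'sa}, & s' \neq s. \end{cases}$$
The hypergraph incidence matrix itself can be written as $D = P - E$, where $P$ is reshaped into an $S \times SA$ matrix and $E_{s',(s,a)} = 1$ if $s' = s$ and $0$ otherwise. In this form, the difference in probability density per state before a stochastic transition (i.e., $\sum_{a} x_{s'a}$) and after it (i.e., $\sum_{s,a} P_{s'sa} x_{sa}$) can be written as $Dx$. Therefore a stationary distribution of an infinite-horizon MDP satisfies $Dx = 0$. The hypergraph incidence matrix is rank deficient: since $P$ is column stochastic, it always satisfies $\mathbf{1}_S^\top D = 0$.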
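A minimal sketch of the construction $D = P - E$, assuming a hypothetical two-state, two-action kernel and a column ordering $s \cdot A + a$ for state-action pairs; `incidence_matrix` is an illustrative helper, not code from this work. It checks the rank deficiency $\mathbf{1}^\top D = 0$ and tests two candidate distributions for stationarity.

```python
import numpy as np

# Hypothetical kernel, indexed as P[s_next, s, a].
P = np.zeros((2, 2, 2))
P[:, 0, 0], P[:, 0, 1] = [1.0, 0.0], [0.5, 0.5]
P[:, 1, 0], P[:, 1, 1] = [0.0, 1.0], [0.5, 0.5]


def incidence_matrix(P):
    """Hypergraph incidence matrix D = P - E with columns ordered as s * A + a."""
    S, _, A = P.shape
    E = np.repeat(np.eye(S), A, axis=1)  # E[s', s * A + a] = 1 if s' == s
    return P.reshape(S, S * A) - E


D = incidence_matrix(P)
print(np.ones(2) @ D)  # all zeros: 1^T D = 0, so D is rank deficient

x_stationary = np.array([0.5, 1.5, 0.5, 1.5])     # satisfies D x = 0
x_not_stationary = np.array([1.0, 2.0, 0.0, 0.0])  # mass drifts from state 0 to state 1
print(D @ x_stationary)      # zero vector
print(D @ x_not_stationary)  # net inflow -1 at state 0 and +1 at state 1
```

Each column of `D` sums to zero because each column of the reshaped kernel sums to one and each column of `E` contains a single one.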
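A directed hypergraph is strongly connected if every non-empty proper subset $\mathcal{S} \subsetneq [S]$ has at least one incoming hyperarc from the set $[S] \setminus \mathcal{S}$, i.e., a hyperarc whose tail lies in $[S] \setminus \mathcal{S}$ and whose head intersects $\mathcal{S}$. In the following, we consider hypergraphs whose incidence matrix has rank $S - 1$.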
Assumption 3 (Incidence Rank).
The hypergraph that corresponds to the probability transition kernel $P$ is strongly connected, and its incidence matrix $D$ has row rank $S - 1$.
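Assumption 3 can be checked directly on small examples. The sketch below, under hypothetical array conventions (`P[s_next, s, a]`, columns ordered as $s \cdot A + a$), enumerates every non-empty proper subset of states to test strong connectivity and uses `numpy.linalg.matrix_rank` for the rank condition. The subset enumeration is exponential in $S$, so it is only meant for toy instances; the helper names are illustrative.

```python
import itertools

import numpy as np


def incidence_matrix(P):
    """Hypergraph incidence matrix D = P - E with columns ordered as s * A + a."""
    S, _, A = P.shape
    return P.reshape(S, S * A) - np.repeat(np.eye(S), A, axis=1)


def strongly_connected(P):
    """Every non-empty proper subset of states has a hyperarc entering it from outside."""
    S, _, A = P.shape
    for size in range(1, S):
        for subset in itertools.combinations(range(S), size):
            rows = list(subset)
            has_incoming = any(
                P[rows, s, a].sum() > 0  # head of hyperarc (s, a) intersects the subset
                for s in range(S)
                if s not in subset
                for a in range(A)
            )
            if not has_incoming:
                return False
    return True


def satisfies_assumption_3(P):
    """Strong connectivity plus rank(D) = S - 1."""
    D = incidence_matrix(P)
    return bool(strongly_connected(P) and np.linalg.matrix_rank(D) == P.shape[0] - 1)


P = np.zeros((2, 2, 2))
P[:, 0, 0], P[:, 0, 1] = [1.0, 0.0], [0.5, 0.5]
P[:, 1, 0], P[:, 1, 1] = [0.0, 1.0], [0.5, 0.5]
print(satisfies_assumption_3(P))  # True

P_stay = np.zeros((2, 2, 2))
P_stay[0, 0, :] = 1.0  # every action at state 0 stays at state 0
P_stay[1, 1, :] = 1.0  # every action at state 1 stays at state 1
print(satisfies_assumption_3(P_stay))  # False
```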
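An MDP congestion game can be formulated as a potential game in terms of the population distribution $x$. The potential game formulation of the MDP congestion game is given by
$$\min_{x \in \mathbb{R}^{S \times A}} F(x) \quad \text{s.t.} \quad Dx = 0, \quad \mathbf{1}^\top x = M, \quad x \geq 0, \tag{3}$$
where $F$ is a potential function satisfying $\nabla_x F(x) = \ell(x)$ (for separable costs, $F(x) = \sum_{s,a} \int_0^{x_{sa}} \ell_{sa}(u)\, du$), and the constraints on $x$ in (3) can be derived from the feasibility conditions of individual decision makers.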
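A detailed KKT analysis of an MDP congestion game is given in [6]. With multipliers $\mu \in \mathbb{R}^S$ for $Dx = 0$, $t \in \mathbb{R}$ for $\mathbf{1}^\top x = M$, and $\lambda \in \mathbb{R}^{S \times A}$ for $x \geq 0$, the KKT conditions of (3) are
$$\ell(x) + D^\top \mu + t\,\mathbf{1} - \lambda = 0, \quad Dx = 0, \quad \mathbf{1}^\top x = M, \quad x \geq 0, \quad \lambda \geq 0, \quad \lambda_{sa} x_{sa} = 0. \tag{4}$$
Given the non-negativity constraint on $\lambda$, the first component of the KKT conditions is equivalent to $\ell(x) + D^\top \mu + t\,\mathbf{1} \geq 0$, where equality holds whenever $x_{sa} > 0$. When the cost functions satisfy Assumption 2, the optimal distribution $x^\star$ is unique. We note that, due to the rank deficiency of $D$, $\mu$ is not unique: $\mu + c\,\mathbf{1}_S$ also satisfies (4) for any $c \in \mathbb{R}$. However, when the hypergraph satisfies Assumption 3, we can show that the feasibility constraint $Dx = 0$ is equivalent to $\hat{D}x = 0$, where $\hat{D}$ is the incidence matrix with one row removed and has full row rank. This ensures a unique projection of $\mu$ onto the reduced subspace that is the range of $\hat{D}^\top$.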
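Lemma 1 (Full Row Rank Incidence Matrix).
Under Assumption 3, let $\hat{D} \in \mathbb{R}^{(S-1) \times SA}$ be $D$ with any one row removed. Then $\hat{D}$ has full row rank, and $Dx = 0$ if and only if $\hat{D}x = 0$.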
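Consider removing an arbitrary row $d_r^\top$ from the incidence matrix $D$ to obtain $\hat{D}$. By Assumption 3, $d_r$ is not identically $0$. Clearly, $Dx = 0$ implies $\hat{D}x = 0$. To see that the opposite implication also holds, note that $\mathbf{1}^\top D = 0$ from the definition leads to $d_r^\top = -\sum_{j \neq r} d_j^\top$. Therefore $\hat{D}x = 0$ implies $d_r^\top x = 0$, and hence $Dx = 0$. Since $d_r$ lies in the span of the remaining rows, $\hat{D}$ retains rank $S - 1$ and therefore has full row rank. ∎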
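Lemma 1 can be sanity-checked numerically: for each choice of removed row, $\hat D$ should have rank $S - 1$, and every vector in its null space should also satisfy $Dx = 0$. The three-state kernel below is a hypothetical example whose action 0 follows the cycle 0 -> 1 -> 2 -> 0 (so the hypergraph is strongly connected); `null_space` is an illustrative SVD-based helper.

```python
import numpy as np


def incidence_matrix(P):
    """Hypergraph incidence matrix D = P - E with columns ordered as s * A + a."""
    S, _, A = P.shape
    return P.reshape(S, S * A) - np.repeat(np.eye(S), A, axis=1)


def null_space(M, tol=1e-10):
    """Orthonormal basis of the null space of M, computed from the SVD."""
    _, sing, Vt = np.linalg.svd(M)
    rank = int(np.sum(sing > tol))
    return Vt[rank:].T


# Hypothetical 3-state, 2-action kernel indexed as P[s_next, s, a].
P = np.zeros((3, 3, 2))
P[:, 0, 0], P[:, 0, 1] = [0.0, 1.0, 0.0], [0.5, 0.0, 0.5]
P[:, 1, 0], P[:, 1, 1] = [0.0, 0.0, 1.0], [0.3, 0.7, 0.0]
P[:, 2, 0], P[:, 2, 1] = [1.0, 0.0, 0.0], [0.0, 0.5, 0.5]

D = incidence_matrix(P)
S = D.shape[0]
for removed in range(S):
    D_hat = np.delete(D, removed, axis=0)
    N = null_space(D_hat)
    full_row_rank = np.linalg.matrix_rank(D_hat) == S - 1
    same_kernel = np.allclose(D @ N, 0.0)  # D_hat x = 0 implies D x = 0
    print(removed, full_row_rank, same_kernel)
```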
IV Sensitivity Analysis
In this section, we derive a sensitivity characterization of stochastic Braess paradox. We implicitly characterize the optimal population of an MDP congestion game by perturbing its latency functions and performing a sensitivity analysis with respect to these perturbations.
To facilitate the analysis, we introduce perturbation-dependent cost functions $\ell(x, \epsilon)$, where the additional input $\epsilon$ represents a perturbation to the cost function. The game itself is played with respect to a given perturbation $\epsilon$ and the corresponding cost $\ell(\cdot, \epsilon)$. The KKT conditions (4) can also be viewed as an implicit characterization of the optimal population as parameterized by $\epsilon$. Based on the KKT conditions, we define $X^\star$ as a point-to-set mapping given by
$$X^\star(\epsilon) = \{\, x \mid \exists\, (\mu, t, \lambda) \text{ such that } (x, \mu, t, \lambda) \text{ satisfies (4) with costs } \ell(\cdot, \epsilon) \,\}. \tag{5}$$
The point-to-set mapping $X^\star$ allows us to study the local differentiability of the optimal distribution as a function of $\epsilon$. When $\ell$ satisfies Assumption 2 in its first argument, the associated optimization problem (3) with costs $\ell(\cdot, \epsilon)$ has a unique solution and $X^\star$ is a single-valued mapping; in this case we denote the unique optimal distribution by $x^\star(\epsilon)$. Unless otherwise stated, Assumption 2 holds from now on.
Consider an MDP congestion game played with costs $\ell(x, \epsilon)$ and its optimal solution $x^\star(\epsilon)$. When $X^\star$ is single valued on an open set of $\epsilon$ containing zero, the Jacobian $\nabla_\epsilon x^\star(0)$, whenever it exists, defines the sensitivity of MDP Wardrop equilibria, i.e., it describes how $x^\star$ changes when the cost is perturbed by $\epsilon$.
We restrict our attention to MDP congestion games whose unique equilibrium satisfies $x^\star > 0$; this assumption is equivalent to requiring that every state-action pair is used by a non-negligible mass of players.
Assumption 4 (Positivity Condition).
The optimal primal solution to the unperturbed MDP congestion game, $x^\star(0)$, is strictly positive.
Assumption 4 is not restrictive in the following sense: when state-action costs satisfy Assumption 2, Assumption 4 will be satisfied for a sufficiently large total mass $M$. Consider the KKT conditions (4) at the optimal distribution. If a hyperarc $(s, a)$ is not used, i.e., $x^\star_{sa} = 0$, then its multiplier $\lambda_{sa}$ is non-negative, so its cost $\ell_{sa}(x^\star)$ must be at least $-(D^\top \mu)_{sa} - t$. However, the costs of all other state-actions must increase as the total mass increases; therefore a total mass threshold exists at which $\lambda_{sa} = 0$, and past this threshold $(s, a)$ becomes optimal.
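Proposition 1 (Perturbation Map).
Under Assumptions 2, 3, and 4, let $\hat{D}$ be as in Lemma 1. The unperturbed equilibrium $x^\star(0)$, together with multipliers $\hat\mu^\star \in \mathbb{R}^{S-1}$ and $t^\star \in \mathbb{R}$, is the unique solution of $G(x, \hat\mu, t, 0) = 0$, where
$$G(x, \hat\mu, t, \epsilon) = \begin{bmatrix} \ell(x, \epsilon) + \hat{D}^\top \hat\mu + t\,\mathbf{1} \\ \hat{D} x \\ \mathbf{1}^\top x - M \end{bmatrix}.$$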
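From Assumptions 2 and 4, there exists a unique $x^\star > 0$ solving the KKT conditions (4) for costs $\ell(\cdot, 0)$, and complementary slackness gives the Lagrange multiplier $\lambda^\star = 0$. The remaining multipliers can be determined by solving $\ell(x^\star, 0) + D^\top \mu + t\,\mathbf{1} = 0$, $Dx^\star = 0$, and $\mathbf{1}^\top x^\star = M$. By Lemma 1, $Dx^\star = 0$ is equivalent to $\hat{D}x^\star = 0$, and since $\mathbf{1}^\top D = 0$, we have $D^\top \mu = \hat{D}^\top \hat\mu$ with $\hat\mu_j = \mu_j - \mu_r$, where $r$ is the removed row. Furthermore, $\mathbf{1}^\top$ does not lie in the row space of $\hat{D}$: if $\mathbf{1}^\top = w^\top \hat{D}$, then $M = \mathbf{1}^\top x^\star = w^\top \hat{D} x^\star = 0$, a contradiction. Since $\hat{D}$ has full row rank, the stacked matrix $[\hat{D}^\top \;\; \mathbf{1}]^\top$ has full row rank, so the unique cost vector $\ell(x^\star, 0)$ determines $(\hat\mu^\star, t^\star)$ uniquely. Finally, the KKT conditions simplify to $G(x^\star, \hat\mu^\star, t^\star, 0) = 0$ given above. ∎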
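For affine costs $\ell(x) = Hx + b$ with a symmetric positive definite $H$ (a hypothetical choice that satisfies Assumption 2), the KKT conditions (4) with $\lambda = 0$, after replacing $D$ by $\hat D$ as in Lemma 1, form a single linear system. The sketch below solves it for a hypothetical two-state kernel and then checks the positivity that Assumption 4 requires; if some entry of $x$ is not positive, setting $\lambda = 0$ is invalid and a QP solver for (3) would be needed instead. `solve_reduced_kkt` and the numbers `H`, `b`, `M` are illustrative assumptions.

```python
import numpy as np


def incidence_matrix(P):
    """Hypergraph incidence matrix D = P - E with columns ordered as s * A + a."""
    S, _, A = P.shape
    return P.reshape(S, S * A) - np.repeat(np.eye(S), A, axis=1)


def solve_reduced_kkt(D, H, b, M):
    """Solve H x + b + D_hat^T mu_hat + t 1 = 0, D_hat x = 0, 1^T x = M (lambda = 0)."""
    S, n = D.shape
    B = np.vstack([D[1:], np.ones((1, n))])  # D_hat (row 0 removed) stacked on 1^T
    K = np.block([[H, B.T], [B, np.zeros((S, S))]])
    rhs = np.concatenate([-b, np.zeros(S - 1), [M]])
    sol = np.linalg.solve(K, rhs)
    x, mu_hat, t = sol[:n], sol[n:n + S - 1], sol[-1]
    if np.any(x <= 0):
        raise ValueError("x has a non-positive entry: Assumption 4 fails, so lambda = 0 is invalid")
    return x, mu_hat, t


P = np.zeros((2, 2, 2))
P[:, 0, 0], P[:, 0, 1] = [1.0, 0.0], [0.5, 0.5]
P[:, 1, 0], P[:, 1, 1] = [0.0, 1.0], [0.5, 0.5]
D = incidence_matrix(P)

H = np.diag([1.0, 2.0, 1.0, 2.0])   # hypothetical congestion slopes
b = np.array([0.0, 0.1, 0.0, 0.0])  # hypothetical free-flow costs
x, mu_hat, t = solve_reduced_kkt(D, H, b, M=4.0)
print(x)          # approximately [1.35 0.65 1.35 0.65]
print(H @ x + b)  # equilibrium costs, equal to -(D_hat^T mu_hat + t 1)
```

With a larger free-flow cost on one hyperarc, for example `b = [0, 10, 0, 0]`, the same system returns a negative entry, and the function raises instead of returning a spurious equilibrium.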
Proposition 1, together with the implicit function theorem, implies that when $\ell$ is continuously differentiable at $(x^\star(0), 0)$ and $x^\star(0) > 0$, the optimal distribution can be expressed locally as a continuously differentiable function $x^\star(\epsilon)$ of the perturbation. We note that similar sensitivity results, which do not consider stochastic congestion effects, exist for routing games [22]. However, our results for MDP congestion games are less restrictive due to the lack of the dual route/link space.
Theorem 1 (MDP Congestion Game Flow Sensitivity).
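Consider an MDP congestion game with costs $\ell(x, \epsilon)$, such that $\ell$ is a continuously differentiable function of $(x, \epsilon)$ and satisfies Assumption 2, and the associated hypergraph satisfies Assumption 3. If the equilibrium distribution satisfies $x^\star(0) > 0$, the sensitivity of the MDP Wardrop equilibrium is given by
$$\nabla_\epsilon x^\star = -\Big(H^{-1} - H^{-1}B^\top \big(BH^{-1}B^\top\big)^{-1} B H^{-1}\Big)\nabla_\epsilon \ell.$$
Moreover, the sensitivity of the optimal state-action costs $\ell^\star = \ell(x^\star(\epsilon), \epsilon)$ is
$$\nabla_\epsilon \ell^\star = B^\top \big(BH^{-1}B^\top\big)^{-1} B H^{-1}\, \nabla_\epsilon \ell,$$
where $B = [\hat{D}^\top \;\; \mathbf{1}]^\top$ with $\hat{D}$ as given by Lemma 1, $H = \nabla_x \ell(x^\star(0), 0)$, and $\nabla_\epsilon \ell = \nabla_\epsilon \ell(x^\star(0), 0)$.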
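From Proposition 1, the game with costs $\ell(x, \epsilon)$ has an associated single-valued mapping in a neighborhood of $\epsilon = 0$. Let $z = (x, \hat\mu, t)$ and let $G(z, \epsilon) = \big(\ell(x, \epsilon) + \hat{D}^\top \hat\mu + t\,\mathbf{1},\; \hat{D}x,\; \mathbf{1}^\top x - M\big)$ denote the reduced KKT map of Proposition 1. Then $G(z^\star(\epsilon), \epsilon) = 0$ implies the total derivative $\nabla_z G\, \nabla_\epsilon z^\star + \nabla_\epsilon G = 0$ for $\epsilon$ in this neighborhood. Furthermore, $G$ is continuously differentiable in all of its inputs. From the implicit function theorem [8, Sec. 1B], when $\nabla_z G$ is invertible,
$$\nabla_\epsilon z^\star = -(\nabla_z G)^{-1} \nabla_\epsilon G.$$
We wish to show the non-singularity of
$$\nabla_z G = \begin{bmatrix} H & B^\top \\ B & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} \hat{D} \\ \mathbf{1}^\top \end{bmatrix}, \qquad H = \nabla_x \ell(x^\star, \epsilon).$$
The Schur complement of the block $H$ in $\nabla_z G$ is $-BH^{-1}B^\top$. From Assumptions 3 and 2, $H \succ 0$ and $B$ has full row rank: $\hat{D}$ has full row rank by Lemma 1, and $\mathbf{1}^\top$ cannot lie in its row space because $\hat{D}x^\star = 0$ while $\mathbf{1}^\top x^\star = M > 0$. Therefore $BH^{-1}B^\top$ is positive definite and non-singular, and equivalently, $\nabla_z G$ is non-singular.
The partial gradient of $G$ with respect to $\epsilon$ is
$$\nabla_\epsilon G = \begin{bmatrix} \nabla_\epsilon \ell \\ 0 \end{bmatrix}.$$
We use Gaussian elimination to invert $\nabla_z G$ and get
$$(\nabla_z G)^{-1} = \begin{bmatrix} H^{-1} - H^{-1}B^\top K^{-1} B H^{-1} & H^{-1}B^\top K^{-1} \\ K^{-1} B H^{-1} & -K^{-1} \end{bmatrix}, \qquad K = BH^{-1}B^\top.$$
The first block row of $\nabla_\epsilon z^\star$ gives $\nabla_\epsilon x^\star = -\big(H^{-1} - H^{-1}B^\top K^{-1} B H^{-1}\big)\nabla_\epsilon \ell$. We decompose the multipliers as $\nu = (\hat\mu, t)$ and solve for
$$\nabla_\epsilon \nu^\star = \begin{bmatrix} \nabla_\epsilon \hat\mu^\star \\ \nabla_\epsilon t^\star \end{bmatrix} = -K^{-1} B H^{-1} \nabla_\epsilon \ell,$$
where the first row corresponds to $\hat\mu^\star$ and the second row corresponds to $t^\star$. Note that because $\lambda^\star = 0$, we can express the optimal cost as
$$\ell^\star = -\hat{D}^\top \hat\mu^\star - t^\star \mathbf{1} = -B^\top \nu^\star.$$
The sensitivity of the costs with respect to the perturbation $\epsilon$ is
$$\nabla_\epsilon \ell^\star = -B^\top \nabla_\epsilon \nu^\star = B^\top K^{-1} B H^{-1} \nabla_\epsilon \ell. \qquad ∎$$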
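Theorem 1 can be checked against finite differences. The sketch below assumes hypothetical additive perturbations $\ell(x, \epsilon) = Hx + b + \epsilon$ (so $\nabla_\epsilon \ell = I$) with a diagonal positive definite $H$, forms $B$ by stacking $\hat D$ (row 0 of $D$ removed) on $\mathbf{1}^\top$, and compares the analytic Jacobians with central differences of equilibria obtained from the reduced KKT system. That system is valid here only because the equilibrium stays strictly positive near $\epsilon = 0$; kernel, costs, and helper names are illustrative.

```python
import numpy as np


def incidence_matrix(P):
    """Hypergraph incidence matrix D = P - E with columns ordered as s * A + a."""
    S, _, A = P.shape
    return P.reshape(S, S * A) - np.repeat(np.eye(S), A, axis=1)


def stacked_constraints(D):
    """B: D_hat (row 0 of D removed) stacked on the all-ones row."""
    return np.vstack([D[1:], np.ones((1, D.shape[1]))])


def equilibrium(D, H, b, M, eps):
    """x*(eps) for costs l(x, eps) = H x + b + eps, assuming x*(eps) > 0 (lambda = 0)."""
    S, n = D.shape
    B = stacked_constraints(D)
    K = np.block([[H, B.T], [B, np.zeros((S, S))]])
    sol = np.linalg.solve(K, np.concatenate([-(b + eps), np.zeros(S - 1), [M]]))
    return sol[:n]


def theorem1_jacobians(D, H, L):
    """Analytic grad_eps x* and grad_eps l* for H = grad_x l and L = grad_eps l."""
    B = stacked_constraints(D)
    H_inv = np.linalg.inv(H)
    K_inv = np.linalg.inv(B @ H_inv @ B.T)  # K = B H^{-1} B^T
    dx = -(H_inv - H_inv @ B.T @ K_inv @ B @ H_inv) @ L
    dl = B.T @ K_inv @ B @ H_inv @ L
    return dx, dl


P = np.zeros((2, 2, 2))
P[:, 0, 0], P[:, 0, 1] = [1.0, 0.0], [0.5, 0.5]
P[:, 1, 0], P[:, 1, 1] = [0.0, 1.0], [0.5, 0.5]
D = incidence_matrix(P)
H = np.diag([1.0, 2.0, 1.0, 2.0])
b = np.array([0.0, 0.1, 0.0, 0.0])
M, n = 4.0, D.shape[1]

dx, dl = theorem1_jacobians(D, H, np.eye(n))  # grad_eps l = I for additive perturbations

h = 1e-6
dx_fd = np.zeros((n, n))
dl_fd = np.zeros((n, n))
for k in range(n):
    e = np.zeros(n)
    e[k] = h
    x_plus = equilibrium(D, H, b, M, e)
    x_minus = equilibrium(D, H, b, M, -e)
    dx_fd[:, k] = (x_plus - x_minus) / (2 * h)
    # l(x_plus, e) - l(x_minus, -e) = H (x_plus - x_minus) + 2 e
    dl_fd[:, k] = (H @ (x_plus - x_minus) + 2 * e) / (2 * h)

print(np.allclose(dx, dx_fd, atol=1e-6), np.allclose(dl, dl_fd, atol=1e-6))  # True True
```

Because $x^\star(\epsilon)$ is affine in $\epsilon$ for affine costs, the central differences agree with the analytic Jacobians up to rounding; for nonlinear costs a smaller step and a looser tolerance would be needed.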
IV-A Stochastic Braess Paradox
The sensitivity of the optimal edge costs and distribution is important from a game design perspective. In the routing game literature, a well-known phenomenon related to the sensitivity of the optimal distribution is the Braess paradox [4]. The phenomenon refers to the paradoxical effect in which decreasing the costs of traversing edges increases the players' average cost. The occurrence of the Braess paradox has been shown to be related to a routing game's underlying network structure [13].
Here, we show not only that a similar behaviour occurs in MDP congestion games, but also that the stochastic Braess paradox can be linked to the underlying hypergraph structure through sensitivity analysis. Consider the social cost of an MDP congestion game,
$$J(x, \ell) = \sum_{s \in [S]} \sum_{a \in [A]} x_{sa}\, \ell_{sa} = x^\top \ell.$$
The stochastic Braess paradox can be defined through the sensitivity of the social cost of MDP congestion games.
Definition 2 (Stochastic Braess Paradox).
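Let $J^\star(\epsilon) = J\big(x^\star(\epsilon), \ell(x^\star(\epsilon), \epsilon)\big)$ denote the social cost at equilibrium. When $J^\star$ is continuously differentiable, the stochastic Braess paradox occurs if there is a perturbation $\epsilon$ which increases the state-action costs from $\ell(x, 0)$ to $\ell(x, \epsilon)$ such that $J^\star(\epsilon) < J^\star(0)$.
We derive sufficient conditions for the stochastic Braess paradox using the sensitivity of $x^\star$ and $\ell^\star$.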
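Corollary 1 (Sufficient Conditions for stochastic BP).
Consider an MDP congestion game satisfying the conditions of Theorem 1, and let $J^\star(\epsilon) = J\big(x^\star(\epsilon), \ell(x^\star(\epsilon), \epsilon)\big)$. If there exists a direction $\delta$ such that $\ell(x, \alpha\delta) \geq \ell(x, 0)$ for all $x$ and all sufficiently small $\alpha > 0$, and $\nabla_\epsilon J^\star(0)\, \delta < 0$, then the game exhibits the stochastic Braess paradox.
$J$ is bilinear and therefore continuously differentiable in $x$ and $\ell$. From Theorem 1, there exists a neighbourhood of $\epsilon = 0$ within which $x^\star$ and $\ell^\star = \ell(x^\star(\epsilon), \epsilon)$ are continuously differentiable in $\epsilon$, and the Jacobian of $J^\star$ is given as
$$\nabla_\epsilon J^\star = (\ell^\star)^\top \nabla_\epsilon x^\star + (x^\star)^\top \nabla_\epsilon \ell^\star.$$
For any such $\delta$, continuity of $\nabla_\epsilon J^\star$ implies that there exists $\bar\alpha > 0$ such that $\bar\alpha\delta$ lies in this neighbourhood, $\ell(x, \bar\alpha\delta) \geq \ell(x, 0)$, and $\nabla_\epsilon J^\star(\alpha\delta)\, \delta < 0$ for all $\alpha \in [0, \bar\alpha]$. We then consider the MDP congestion game with costs $\ell(x, \bar\epsilon)$ and equilibrium $x^\star(\bar\epsilon)$, where $\bar\epsilon$ is defined by $\bar\epsilon = \bar\alpha\delta$.
By the mean value theorem, there exists $\alpha \in (0, \bar\alpha)$ where
$$J^\star(\bar\epsilon) - J^\star(0) = \bar\alpha\, \nabla_\epsilon J^\star(\alpha\delta)\, \delta.$$
Since $\nabla_\epsilon J^\star(\alpha\delta)\, \delta < 0$, $J^\star(\bar\epsilon) < J^\star(0)$ holds even though the costs have increased. ∎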
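The gradient in Corollary 1 can be evaluated numerically. In the sketch below (hypothetical affine costs $\ell(x, \epsilon) = Hx + b + \epsilon$, so any $\epsilon \geq 0$ increases costs), $\nabla_\epsilon J^\star = (\ell^\star)^\top \nabla_\epsilon x^\star + (x^\star)^\top \nabla_\epsilon \ell^\star$ is assembled from the Theorem 1 formulas, with $B$ formed by stacking $\hat D$ (row 0 of $D$ removed) on $\mathbf{1}^\top$, and compared with finite differences of the equilibrium social cost. Any negative entry would mark a coordinate direction in which raising a single state-action cost lowers the social cost. Kernel, costs, and helper names are illustrative.

```python
import numpy as np


def incidence_matrix(P):
    """Hypergraph incidence matrix D = P - E with columns ordered as s * A + a."""
    S, _, A = P.shape
    return P.reshape(S, S * A) - np.repeat(np.eye(S), A, axis=1)


def stacked_constraints(D):
    """B: D_hat (row 0 of D removed) stacked on the all-ones row."""
    return np.vstack([D[1:], np.ones((1, D.shape[1]))])


def kkt_solution(D, H, b, M, eps):
    """(x*, t*) for costs l(x, eps) = H x + b + eps, assuming x* > 0 so that lambda = 0."""
    S, n = D.shape
    B = stacked_constraints(D)
    K = np.block([[H, B.T], [B, np.zeros((S, S))]])
    sol = np.linalg.solve(K, np.concatenate([-(b + eps), np.zeros(S - 1), [M]]))
    return sol[:n], sol[-1]


def social_cost(D, H, b, M, eps):
    """J* = x*^T l(x*, eps)."""
    x, _ = kkt_solution(D, H, b, M, eps)
    return x @ (H @ x + b + eps)


def social_cost_gradient(D, H, b, M):
    """grad_eps J* = l*^T grad x* + x*^T grad l*, using Theorem 1 with grad_eps l = I."""
    n = D.shape[1]
    x, _ = kkt_solution(D, H, b, M, np.zeros(n))
    B = stacked_constraints(D)
    H_inv = np.linalg.inv(H)
    K_inv = np.linalg.inv(B @ H_inv @ B.T)
    dx = -(H_inv - H_inv @ B.T @ K_inv @ B @ H_inv)
    dl = B.T @ K_inv @ B @ H_inv
    l_star = H @ x + b
    return l_star @ dx + x @ dl


P = np.zeros((2, 2, 2))
P[:, 0, 0], P[:, 0, 1] = [1.0, 0.0], [0.5, 0.5]
P[:, 1, 0], P[:, 1, 1] = [0.0, 1.0], [0.5, 0.5]
D = incidence_matrix(P)
H = np.diag([1.0, 2.0, 1.0, 2.0])   # hypothetical congestion slopes
b = np.array([0.0, 0.1, 0.0, 0.0])  # hypothetical free-flow costs
M, n = 4.0, D.shape[1]

grad = social_cost_gradient(D, H, b, M)
h = 1e-6
grad_fd = np.array([
    (social_cost(D, H, b, M, h * e) - social_cost(D, H, b, M, -h * e)) / (2 * h)
    for e in np.eye(n)
])
braess = np.flatnonzero(grad < 0)  # coordinates where raising one cost lowers J*
print(grad, braess)
```

For this toy kernel the gradient comes out as $\tfrac{4}{3}(1, 0.5, 1, 0.5)$, so no single cost increase lowers the social cost and no Braess direction is flagged. With these sign conventions one can also check that $J^\star = -M t^\star$, since $\ell^\star = -B^\top \nu^\star$ and $Bx^\star = (0, \ldots, 0, M)$.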
V Role of Stochasticity
In this section, we consider the deterministic counterpart of MDP congestion games to evaluate how the introduction of stochasticity influences social cost sensitivity.
V-A Cycle Game
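A directed primal graph $\mathcal{G}_P = ([S], \mathcal{E})$ [1] can be derived from a hypergraph $\mathcal{G} = ([S], \mathcal{H})$ by considering the same set of states and defining the edge set by
$$\mathcal{E} = \{\, (s, s') \mid s' \in \mathcal{H}_{sa} \text{ for some } a \in [A] \,\}.$$
Its incidence matrix $\Gamma \in \mathbb{R}^{S \times |\mathcal{E}|}$ is given by
$$\Gamma_{s'', (s, s')} = \begin{cases} 1, & s'' = s' \neq s, \\ -1, & s'' = s \neq s', \\ 0, & \text{otherwise}. \end{cases}$$
An MDP congestion game (3) can be played on $\mathcal{G}_P$ for a given cost $\ell$ by replacing $D$ with $\Gamma$. The constraint $\Gamma z = 0$ implies that any feasible distribution $z$ must be a combination of cycles of $\mathcal{G}_P$. Therefore, we call a deterministic MDP congestion game, in which all state-action pairs lead to deterministic outcomes, a cycle game.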
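A minimal sketch of the primal-graph construction, assuming a hypothetical two-state, two-action kernel and the same head-positive, tail-negative sign convention as $D = P - E$; `primal_graph` is an illustrative helper. Self-loops produce zero columns of $\Gamma$, and the example flow combines self-loops with the cycle 0 -> 1 -> 0, so it satisfies $\Gamma z = 0$.

```python
import numpy as np


def primal_graph(P):
    """Edges (s, s') with P[s', s, a] > 0 for some a, and the incidence matrix Gamma."""
    S, _, A = P.shape
    edges = sorted({(s, s_next) for s in range(S) for a in range(A)
                    for s_next in range(S) if P[s_next, s, a] > 0})
    Gamma = np.zeros((S, len(edges)))
    for j, (s, s_next) in enumerate(edges):
        Gamma[s_next, j] += 1.0  # head of the edge
        Gamma[s, j] -= 1.0       # tail of the edge; a self-loop gives a zero column
    return edges, Gamma


P = np.zeros((2, 2, 2))
P[:, 0, 0], P[:, 0, 1] = [1.0, 0.0], [0.5, 0.5]
P[:, 1, 0], P[:, 1, 1] = [0.0, 1.0], [0.5, 0.5]

edges, Gamma = primal_graph(P)
print(edges)  # [(0, 0), (0, 1), (1, 0), (1, 1)]

z = np.array([1.0, 2.0, 2.0, 0.5])  # self-loop flows plus the cycle 0 -> 1 -> 0
print(Gamma @ z)  # zero vector: z is a combination of cycles
```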
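The edge set of a primal graph dictates the allowable transitions over the state space $[S]$. A hypergraph's hyperarc set corresponds to a discrete set of particular probability distribution assignments to state-actions, as given by $P$. We consider a transformation $T$ between the incidence matrix of a hypergraph, $D$, and that of its host graph, $\Gamma$, such that $D = \Gamma T$. Columns of $T$ denote how an action $a$ distributes mass over the edges of the primal graph leaving $s$,
$$T_{(\sigma, s'), (s, a)} = \begin{cases} P_{s'sa}, & \sigma = s, \\ 0, & \sigma \neq s. \end{cases}$$
In addition to being element-wise non-negative, $T$ is also column stochastic, i.e.,
$$\sum_{(\sigma, s') \in \mathcal{E}} T_{(\sigma, s'), (s, a)} = 1 \quad \forall\, (s, a),$$
and the transformation is given by
$$D = \Gamma T.$$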
The eigenvalues of $T$ characterize the amount of stochasticity introduced by the MDP dynamics. When $T = I$, the MDP congestion game is itself a cycle game with no stochasticity. When each state-action pair uniformly distributes the probability over available edges, $T$ has a block diagonal structure with eigenvalues less than 1 if a state has two or more actions available. Fig. 2 also provides an example of a feasible transformation that is invertible.
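The transformation $T$ can be assembled column by column from the kernel. The sketch below uses a hypothetical two-state kernel, sorts edges by tail state so that $T$ is block diagonal, verifies $D = \Gamma T$ and column stochasticity, and prints the eigenvalues of $T$. The helpers `incidence_matrix`, `primal_graph`, and `transformation` are illustrative.

```python
import numpy as np


def incidence_matrix(P):
    """Hypergraph incidence matrix D = P - E with columns ordered as s * A + a."""
    S, _, A = P.shape
    return P.reshape(S, S * A) - np.repeat(np.eye(S), A, axis=1)


def primal_graph(P):
    """Sorted edges (s, s') with P[s', s, a] > 0 for some a, and incidence matrix Gamma."""
    S, _, A = P.shape
    edges = sorted({(s, s2) for s in range(S) for a in range(A)
                    for s2 in range(S) if P[s2, s, a] > 0})
    Gamma = np.zeros((S, len(edges)))
    for j, (s, s2) in enumerate(edges):
        Gamma[s2, j] += 1.0
        Gamma[s, j] -= 1.0
    return edges, Gamma


def transformation(P, edges):
    """T[(s, s'), s * A + a] = P[s', s, a]: how action a at s spreads mass over edges leaving s."""
    S, _, A = P.shape
    index = {edge: j for j, edge in enumerate(edges)}
    T = np.zeros((len(edges), S * A))
    for s in range(S):
        for a in range(A):
            for s2 in range(S):
                if P[s2, s, a] > 0:
                    T[index[(s, s2)], s * A + a] = P[s2, s, a]
    return T


P = np.zeros((2, 2, 2))
P[:, 0, 0], P[:, 0, 1] = [1.0, 0.0], [0.5, 0.5]
P[:, 1, 0], P[:, 1, 1] = [0.0, 1.0], [0.5, 0.5]

edges, Gamma = primal_graph(P)
T = transformation(P, edges)
print(np.allclose(Gamma @ T, incidence_matrix(P)))  # True: D = Gamma T
print(np.allclose(T.sum(axis=0), 1.0))             # True: T is column stochastic
print(np.sort(np.linalg.eigvals(T).real))          # approximately -0.5, 0.5, 1, 1
```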
V-B Effects of Stochasticity
When the incidence matrix of a hypergraph is related to the incidence matrix of the corresponding primal graph by an invertible transformation $T$, there is a direct relationship between the equilibria of the MDP congestion game and the cycle game played on these graphs.
Assumption 5 (Invertible Transformation $T$).
A directed hypergraph $\mathcal{G}$ can be induced from its directed primal graph $\mathcal{G}_P$ such that $|\mathcal{E}| = |\mathcal{H}|$, and the incidence matrices $D$ and $\Gamma$ of the two graphs, respectively, are related by an invertible transformation $T$ with $D = \Gamma T$.
Proposition 2 (Equilibria Relationship).
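If the graph of an MDP congestion game satisfies Assumption 5, $x^\star > 0$ is an MDP Wardrop equilibrium if and only if $z^\star = T x^\star$ is an equilibrium of the cycle game defined on $\mathcal{G}_P$ with costs $\tilde\ell$ on its edges, where
$$\tilde\ell(z) = T^{-\top} \ell(T^{-1} z).$$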
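Consider an MDP Wardrop equilibrium $x^\star$ that satisfies Assumption 4. Then there exist a primal solution $x^\star$ and dual variables $(\mu^\star, t^\star)$ that satisfy the KKT conditions (4) with $\lambda^\star = 0$. Using $D = \Gamma T$, $z = Tx$, and $\tilde\ell(z) = T^{-\top}\ell(T^{-1}z)$, we can rewrite the stationarity and feasibility conditions of (4) as
$$\tilde\ell(z^\star) + \Gamma^\top \mu^\star + t^\star\, T^{-\top}\mathbf{1} = 0, \qquad \Gamma z^\star = 0, \qquad \mathbf{1}^\top T^{-1} z^\star = M. \tag{9}$$
Since $T$ is element-wise non-negative with at least one positive entry in every row, and $x^\star > 0$, we have $z^\star > 0$. By construction, $T$ is column stochastic, i.e., $\mathbf{1}^\top T = \mathbf{1}^\top$; therefore $T^{-\top}\mathbf{1} = \mathbf{1}$ and $\mathbf{1}^\top T^{-1} z^\star = \mathbf{1}^\top z^\star = M$. Therefore (9) is equivalent to the KKT conditions of a game with cost $\tilde\ell$, deterministic incidence matrix $\Gamma$, and optimal distribution $z^\star$.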
We note that $\nabla_z \tilde\ell(z) = T^{-\top} \nabla_x \ell(T^{-1}z)\, T^{-1}$ is positive definite, and while an individual edge cost $\tilde\ell_e$ may depend on the population distribution of multiple hyperarcs, it defines a potential game [5] consistent with Assumption 2. This implies that (9) coincides with the KKT conditions of a cycle game formulation with costs $\tilde\ell$, incidence matrix $\Gamma$, and mass $M$. Since $z^\star$ satisfies the KKT conditions of this cycle game, $z^\star$ is the cycle game's unique optimal distribution. ∎
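Proposition 2 can be illustrated numerically. Under hypothetical affine costs $\ell(x) = Hx + b$, the cycle-game costs $\tilde\ell(z) = T^{-\top}\ell(T^{-1}z)$ are again affine with $\tilde H = T^{-\top} H T^{-1}$ and $\tilde b = T^{-\top} b$, so both equilibria can be obtained from the reduced KKT system (valid because both come out strictly positive here). The sketch checks that the cycle-game equilibrium equals $Tx^\star$; the kernel, costs, and helper names are illustrative assumptions.

```python
import numpy as np


def incidence_matrix(P):
    """Hypergraph incidence matrix D = P - E with columns ordered as s * A + a."""
    S, _, A = P.shape
    return P.reshape(S, S * A) - np.repeat(np.eye(S), A, axis=1)


def primal_graph(P):
    """Sorted edges (s, s') with P[s', s, a] > 0 for some a, and incidence matrix Gamma."""
    S, _, A = P.shape
    edges = sorted({(s, s2) for s in range(S) for a in range(A)
                    for s2 in range(S) if P[s2, s, a] > 0})
    Gamma = np.zeros((S, len(edges)))
    for j, (s, s2) in enumerate(edges):
        Gamma[s2, j] += 1.0
        Gamma[s, j] -= 1.0
    return edges, Gamma


def transformation(P, edges):
    """T[(s, s'), s * A + a] = P[s', s, a]."""
    S, _, A = P.shape
    index = {edge: j for j, edge in enumerate(edges)}
    T = np.zeros((len(edges), S * A))
    for s in range(S):
        for a in range(A):
            for s2 in range(S):
                if P[s2, s, a] > 0:
                    T[index[(s, s2)], s * A + a] = P[s2, s, a]
    return T


def solve_reduced_kkt(Inc, H, b, M):
    """Equilibrium for affine costs H x + b on incidence matrix Inc, assuming it is positive."""
    S, n = Inc.shape
    B = np.vstack([Inc[1:], np.ones((1, n))])
    K = np.block([[H, B.T], [B, np.zeros((S, S))]])
    sol = np.linalg.solve(K, np.concatenate([-b, np.zeros(S - 1), [M]]))
    x = sol[:n]
    if np.any(x <= 0):
        raise ValueError("Assumption 4 fails; the lambda = 0 system is not the equilibrium")
    return x


P = np.zeros((2, 2, 2))
P[:, 0, 0], P[:, 0, 1] = [1.0, 0.0], [0.5, 0.5]
P[:, 1, 0], P[:, 1, 1] = [0.0, 1.0], [0.5, 0.5]
H = np.diag([1.0, 2.0, 1.0, 2.0])   # hypothetical congestion slopes
b = np.array([0.0, 0.1, 0.0, 0.0])  # hypothetical free-flow costs
M = 4.0

D = incidence_matrix(P)
edges, Gamma = primal_graph(P)
T = transformation(P, edges)
T_inv = np.linalg.inv(T)

x_mdp = solve_reduced_kkt(D, H, b, M)              # MDP congestion game on the hypergraph
H_cyc = T_inv.T @ H @ T_inv                        # cycle-game cost slope T^{-T} H T^{-1}
b_cyc = T_inv.T @ b                                # cycle-game cost offset T^{-T} b
z_cyc = solve_reduced_kkt(Gamma, H_cyc, b_cyc, M)  # cycle game on the primal graph
print(np.allclose(z_cyc, T @ x_mdp))               # True
```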
The relationship between the equilibria of the deterministic game and those of the MDP congestion game allows for a direct comparison between the sensitivity of the social cost in the two games. We show next that the social cost sensitivity of an MDP congestion game can be directly bounded by the eigenvalues of $T$, i.e., the amount of stochasticity introduced.
Theorem 2 (Effects of Stochasticity).
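Let $B_\Gamma = [\hat\Gamma^\top \;\; \mathbf{1}]^\top$, where $\hat\Gamma$ is $\Gamma$ with any one row removed. From Assumption 3, the removed row cannot be identically zero, as that would contradict strong connectivity. Then $B = [\hat{D}^\top \;\; \mathbf{1}]^\top$ is related to $B_\Gamma$ by $B = B_\Gamma T$, where $\hat{D}$ has the same row removed.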
Since , the sensitivity of the cycle game social cost can be evaluated at ,
where and . In comparison, the sensitivity of the MDP congestion game’s social cost is
We can compare the social cost sensitivity Jacobian for the cycle game and the MDP congestion game, denoted by and respectively.
Theorem 2 states that, given equivalent Wardrop equilibria, the sensitivity of the social cost in the deterministic cycle game is always bounded by the sensitivity of the MDP congestion game and the amount of stochasticity introduced. Since , Theorem 2 states that introducing stochasticity increases the effects of the Braess paradox.
In this section, we apply the results of the sensitivity analysis to a hypergraph derived from a directed Wheatstone graph. The Wheatstone structure is known to induce the Braess paradox in non-atomic routing games [13]; we analyze its behaviour under stochastic transitions and show not only that the stochastic Braess paradox also occurs, but also that our sensitivity analysis can be used to avoid it. We demonstrate Theorem 1 by perturbing costs in both the negative and positive directions of the social cost sensitivity, and we validate the predictions with simulated results.
All state-action pairs correspond to hyperarcs, and all of them except one stochastic hyperarc define deterministic actions. The stochastic incidence matrix is defined by
Note that when a hyperarc has one head state, its corresponding column of the incidence matrix is identical to that of the cycle game incidence matrix (Section V-A). Stochastic hyperarcs are convex combinations of the deterministic edges that correspond to allowable state transitions originating from the same tail state.
We simulate each MDP congestion game by solving the convex optimization formulation (3) with cvxpy. First, we verify in Figure 4 that, at the given costs $\ell(x, 0)$, the optimal distribution $x^\star(0)$ is strictly positive.
We consider perturbing the hyperarc costs modelled by
The sensitivity of the social cost can be derived analytically from Theorem 1 based on the hypergraph structure as
The sensitivity vector implies that increasing the third hyperarc cost would result in the largest decrease in social cost, while increasing the second hyperarc cost would result in the largest increase in social cost. We verify both scenarios by successively increasing the corresponding cost perturbation and re-evaluating the social cost at the optimal distribution $x^\star(\epsilon)$, as solved by cvxpy. The results are shown in Figures 5 and 6.
Several conclusions can be drawn from Figures 5 and 6. First, there exists a continuous region around $\epsilon = 0$ in which $x^\star(\epsilon) > 0$, which renders the sensitivity analysis valid. Figure 5 shows a negative sensitivity value for the third hyperarc as its cost perturbation increases, which implies the stochastic Braess paradox. As predicted, the social cost then decreases as the perturbation increases. In contrast, Figure 6 shows a positive sensitivity value for the second hyperarc as its cost perturbation increases, so the social cost should not decrease. This is confirmed, as the social cost obtained from the output of cvxpy increases with the perturbation. Both the Braess paradox and its absence are correctly predicted in the regions where positive mass exists on every hyperarc.
We derived a sensitivity analysis for MDP congestion games when the optimal mass distribution is strictly positive. From the sensitivity of the optimal cost and distribution to changes in state-action costs, we derived sufficient conditions for the occurrence of the stochastic Braess paradox in terms of network and cost structure. Finally, we considered the effects of stochasticity on the magnitude of the Braess paradox. Our simulations explicitly show the occurrence of the stochastic Braess paradox in MDP congestion games. Future work includes generalizing the analysis to MDP congestion games whose optimal mass distribution is not strictly positive.
References
[1] (2007) Hypertree width and related hypergraph invariants. European Journal of Combinatorics 28 (8), pp. 2167–2181.
[2] (1999) Constrained Markov decision processes. Vol. 7, CRC Press.
[3] (1952) A continuous model of transportation. Econometrica, pp. 643–660.
[4] (1968) Über ein Paradoxon aus der Verkehrsplanung [On a paradox of traffic planning]. Oper. Res. 12 (1), pp. 258–268.
[5] (2017) Markov decision process routing games. In Proc. Int. Conf. Cyber-Physical Syst., pp. 273–279.
[6] (2017) Infinite-horizon average-cost Markov decision process routing games. In Proc. Intell. Transp. Syst., pp. 1–6.
[7] (2017) Models of competition for intelligent transportation infrastructure: parking, ridesharing, and external factors in routing decisions. Ph.D. Thesis, U.C. Berkeley.
[8] (2009) Implicit functions and solution mappings. Springer Monographs in Mathematics, Springer 208.
[9] (2009) Efficient graph topologies in network routing games. Games and Economic Behavior 66 (1), pp. 115–125.
[10] (1993) Directed hypergraphs and applications. Discrete Applied Mathematics 42 (2-3), pp. 177–201.
[11] (2001) Cuts and flows. In Algebraic Graph Theory, pp. 307–339.
[12] Int. J. Game Theory 10 (2), pp. 53–66.
[13] (2006) Network topology and the efficiency of equilibrium. Games and Economic Behavior 57 (2), pp. 321–346.
[14] (2006) Finding the k best policies in a finite-horizon Markov decision process. Eur. J. Oper. Res. 175 (2), pp. 1164–1179.
[15] (1964) Traffic assignment manual. US Department of Commerce.
[16] (2019) A variational inequality framework for network games: existence, uniqueness, convergence and sensitivity analysis. Games and Economic Behavior.
[17] (2004) Sensitivity analysis of traffic equilibria. Transportation Science 38 (3), pp. 258–281.
[18] (2015) The traffic assignment problem: models and methods. Courier Dover Publications.
[19] (1992) Sensitivity analysis for variational inequalities. Math. Op. Res. 17 (1), pp. 61–76.
[20] (1953) Stochastic games. Proc. Nat. Acad. Sci. 39 (10), pp. 1095–1100.
[21] (2009) Optimal product design under price competition. Journal of Mechanical Design 131 (7), pp. 071003.
[22] (1988) Sensitivity analysis for equilibrium network flow. Transportation Science 22 (4), pp. 242–250.
[23] (1952) Some theoretical aspects of road traffic research. In Proc. Inst. Civil Engineers, London, UK.