Matrix-Valued Mean-Field-Type Games: Risk-Sensitive, Adversarial, and Risk-Neutral Linear-Quadratic Case

04/23/2019 ∙ by Julian Barreiro-Gomez, et al. ∙ NYU ∙ The University of Kansas

In this paper we study a class of matrix-valued linear-quadratic mean-field-type games in the risk-neutral, risk-sensitive, and robust (adversarial) settings. Non-cooperative, fully cooperative, and adversarial team problems are treated. We provide semi-explicit solutions for these problems by means of a direct method. The state dynamics is described by a matrix-valued linear jump-diffusion-regime-switching system of conditional mean-field type, where the conditioning is with respect to the common noise, which is a regime-switching process. The optimal strategies are in state-and-conditional mean-field feedback form. Semi-explicit expressions for equilibrium costs and strategies are also provided for the fully cooperative, adversarial-team, risk-sensitive fully cooperative, and risk-sensitive adversarial-team cases. It is shown that full cooperation enlarges the well-posedness domain under risk-sensitive decision-makers by means of population risk-sharing. Finally, relationships between risk-sensitivity and robustness are established in the mean-field-type context.


1 Introduction

The Markowitz paradigm, also termed the mean-variance paradigm, is often characterized as dealing with portfolio risk and (expected) return [1, 2, 3]. A typical example of risk concerns in the current online market is the evolution of prices of digital currencies and cryptocurrencies (bitcoin, litecoin, ethereum, dash, and other altcoins). Variance serves as a base model for many risk measures. Here, we address variance reduction problems in which several decision-making entities are involved. When the decisions made by the entities influence each other, the decision-making is said to be interactive (interdependent). Such problems are termed game problems. Game problems in which the state dynamics is given by a linear stochastic system with a Brownian motion and the cost functional is quadratic in the state and the control are often called linear-quadratic Gaussian (LQG) games. For generic LQG game problems under perfect state observation, the optimal strategy of each decision-maker is a linear state-feedback strategy which is identical to an optimal control for the corresponding deterministic linear-quadratic game problem in which the Brownian motion is replaced by the zero process. Moreover, the equilibrium cost differs from the deterministic game problem's equilibrium cost only by the integral of a function of time. However, when the diffusion (volatility) coefficient is state- and control-dependent, the structure of the resulting differential system as well as the equilibrium cost vector are modified. These results are widely known in the dynamic optimization, control, and game theory literature. For both LQG control and LQG zero-sum games, it can be shown that a simple square-completion method provides an explicit solution to the problem. It was successfully developed and applied by Duncan et al. [4, 5, 6, 7, 8, 9] in the mean-field-free case.

Moreover, Duncan et al. have extended the direct method to more general noises, including fractional Brownian noises, and to some non-quadratic cost functionals on spheres and torus. Inspired by applications in engineering (internet connection, battery state, etc.) and in finance (prices, stock options, multi-currency exchange, etc.), where not only Gaussian but also jump processes (Poisson, Lévy, etc.) play an important role, the question of extending the framework to linear-quadratic games under state dynamics driven by jump-diffusion processes was naturally posed. Adding a Poisson jump and regime switching may allow capturing, in particular, larger jumps which cannot be captured by simply increasing the diffusion coefficients. Several examples, such as multi-currency exchange or cloud-server rate allocation on blockchains, are naturally in matrix form.
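For orientation, here is the square-completion argument in the simplest scalar, mean-field-free setting; the coefficients a, b, sigma, q, r, q_T below are generic placeholders for this illustration, not the coefficients used later in the paper. Consider
\[
\mathrm{d}X_t=(aX_t+bu_t)\,\mathrm{d}t+\sigma\,\mathrm{d}W_t,\qquad
J(u)=\mathbb{E}\Big[\int_0^T \big(qX_t^2+ru_t^2\big)\,\mathrm{d}t+q_T X_T^2\Big],
\]
with the quadratic guess V(t,x)=P(t)x^2+\delta(t). Itô's formula and a completion of squares in u give
\[
qx^2+ru^2+2Px(ax+bu)+\sigma^2 P
= r\Big(u+\tfrac{bP}{r}x\Big)^2+\Big(q+2aP-\tfrac{b^2P^2}{r}\Big)x^2+\sigma^2 P,
\]
so that, if P solves the Riccati equation \(\dot P=-2aP-q+\tfrac{b^2}{r}P^2\), \(P(T)=q_T\), and \(\dot\delta=-\sigma^2 P\), \(\delta(T)=0\), then
\[
J(u)=\mathbb{E}\big[V(0,X_0)\big]+\mathbb{E}\int_0^T r\Big(u_t+\tfrac{bP(t)}{r}X_t\Big)^2\mathrm{d}t,
\]
which is minimized by the linear feedback \(u_t^{*}=-\tfrac{b}{r}P(t)X_t\). The direct method used in this paper follows the same pattern, with matrix-valued, conditional mean-field, jump, and regime-switching terms added.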

The main goal of this work is to investigate whether the direct method can be used to solve matrix-valued risk-sensitive and adversarial (robust) mean-field-type game problems, which are non-standard problems [10]. To do so, we modify the state dynamics to include conditional mean-field terms as follows:

  • the conditional expectation of the matrix-valued state with respect to the filtration of the common noise (a regime-switching process) is added to the drift, diffusion, and jump coefficient functionals;

  • the conditional expectation of the matrix-valued control actions is included in the drift, diffusion, and jump coefficient functionals.

We also modify the instantaneous cost and the terminal cost functions to include

  • the square of the conditional expectation of the matrix-valued state and

  • the square of the conditional expectation of the matrix-valued control action.

Involving these features leads to matrix-valued mean-field-type game theory, which focuses on (matrix-valued) games with distribution-dependent quantities of interest such as payoff, cost, and state dynamics. It can be seen as the multi-agent generalization of the single-agent (matrix-valued) mean-field-type control problem [10].
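In schematic scalar, single-decision-maker form, and with generic placeholder weights q, q̄, r, r̄, q_T, q̄_T that are not the paper's actual coefficients, the cost modification described above amounts to penalizing both the random quantities and their conditional means:
\[
L(u)=\int_0^T\Big(q\,x_t^2+\bar q\,\big(\mathbb{E}[x_t\mid \mathcal{F}^{s}_t]\big)^2
+r\,u_t^2+\bar r\,\big(\mathbb{E}[u_t\mid \mathcal{F}^{s}_t]\big)^2\Big)\mathrm{d}t
+q_T\,x_T^2+\bar q_T\,\big(\mathbb{E}[x_T\mid \mathcal{F}^{s}_T]\big)^2,
\]
where \(\mathcal{F}^{s}\) denotes the filtration generated by the common regime-switching noise. Since \(\mathbb{E}[x^2\mid\mathcal{F}^{s}]-(\mathbb{E}[x\mid\mathcal{F}^{s}])^2\) is the conditional variance, such terms allow conditional variance (risk) to be penalized or rewarded directly.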

If the state dynamics and/or the cost functional involve a mean-field term (such as the expectation of the state and/or the expectation of the control actions), the game is said to be an LQG game of mean-field type, or MFT-LQG game. For such game problems, various solution methods have been proposed, such as the Stochastic Maximum Principle (SMP) [10] and the Dynamic Programming Principle (DPP) with the Hamilton-Jacobi-Bellman-Isaacs equation and the Fokker-Planck-Kolmogorov equation [11, 10, 12].

If the state dynamics and/or the cost functional involve a conditional mean-field term (such as the conditional expectation of the matrix-valued state and/or the conditional expectation of the matrix-valued control actions), the game is said to be a matrix-valued LQG game of conditional mean-field type, or cMFT-LQG game (also called a conditional McKean-Vlasov matrix-valued LQG game). If, in addition, the matrix-valued state dynamics is driven by a matrix-valued jump-diffusion process, then the problem is termed a cMFT-LQJD matrix-valued game problem. We aim to study the behavior of such cMFT-LQJD matrix-valued game problems when conditional mean-field terms are involved.

Games with global uncertainty and common noise have been widely studied in the literature. Anonymous sequential games and mean-field games with common noise can be considered a natural generalization of mean-field game problems (see [13, 14] and the references therein). The works in [15, 16] considered mean-field games with common noise and obtained optimality systems that determine mean-field equilibria conditioned on the available information. The works in [18, 19] provide sufficient conditions for well-posedness of mean-field games with common noise and a major player. Existence of solutions of the resulting stochastic optimality systems is examined in [17]. A probabilistic approach to the master equation is developed in [43]. In order to determine the optimal strategies of the decision-makers, the previous works used a maximum principle or a master equation which involves a stochastic Fokker-Planck equation (see [19, 20, 21, 22] and the references therein).

Most studies illustrate mean-field game methods in linear-quadratic games with an infinite number of decision-makers [23, 24, 25, 26, 27]. These works assume indistinguishability within classes, and the cost functionals are assumed to be identical or invariant under permutation of decision-maker indexes. Note that the indistinguishability assumption is not fulfilled for many interesting problems, such as variance reduction or risk quantification problems in which decision-makers have different sensitivities towards risk. One typical and practical example is a multi-level building in which every resident has its own comfort temperature zone and aims to use the Heating, Ventilating, and Air Conditioning (HVAC) system to reach and maintain that zone. This problem clearly does not satisfy the indistinguishability assumption used in the previous works on mean-field games. Therefore, it is reasonable to look at the problem beyond the indistinguishability assumption. Here we drop these assumptions and deal with the problem directly with an arbitrary finite number of decision-makers. In LQ mean-field game problems, the state process can be modeled by a set of linear stochastic differential equations of McKean-Vlasov type, and the preferences are formalized by quadratic, or exponential of integral of quadratic, cost functions with mean-field terms. These game problems are of practical interest, and a detailed exposition of this theory can be found in [10, 28, 29, 30, 31, 32]. The popularity of these game problems is due to practical considerations in consensus problems, signal processing, pattern recognition, filtering, prediction, economics, and management science [20, 33, 34, 35].

To some extent, most of the risk-neutral versions of these optimal control problems are analytically and numerically solvable [7, 29, 9, 4, 5]. On the other hand, the linear-quadratic robust setting naturally appears if the decision-makers' objective is to minimize the effect of a small perturbation and the related variance of the optimally controlled nonlinear process. By solving a linear-quadratic game problem of mean-field type and using the implied optimal control actions, decision-makers can significantly reduce the variance (and the cost) incurred by such a perturbation. Variance reduction and minmax problems have very interesting applications in risk quantification under adversarial attacks and in security issues in interdependent infrastructures and networks [37, 36, 38, 35, 39]. In [41], the control for the evacuation of a multi-level building is designed by means of mean-field games and mean-field-type control. In [42], electricity price dynamics in the smart grid are analyzed using a mean-field-type game approach under a common noise of diffusion type. Risk-neutral MFT-LQJD games have been studied in the one-dimensional case in [47].

Our contribution

In this paper, we use a simple argument that gives the risk-neutral equilibrium strategies and the robust adversarial mean-field-type saddle point for a class of cMFT-LQJD matrix-valued games without using the well-known solution methods (SMP and DPP). We apply a basic Itô formula followed by a square-completion method to the risk-neutral/adversarial mean-field-type matrix-valued game problems. It is shown that this method is well suited to cMFT-LQJD risk-neutral/robust games as well as to variance reduction performance functionals with jump-diffusion-regime-switching common noise. Applying the solution methodology related to the DPP or the SMP requires involved (stochastic) analysis and convexity arguments to generate necessary and sufficient optimality criteria. We avoid all this with the direct method.

Zero-sum stochastic differential games are an important class of stochastic games. The optimality system leads to a Hamilton-Jacobi-Bellman-Isaacs (HJBI) system of equations, which is an extension of the HJB equation to stochastic differential games. When common noise is involved, it becomes a stochastic HJBI system. Studying well-posedness, existence, and uniqueness of such systems is a challenging task because of the minmax and maxmin operators. Usually, upper-value and lower-value equilibrium payoffs are investigated. In addition, when conditional mean-field terms are involved, as is the case here, the system is coupled with a stochastic Fokker-Planck-Kolmogorov system, leading to a master system. Here we provide an easy way to solve such a system by means of a direct method.

Relationships between risk-sensitive and robust conditional mean-field-type games are established in the case without jumps and with a single regime.

Our contribution can be summarized as follows. We formulate and solve a matrix-valued linear-quadratic mean-field-type game described by linear jump-diffusion dynamics and a mean-field-dependent quadratic cost functional that is conditioned on a common noise which includes not only a Brownian motion but also a jump process and regime switching. Since the matrices are switching-dependent, they can be seen as random coefficients. The optimal strategies of the decision-makers are given semi-explicitly using a simple and direct method based on square completion, suggested by Duncan et al. in, e.g., [6] for the mean-field-free case. This approach does not use the well-known solution methods such as the Stochastic Maximum Principle or the Dynamic Programming Principle with the stochastic Hamilton-Jacobi-Bellman-Isaacs equation and the stochastic Fokker-Planck-Kolmogorov equation. Nor does it require solving extended stochastic backward-forward partial integro-differential equations (PIDEs). In the risk-neutral linear-quadratic mean-field-type game with perfect state observation and common noise, we show that, generically, there is a minmax strategy in feedback form in the state and the conditional mean of the state, and we provide a sufficient condition for the existence of a mean-field-type saddle point. Sufficient conditions for existence and uniqueness of robust mean-field equilibria are obtained when the horizon length is small enough and the Riccati coefficients are almost surely positive.

In addition, this work extends the results in [40] in various ways:

  • Extension to matrix form of arbitrary dimensions.

  • The common noise, which here is a regime-switching process, was not considered in [40].

  • The solution here involves a matrix-valued differential system which differs from the results in [40].

To solve the aforementioned problem in a semi-explicit way, we follow a direct method. The method starts by identifying a partial guess functional whose coefficient functionals are random and regime-switching dependent. Then, it uses Itô's formula for jump-diffusion-regime-switching processes, followed by a completion of squares in both the control and the conditional mean of the control. Finally, the processes are identified using an orthogonal decomposition technique, and stochastic differential equations are derived in a semi-explicit way. The procedure is summarized in Figure 1, and a minimal scalar numerical illustration is sketched after the figure. The contributions of this work are summarized in Table 1.


Figure 1: Direct method and its key steps.
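As a purely illustrative numerical sketch, not the matrix-valued procedure of this paper, the following Python fragment integrates the scalar, mean-field-free Riccati equation from the square-completion example above backward in time and evaluates the resulting linear feedback gain; all coefficient values are hypothetical placeholders.

from scipy.integrate import solve_ivp

# Hypothetical scalar coefficients, for illustration only.
a, b, sigma = 0.5, 1.0, 0.2   # drift, control, and diffusion coefficients
q, r, qT = 1.0, 0.5, 2.0      # running and terminal cost weights
T = 1.0                       # time horizon

# Riccati equation dP/dt = -2*a*P - q + (b**2/r)*P**2 with P(T) = qT,
# integrated backward in time through the change of variable s = T - t.
def riccati_rhs(s, P):
    return [2.0 * a * P[0] + q - (b**2 / r) * P[0] ** 2]

sol = solve_ivp(riccati_rhs, [0.0, T], [qT], dense_output=True)

def P_of_t(t):
    # Undo the time reversal: P(t) is the backward solution evaluated at s = T - t.
    return float(sol.sol(T - t)[0])

def feedback_gain(t):
    # Linear state feedback from the completion of squares: u*(t) = gain(t) * x(t).
    return -(b / r) * P_of_t(t)

for t in (0.0, 0.5, T):
    print(f"t = {t:.1f}   P(t) = {P_of_t(t):.4f}   gain(t) = {feedback_gain(t):.4f}")

In the matrix-valued conditional mean-field setting of Theorem 1, this single scalar equation is replaced by a coupled system of matrix-valued differential equations with regime-switching-dependent coefficients, and the feedback involves both the state and its conditional mean given the common noise.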
Case                               | [40]  | this work
risk-neutral non-cooperation       | 1D    | matrix-valued
risk-neutral full-cooperation      | 1D    | matrix-valued
risk-neutral adversarial/robust    | 1D    | matrix-valued
risk-sensitive non-cooperation     | —     | matrix-valued
risk-sensitive full-cooperation    | —     | matrix-valued
risk-sensitive adversarial/robust  | —     | matrix-valued

For each case, the original table also lists the corresponding cost-function, drift, diffusion, jump, and switching terms, and indicates the number of decision-makers covered (one single team, multiple players, two adversaries, two adversarial teams).
Table 1: Contributions with respect to recent literature.

To the best of the authors' knowledge, this is the first work to consider regime switching in matrix-valued mean-field-type game theory.

A brief outline of the rest of the paper follows. The next section introduces a generic game model. After that, the cMFT-LQJD conditional mean-field-type game problem is investigated and its solution is presented. The last section concludes the paper.

Notation and Preliminaries

We introduce the following notation. Let \([0,T]\) be a fixed time horizon and \((\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\in[0,T]},\mathbb{P})\) be a given filtered probability space. The filtration \(\{\mathcal{F}_t\}_{t\in[0,T]}\) is the natural filtration generated by the driving processes (the Brownian motion, the Poisson random process, and the regime-switching process), augmented by the \(\mathbb{P}\)-null sets of \(\mathcal{F}\). In practice, the Brownian motion is used to capture smaller disturbances and the jump process is used for larger jumps of the system.

An admissible control strategy of a decision-maker is an \(\{\mathcal{F}_t\}\)-adapted and square-integrable process with values in a prescribed matrix space; the set of all such admissible controls forms the decision-maker's strategy set.

2 Problem Formulation

We consider a finite number of decision-makers interacting over the time horizon \([0,T]\). Each decision-maker chooses a matrix-valued strategy over this horizon. The state satisfies the following matrix-valued linear jump-diffusion-regime-switching system of mean-field type:

(1)

where the drift, diffusion, and jump coefficients are matrix-valued, and the system is driven by a matrix-valued Brownian motion, a regime-switching process with given transition rates, and a matrix-valued Poisson random process together with its compensated counterpart, whose compensator is a matrix of Radon measures over the set of jump sizes; the common-noise filtration is the one generated by the regime-switching process. By abuse of notation, we omit the left limits of the switching and jump processes.
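Since the display (1) is matrix-valued and regime-switching dependent, it may help to keep in mind the following schematic scalar-coefficient analogue, in which all symbols are generic placeholders rather than the paper's notation:
\[
\mathrm{d}x_t=\Big(b_1 x_t+\bar b_1\,\mathbb{E}[x_t\mid\mathcal{F}^{s}_t]
+\sum_i\big(b_2 u_{i,t}+\bar b_2\,\mathbb{E}[u_{i,t}\mid\mathcal{F}^{s}_t]\big)\Big)\mathrm{d}t
+\sigma(\cdot)\,\mathrm{d}W_t+\int_{\Theta}\gamma(\cdot,\theta)\,\tilde N(\mathrm{d}t,\mathrm{d}\theta),
\]
where the drift, the diffusion coefficient \(\sigma\), and the jump coefficient \(\gamma\) may all depend on the state, the controls, and their conditional expectations given the common regime-switching noise, and \(\tilde N\) is the compensated Poisson random measure.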

To the state system (1), we associate the cost functional of each decision-maker:

(2)

where the superscript denotes the adjoint (transposition) operator. The coefficients are matrix-valued and possibly time- and regime-switching dependent.

The reader may wonder why we consider matrix-valued rather than vector-valued dynamics. A typical example is the evolution of the exchange rates between blockchain tokens and classical currencies: the exchange rate between any two tokens is given by the corresponding entry of the state matrix. Since these tokens are correlated, one naturally obtains a matrix-valued process.

2.1 Risk-Neutral

We provide basic definitions of the risk-neutral problems and their solution concepts.

Definition 1 (Mean-Field-Type Risk-Neutral Best-Response)

Given the strategies of the other decision-makers, a risk-neutral best-response strategy of a decision-maker is a strategy that minimizes its risk-neutral cost subject to (1); the set of such strategies is the decision-maker's risk-neutral best-response set.

Definition 2 (Mean-Field-Type Risk-Neutral Nash Equilibrium)

A mean-field-type risk-neutral Nash equilibrium is a strategy profile of all decision-makers such that, for every decision-maker, its strategy is a risk-neutral best response to the strategies of the other decision-makers.

Definition 3 (Mean-Field-Type Risk-Neutral Full Cooperation)

A mean-field-type risk-neutral fully cooperative solution is a strategy profile of all decision-makers that jointly minimizes the social (global) cost, defined as the sum of the individual cost functionals.

Definition 4 (Mean-Field-Type Risk-Neutral Saddle-Point Solution)

The set of decision-makers is divided into two teams: a team of defenders and a team of attackers, whose index sets partition the set of decision-makers. A mean-field-type risk-neutral saddle point is a strategy profile of the team of defenders and of the team of attackers such that neither team can improve its outcome by unilaterally deviating; the corresponding cost is the value of the adversarial team (risk-neutral) game.
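In schematic form, writing u for the strategy profile of the team of defenders, v for that of the team of attackers, and L(u,v) for the common adversarial team cost that defenders minimize and attackers maximize, a saddle point \((u^{*},v^{*})\) satisfies
\[
L(u^{*},v)\;\le\;L(u^{*},v^{*})\;\le\;L(u,v^{*})\qquad\text{for all admissible }u,\,v,
\]
and \(L(u^{*},v^{*})\) is then the value of the adversarial team game.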

2.2 Risk-Sensitive

We provide basic definitions of risk-sensitive problems and their solution concepts.

Definition 5 (Mean-Field-Type Risk-Sensitive Best-Response)

Given the strategies of the other decision-makers, a risk-sensitive best-response strategy of a decision-maker is a strategy that minimizes its risk-sensitive cost subject to (1); the set of such strategies is the decision-maker's risk-sensitive best-response set.

For a nonzero risk-sensitivity index, the risk-sensitive loss functional (an exponential of the accumulated cost) includes not only the first moment of the cost but also all of its higher moments.
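To see why all higher moments enter, note the standard small-index expansion of an exponential (entropic) risk-sensitive criterion; here \(\theta\) denotes a generic risk-sensitivity index and \(L\) a generic cost, used only for illustration:
\[
\frac{1}{\theta}\log\mathbb{E}\big[e^{\theta L}\big]
=\mathbb{E}[L]+\frac{\theta}{2}\,\mathrm{Var}(L)+O(\theta^{2}),
\]
so a positive index penalizes the variance (risk-averse behavior), while letting \(\theta\to 0\) recovers the risk-neutral cost \(\mathbb{E}[L]\).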

Definition 6 (Mean-Field-Type Risk-Sensitive Nash Equilibrium)

A mean-field-type risk-sensitive Nash equilibrium is a strategy profile of all decision-makers such that, for every decision-maker, its strategy is a risk-sensitive best response to the strategies of the other decision-makers.

Definition 7 (Mean-Field-Type Risk-Sensitive Full Cooperation)

A mean-field-type risk-sensitive fully cooperative solution is a strategy profile of all decision-makers that jointly minimizes the risk-sensitive social (global) cost.

Definition 8 (Mean-Field-Type Risk-Sensitive Saddle-Point Solution)

The set of decision-makers is divided into two teams: a team of defenders and a team of attackers. A mean-field-type risk-sensitive saddle point is a strategy profile of the team of defenders and of the team of attackers such that neither team can improve its risk-sensitive team cost by unilaterally deviating.

3 Main Results

This section presents the main results of the article.

3.1 Risk-Neutral Case

We start with the risk-neutral Nash equilibrium problem.

Theorem 1

Assume that the cost weight matrices are symmetric positive definite. Then the matrix-valued mean-field-type (risk-neutral) Nash equilibrium strategies and the (risk-neutral) equilibrium costs are given by:

where the coefficient processes solve the following differential equations:

(3)

whenever these differential equations admit a unique solution that does not blow up within the horizon.

Under the symmetric matrix assumption above, it is easy to check that if a given matrix process is a solution, then so is its transpose; therefore the solution is symmetric.

From the state system (1), the conditional expectation of the state matrix, where the conditioning is with respect to the natural filtration of the regime-switching process up to the current time, solves the following system:

which means that

which will be used as feedback in the optimal strategy.
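Schematically, and again in placeholder scalar notation, taking conditional expectations in (1) with respect to the common-noise filtration removes the Brownian and compensated-jump martingale terms, leaving (between switching times) an ordinary differential equation:
\[
\frac{\mathrm{d}}{\mathrm{d}t}\,\mathbb{E}[x_t\mid\mathcal{F}^{s}_t]
=(b_1+\bar b_1)\,\mathbb{E}[x_t\mid\mathcal{F}^{s}_t]
+\sum_i (b_2+\bar b_2)\,\mathbb{E}[u_{i,t}\mid\mathcal{F}^{s}_t],
\]
which is the quantity fed back, together with the state itself, in the equilibrium strategies. With this conditional-mean dynamics in hand, we next provide a semi-explicit solution for the full-cooperation case.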

Corollary 1

Assume that the cost weight matrices are symmetric positive definite. The fully cooperative solution of the problem is given by

where the coefficient processes solve the following differential equations:

(4)

Notice that these Riccati equations have positive solutions and that there is no blow-up within the horizon.

The proof of Corollary 1 follows immediately from Theorem 1 by considering one single team and an appropriate choice of the vector of matrices.

Corollary 2

Assume that the relevant cost weight matrices are symmetric positive definite for the team of defenders and for the team of attackers, respectively. Under an additional assumption on the coefficients, the adversarial game problem between the team of attackers and the team of defenders has a saddle point, and it is given by

where the coefficient processes solve the following differential equations:

(5)

The proof of Corollary 2 follows immediately from Theorem 1 by considering two adversarial teams with appropriate choices of the vectors of matrices for the defenders and the attackers, respectively.

Notice that the Riccati equations in (9) have a positive definite solution, which does not blow up within the horizon, under an additional condition on the coefficients, and a positive solution under a further condition. Next, we study the risk-sensitive case and point out some facts regarding the comparison of its solution with the risk-neutral case as the risk-sensitivity index vanishes.

3.2 Risk-Sensitive Case

A risk-averse decision-maker (with a cost functional) is a decision-maker who prefers a higher cost with known risks rather than a lower cost with unknown risks. In other words, among various control strategies giving the same expected cost with different levels of risk, such a decision-maker always prefers the alternative with the lowest risk.

When the exponential martingale of the compensated Poisson random process is multiplied by a linear process, it yields exponential non-quadratic terms