Convergence Analysis of Machine Learning Algorithms for the Numerical Solution of Mean Field Control and Games: I -- The Ergodic Case

07/13/2019 ∙ by René Carmona, et al. ∙ 0

We propose two algorithms for the solution of the optimal control of ergodic McKean-Vlasov dynamics. Both algorithms are based on the approximation of the theoretical solutions by neural networks, the latter being characterized by their architecture and a set of parameters. This allows the use of modern machine learning tools, and efficient implementations of stochastic gradient descent. The first algorithm is based on the idiosyncrasies of the ergodic optimal control problem. We provide a mathematical proof of the convergence of the algorithm, and we analyze rigorously the approximation by controlling the different sources of error. The second method is an adaptation of the deep Galerkin method to the system of partial differential equations issued from the optimality condition. We demonstrate the efficiency of these algorithms on several numerical examples, some of them being chosen to show that our algorithms succeed where existing ones failed. We also argue that both methods can easily be applied to problems in dimensions larger than what can be found in the existing literature. Finally, we illustrate the fact that, although the first algorithm is specifically designed for mean field control problems, the second one is more general and can also be applied to the partial differential equation systems arising in the theory of mean field games.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1. Introduction

The purpose of this paper is to develop numerical schemes for the solution of Mean Field Games (MFGs) and Mean Field Control (MFC) problems. The mathematical theory of these problems has attracted a lot of attention in the last decade (see e.g. [23, 15, 9, 17, 18]), and from the numerical standpoint several methods have been proposed, see e.g. [2, 1, 16, 11, 21] and [24, 4, 29, 7] for finite time horizon MFG and MFC respectively, and [6, 14, 12]

for stationary MFG. However, despite recent progress, the numerical analysis of these problems is still lagging behind because of their complexity, in particular when the dimension is high. Here, we choose a periodic model to demonstrate that powerful tools developed for machine learning applications can be harnessed to produce efficient numerical schemes performing better than existing technology in the solution of these problems. We derive systematically the mathematical formulation of the optimization problem amenable to the numerical analysis, and we prove rigorously the convergence of a numerical scheme based on feed-forward neural network architectures. Our first method is designed for the optimal control of McKean-Vlasov dynamics, which is the primary purpose of the present work. Besides the intrinsic motivations for this type of problems, a large class of MFGs has a variational structure and can be recast in this form, see e.g. 

[23, 8]. Furthermore, the second method we present tackles the PDE system characterizing optimality conditions satisfied by the solution, and it can be directly adapted to solve the PDE system arising in MFGs as we shall explain.

In the subsequent analysis of finite horizon mean field control problems, see [19], the thrust of the study will be the numerical solution of Forward-Backward Stochastic Differential Equations (FBSDEs) of the McKean-Vlasov type. Indeed, the well established probabilistic approach to MFGs and MFC posits that the search for Nash equilibria for MFGs, as well as the search for optimal controls for MFC problems, can be reduced to the solutions of FBSDEs of this type. See for example the books [17, 18] for a comprehensive exposé of this approach. Here, we concentrate on the ergodic problem for which we can provide a direct analytic approach. Our mathematical analysis of the model leads to an infinite dimensional optimization problem for which we can identify and implement numerical schemes capable of providing stable numerical solutions. We prove the theoretical convergence of these approximation schemes and we demonstrate the efficiency of their implementations by comparing their outputs to solutions of benchmark models obtained either by analytical formulas or by a deterministic method for Partial Differential Equations (PDEs).

The reasons for our choice of the ergodic case as a prime testbed for our numerical schemes are twofold. First, the absence of the time variable lowers the complexity of the problem and gives us the opportunity to avoid the use of FBSDEs and to rely on strong approximation results from the theory of feed-forward neural networks which we use in this paper. Second, the objective function can be expressed as an integral over the state space with respect to the invariant measure of the controlled system, leading to a much simpler deterministic optimization problem. Indeed, after proving that the state dynamics at the optimum are given by a gradient diffusion, we postulate the form of the invariant measure and optimize accordingly. Last, the choice of the ergodic case for the first model which we consider is motivated by a forthcoming work on reinforcement learning 


As a final remark we emphasize that, while all the results and numerical implementations concern Markovian controls and equilibria, in most cases, both theoretical and numerical results still hold for controls and strategies being feedback functions of the history of the path of the state of the system. The theoretical extensions are straightforward, and the numerical implementations rely on the so-called recurrent neural networks instead of the standard feed-forward networks. We refrain from discussing these extensions to avoid the extra technicalities, especially in the notations and the statements of the results.

The paper is structured as follows. In Section 2, we present the framework of ergodic mean field control, derive formally necessary optimality conditions and introduce a setting amenable to numerical computations. To wit, we provide the mathematics that lead to natural introductions of the algorithms and their analyses. The first algorithm is presented in Section 3

. We study rigorously its convergence and its accuracy by proving bounds on the approximation and estimation errors (see Theorems 

4 and 8 respectively). Section 4 is dedicated to the second of our algorithms. It is based on a variation over the deep Galerkin solver for the PDE system stemming from the aforementioned optimality conditions. Computational results are presented in Section 5. They demonstrate the applicability and the performance of our algorithms. Several test cases are considered. They were chosen because they can be solved either explicitly via analytical formulas or numerically by classical PDE system solvers like in the case of ergodic mean field games.

2. Ergodic Mean Field Control

Since we are not aiming at the greatest possible generality, for the sake of definiteness, we work with a standard infinite horizon drift-controlled Itô process:

where is a -dimensional Wiener process and where we use the notation

for the law of the random variable

. We limit ourselves to stationary controls of the form given by deterministic measurable and time-independent feedback functions taking values in a closed convex subset of a Euclidean space . We shall assume that the function is measurable and bounded on , and, for notational convenience only, that . The space

is the space of probability measures on

having a finite second moment. We assume that it is equipped with the

-Wasserstein distance and the corresponding Borel -field. Since we consider controls in feedback form, the above controlled state evolution is in fact a stochastic differential equation, but according to Veretennikov’s classical result, this equation:


has a unique strong solution. See for example [32, 33] and [28, Theorem 2]. We say that the feedback control function is admissible if it is continuous and if the solution is ergodic in the sense that it has a unique invariant measure which we denote by and that converges toward in . The ergodic theory of McKean-Vlasov stochastic differential equations has recently received a lot of attention. See for example [13, 33, 34] for some specific ergodicity sufficient conditions.

2.1. Ergodic Mean Field Costs

The goal of the ergodic control problem is to minimize the cost:

For the sake of definiteness, we assume that the running cost function is continuous and bounded. The cost can be rewritten in the form:


if we use the standard notation for the integral of the function with respect to the measure . When is admissible, the invariant measure appears as the limit as of , and if is uniformly continuous in the measure argument, uniformly with respect to the other two arguments, then we can take the limit in the formula giving the ergodic cost (2) and obtain:


if, for each probability measure , and each time-independent -valued feedback function on we use the notation:


The controlled process solving (1) being ergodic, we can characterize the unique invariant probability measure as the solutions of the non-linear Poisson equation:


So the goal of our mean field control problem is to minimize, over the admissible feedback functions , the quantity:


with defined above in (4) and solving the Poisson equation (5).

2.2. The Adjoint Equation and Optimality Conditions

In order to characterize the minima of the functional , we compute its Gateaux derivative. To do so, we assume that the function is continuously differentiable in the variables and has a continuous functional (i.e. linear) differential in the variable when is viewed as an element of the (linear) space of finite signed measures on . We denote this last derivative as . We stress that it is different from the Wasserstein or L-derivative in the sense of Lions.

Let be fixed and let provide a small perturbation of . We first compute, at least formally, the derivative of the probability in the direction , namely:


when we view probability measures as elements of the space of finite (signed) measures on . Notice that since , we must have . The Poisson equation (5) implies:

and from this equality, we find that if the directional derivative (7) exists, it must solve the following Partial Differential Equation (PDE):


just by dividing both sides by and taking the limit . Note that the quantity

is merely the integral of the function with respect to the measure .

Before we turn to the objective function , we introduce the notion of adjoint equation and adjoint function.

Definition 1.

For each admissible feedback function (and associated solution of the Poisson equation), we say that the couple where is a function on the state space and is a constant, is a couple of adjoint variables if they satisfy the following linear elliptic PDE:


which we call the adjoint equation. Any solution will be denoted by .

Recall that here, the derivative

is the standard linear functional derivative (of smooth functions on the vector space

), which is a function of .

Proposition 2.

The directional derivative of the cost function defined in (6) is given by the formula:


where denotes the functional derivative with respect to in the direction , and the Hamiltonian is defined by:


Using the definitions (6) and (4) we get:

so that, using Fubini’s theorem we have:

Now, using the adjoint equation (9) and the fact that the integral of is , we get:

Finally, using (8) we get:


To complete the proof, we express this directional derivative in terms of the Hamiltonian function defined in (11). The latter can be rewritten as:

and its directional derivative is given by:


Putting together (12) and (13) we get the desired result. ∎

So, at least informally, solving the ergodic McKean-Vlasov control problem reduces to the solution of the system:


Note that the third equation guarantees the criticality of the function , so if we define the minimized Hamiltonian by:


the above system can be written as:


Both systems should be completed with appropriate boundary conditions when needed (like for example in the next subsection where we use periodic boundary conditions to analyze the system on the torus) and the following condition:

to which we can add the normalization condition:

to guarantee uniqueness for . Indeed, the above equations (16) can only determine up to an additive constant.

2.3. A Class of Models Amenable to Numerical Computations

In general, computing the invariant distribution solving (5) for a given can be costly. For example, it can be estimated by solving the PDE or by using Monte Carlo simulations for the MKV dynamics (1), see e.g. [10]. To simplify the presentation and focus on the main ingredients of the method proposed here, we shall consider a setting in which the optimal invariant distribution as well as the optimal control can both be expressed directly in terms of an adjoint variable.

From now on we assume that and:

for a constant and functions and satisfying the following assumption.

Assumption: is of class , and and are Lipschitz in both variables.

Let be a solution to the optimality system (14). In this setting, the third equation of (14) gives . Substituting in the expression for the drift and the feedback function into the state equation (1), we see that at the optimum, we are dealing with a gradient diffusion:

for the function


Accordingly, the invariant measure is necessarily of the form:

Notice that the optimal control is given by


Hence minimizing the ergodic cost (3) over controls under the constraint coming from the Poisson equation (5) can be rephrased as the problem of minimizing the functional:


over functions , where:



Notice that for all non-zero . Hence, for every constant . So if minimizes , so does for any constant . This is not an issue to guarantee that the optimal value of is close to the optimal value of the original cost , but it can lead to numerical difficulties. For this reason it is possible to add a term of the form in (19) in order to enforce a normalization condition and hence, uniqueness of the minimizer. The analysis presented below could be adapted to take into account this extra term at the expense of more cumbersome notations, so for the sake of clarity we only consider (19).

From this point on, instead of attempting to solve the system (14) numerically, we search for the right function in a family of functions parameterized by the parameter . The desired parameter minimizes the functional:


One should think of the function as a computable approximation of , allowing us to replace the minimization of the ergodic cost (2) by the minimization:


Notice that the gradient (with respect to the parameter ) of can easily be computed. It reads:

In anticipation of the set-up of next section where we consider our optimization problem on the torus, the double integral appearing in (21) can be viewed as an expectation, and its minimization is screaming for the use of the Robbins-Monro procedure. Moreover, if we use the family given by a feed-forward neural network, this minimization can be implemented efficiently with the powerful tools based on the so-called Stochastic Gradient Descent (SGD) developed for the purpose of machine learning.

3. A First Machine Learning Algorithm

We now restrict ourselves to the case of the torus for the purpose of numerical computations. The admissible feedback functions being continuous, the drift is a bounded continuous function on the torus and the controlled state process is a Markov process with infinitesimal generator:

The compactness of the state space and the uniform ellipticity of this generator guarantee that this state process is ergodic and that its invariant probability measure has a density with respect to the Riemannian measure on (which we assumed to be normalized to have total mass ). Note that, because the torus does not have a boundary, the integration by parts which we used freely in the computations of the above subsection are fully justified in the present situation.

We introduce new notation to define the class of functions which we use for numerical approximation purposes. We denote by:

the set of layer functions with input dimension , output dimension

, and activation function

. Here, denotes the composition of functions. Building on this notation we define:

the set of regression neural networks with hidden layers and one output layer, the activation function of the output layer being the identity . Note that as a general rule, we shall not use the superscript in that case. The number of hidden layers, the numbers , , , of units per layer, and the activation functions (one single function in the present situation), are what is usually called the architecture of the network. Once it is fixed, the actual network function is determined by the remaining parameters:

defining the functions , , , and respectively. Their set is denoted by . For each , the function computed by the network will be denoted by . As it should be clear from the discussion of the previous section, here, we are interested in the case where and .

Our analysis is based on the following algorithm. In practice, instead of having a fixed number of iterations , one can use a criterion of the form: at iteration , if is small enough, stop; otherwise continue.

Data: An initial parameter . A sequence of learning rates.
Result: Parameters such that approximates
1 begin
2       for  do
3             Pick where and are picked i.i.d. uniform in Compute the gradient of Set
4      return
Algorithm 1 SGD for ergodic MFC

3.1. The Approximation Estimates

Our goal is now to analyze the error made by the numerical procedure described in Algorithm 1. We split the error into two parts: the approximation error and the estimation error (or generalization error). The approximation error quantifies the error made by shrinking the class of admissible controls (here we use neural networks of a certain architecture instead of all possible feedback controls). The estimation error quantifies the error made by replacing the integrals by averages over a finite number of Monte Carlo samples.

In this section, denotes a periodic activation function of class whose Fourier expansion contains , i.e.,


More general activation functions (such as the hyperbolic tangent) could probably be considered at the expense of additional technicalities. The choice of this class of activation functions is motivated by the fact that we will use it to build a neural network which can approximate a periodic function together with its first order derivatives (namely, and ).

Approximation Error. The proof of our first estimate is based on the following special case of [27, Theorems 2.3 and 6.1].111We use a special case of the neural networks considered in [27]. For us, using the notation of Mhaskar and Micchelli, and for all . We state it for the sake of completeness. It provides a neural network approximation for a function and its derivative. For positive integers and , and a function , let denote the trigonometric degree of approximation of defined by:

where the infimum is over trigonometric polynomials of degree at most in each of its variables.

Theorem 3 (Theorems 2.3 and 6.1 in [27]).

Let be of class , and let and be positive integers. Then there exist and such that:


where depends on the activation function through but does not depend on .

The workhorse of our control of the approximation error is the following.

Theorem 4.

Assume that for some integer , there exists a minimizer over , say , of the cost function defined in (19). Assume that . Then, for large enough we have:


The constants in the above big Bachmann - Landau term depend only on the data of the problem as well as , , and the norms of the partial derivatives of and of order up to (but they do not depend upon ).

Remark 5.

The exponent in the

term in the statement of the proposition is what is blamed for the so-called curse of dimensionality. In some settings, the constants in the

term can be estimated if bounds on the norms of the partial derivatives of of order up to are known, for instance from a priori estimates on the solution of the PDE system (16).

Proof of Theorem 4.

The first inequality holds by definition of and because . In order to apply the result of Theorem 3 to , we bound from above the right hand sides of (23) and (24). We use the fact that the trigonometric degree of approximation of a function of class is of order when using polynomials of degree at most . More precisely, by [30, Theorem 4.3], if is an -times continuously differentiable function which is -periodic in each variable, then for every positive integer , there exists a trigonometric polynomial of degree at most such that

where depends only on and on the bounds on the -th derivatives of in each direction: , . We apply this result, with some integer to be specified later, to and with , since is of class . By [30, Theorem 4.3] again, since and are both of class , we obtain that for any integer there exist trigonometric polynomials of degree at most such that

where depends only on and on the bounds on the -th derivatives of and , namely . We apply this result with . Note that .

So by Theorem 3, we obtain that there exists and such that


where the constant depends on , , , but not on or . This implies in particular (since ) that

Notice that, since , the number of units in the hidden layer is , hence the right hand side in (25) is of order . In other words,


where the constant in the big might depend on but is independent of .

Going back to the definition (19) of , we note that, by (25), for all , and both lie in the interval . Since is Lipschitz continuous on this interval with a Lipschitz constant depending only on , we obtain that:

where here and thereafter denotes a generic constant which depends on the data of the problem as well as , and bounds on the norms of the partial derivatives of and up to order , and whose exact value might change from line to line. Moreover, lie in the interval , and is Lipschitz continuous on this interval with a Lipschitz constant depending only on so we also have:

Hence, recalling the definition (20) of , one can check after some calculations that for all ,

From the definition (19) of , the considerations above and (26), we deduce that: