Random Features for High-Dimensional Nonlocal Mean-Field Games

02/25/2022
by   Sudhanshu Agrawal, et al.

We propose an efficient solution approach for high-dimensional nonlocal mean-field game (MFG) systems based on the Monte Carlo approximation of interaction kernels via random features. We avoid costly space-discretizations of interaction terms in the state-space by passing to the feature-space. This approach allows for a seamless mean-field extension of virtually any single-agent trajectory optimization algorithm. Here, we extend the direct transcription approach in optimal control to the mean-field setting. We demonstrate the efficiency of our method by solving MFG problems in high-dimensional spaces which were previously out of reach for conventional non-deep-learning techniques.

1 Introduction

We propose a computational framework for solving mean-field game (MFG) systems of the form

(1)

based on random features from kernel machines. The partial differential equation (PDE) system above describes an equilibrium configuration of a noncooperative differential game with a continuum of agents. An individual agent faces a cost

(2)

where the Lagrangian (running cost) and the Hamiltonian are related by the Legendre transform:

(3)

Furthermore, the density in (1) represents the distribution of all agents in the state-space at a given time, and the term

(4)

models the influence of the population on an individual agent. Finally, the last term in (2) represents the terminal cost paid by agents at the terminal time, and the initial distribution of the population is prescribed in (1). Note that (4) assumes a nonlocal interaction of an individual agent with the population: the interaction term depends on the whole population density. If, instead, the interaction term depended only on the pointwise value of the density, the interaction would be local. In this paper, we only consider nonlocal interactions in (4).

In an equilibrium, individual agents cannot unilaterally improve their costs based on their belief about the state-space distribution of the population. This Nash equilibrium principle leads to the Hamilton-Jacobi-Bellman (HJB) PDE in (1). Furthermore, the evolution of the state-space distribution of the population corresponding to their optimal actions must coincide with their belief about the population distribution. This consistency principle leads to the continuity equation in (1).

The MFG framework, introduced by M. Huang, P. Caines, R. Malhamé HCM06; HCM07 and P.-L. Lions, J.-M. Lasry LasryLions06a; LasryLions06b; LasryLions2007, is currently an active field with applications in economics moll14; moll17; gueant2011mean; gomes2015economic, finance caines17; cardialiaguet2018; jaimungal19; moll17, industrial engineering paola19; kizikale19; gomes2018electricity, swarm robotics liu2018mean; elamvazhuthi2019mean; kang21jointsensing; kang21taskselection, epidemic modelling lee2020controlling; chang2020game, and data science weinan2019mean; guo2019learning; carmona2019linear. For a comprehensive exposition of MFG systems we refer to LasryLions2007; CardaNotes; mfgCIME for nonlocal couplings, gomes_book16; cirant19; mfgCIME for local couplings, delarue18a; delarue18b for a probabilistic approach, bensoussan2013 for an infinite-dimensional control approach, carda19; gangbo2021mean for the master equation, and achdou:hal-03408825 for control on the acceleration. For the mathematical analysis of (1) we refer to LasryLions2007; CardaNotes; mfgCIME.

In this paper, we develop a computational method for (1) based on the kernel expansion framework introduced in nurbekyan18; nursaude18; liu2020computational; liu2020splitting. The key idea is to build a finite-rank approximation of the interaction kernel

(5)

in terms of suitably chosen basis functions and expansion coefficients, and to consider an approximate system

(6)

The structure of the expansion (5) allows for an efficient discretization of the interaction term in the feature space. Indeed, introducing auxiliary unknown coefficients, we can rewrite (6) as

(7)

where the new unknowns are the feature-space coefficients, and the value function is the viscosity solution of

(8)

We provide a formal derivation of the equivalence between (6) and (7) in the Appendix and refer to nursaude18 for more details. When the coefficient matrix in (5) is symmetric, (7) reduces to an optimization problem

(9)

The key advantage of our approach is that the feature-space coefficients contain all information about the population interaction, so there is no need for a costly space discretization of the interaction term in (4). Indeed, the approximation in (5) yields an approximation of the interaction operator that is independent of the space discretization. Moreover, for a fixed number of features, the computational cost of calculating the approximate interaction term in space-time scales linearly in both the number of time steps and the number of space points (or trajectories or agents in the Lagrangian setting). In contrast, direct calculation of the interaction term scales quadratically in the number of space points. This dimension reduction provides a significant computational gain when the number of features is moderate.
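To make the cost comparison concrete, the following Python sketch (ours, not the paper's Julia code; all constants are illustrative) evaluates a Gaussian mean-field interaction term for N agents both directly and through a random-feature expansion. The direct route performs O(N^2) kernel evaluations per time step, while the feature route costs O(N R) for R features.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, R = 500, 2, 500         # agents, state dimension, random features
sigma = 1.0                   # kernel width; illustrative values throughout
X = rng.normal(size=(N, d))   # agent positions at one time step

# Direct evaluation of f(x_i) = (1/N) * sum_j K(x_i, x_j) with a Gaussian
# kernel: O(N^2) kernel evaluations per time step.
diff = X[:, None, :] - X[None, :, :]
K = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))
f_direct = K.mean(axis=1)

# Feature-space evaluation: with K(x, y) ~= zeta(x) . zeta(y), we have
# f(x_i) ~= zeta(x_i) . a, where a = (1/N) * sum_j zeta(x_j) collects all
# population information: O(N * R) work, independent of any space grid.
W = rng.normal(scale=1.0 / sigma, size=(R, d))    # Bochner frequencies
b = rng.uniform(0.0, 2.0 * np.pi, size=R)
Z = np.sqrt(2.0 / R) * np.cos(X @ W.T + b)        # (N, R) feature matrix
a = Z.mean(axis=0)                                # nonlocal coefficients
f_feat = Z @ a

print(np.max(np.abs(f_direct - f_feat)))          # small for moderate R
```

The vector of feature averages plays the role of the nonlocal coefficient variables in the algorithm: it summarizes the population once per time step, after which each agent's interaction cost is a single dot product.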

There is complete flexibility in the choice of basis functions. In nursaude18, the authors considered problems in periodic domains and used classical trigonometric polynomials. Furthermore, in liu2020computational; liu2020splitting the authors drew connections with kernel methods in machine learning and used polynomial and quasi-polynomial features for the kernel expansion.

Our key contribution is to build on this connection with kernel methods in machine learning and to construct the basis functions using random features RahimiR07. The advantage of using random features for a suitable class of kernels is the simplicity and speed of generating basis functions, including in high dimensions. Moreover, the coefficient matrix in (5) reduces to an identity matrix, which renders extremely simple update rules for the feature-space coefficients in iterative solvers of (7) and (9).

We demonstrate the efficiency of our approach by solving crowd-motion-type MFG problems in up to 100 dimensions. To the best of our knowledge, this is the first instance in which such high-dimensional MFGs are solved without deep learning techniques. Our algorithm is inspired by the primal-dual algorithm in nursaude18, except that here we use random features instead of trigonometric polynomials. The primal step consists of trajectory optimization, whereas the dual step updates the nonlocal feature-space variables. Modeling nonlocal interactions through these variables decouples the primal updates for the agents, which would not be possible with a direct discretization of the interaction term. Hence, one can take advantage of parallelization techniques within the primal updates. We refer to Section 4 for more details.

For related work on numerical methods for nonlocal MFG we refer to hadi17; hadi17b; hadi19; bonnans2021generalized for a game-theoretic approach, silva12; silva14; silva15; carlini18 for semi-Lagrangian schemes, lin2021alternating for a deep learning approach, and li2021multiscale for a multiscale method. In all of these methods the nonlocal terms are discretized directly in the state-space. Finally, for a comprehensive exposition of numerical methods for other types of MFG systems we refer to mfgCIME.

The rest of the paper is organized as follows. In Section 2 we present the kernel expansion framework. Next, in Section 3, we show how to construct basis functions based on random features. Section 4 contains the description of our algorithm. Finally, we present numerical results in Section 5. We provide an implementation (code available at https://github.com/SudhanshuAgrawal27/HighDimNonlocalMFG) written in the Julia language bezanson2017julia.

2 The method of coefficients

One can adapt the results in nursaude18 to the non-periodic setting relying on the analysis in CardaNotes and prove the following theorem.

Theorem 2.1.

A pair of a value function and a density is a solution of the MFG (6) if and only if there exist feature-space coefficients such that (7) holds. Moreover, when the coefficient matrix is symmetric, (7) reduces to (9). Finally, when the coefficient matrix is positive definite, (9) is a convex program.

Next, we need a formula to calculate the gradient of the objective function in 9. Again, adapting results in nursaude18 to the non-periodic setting one can prove the following theorem.

Theorem 2.2.

The objective functional in (9) is convex and Fréchet differentiable everywhere. Moreover,

(10)

where the gradient is evaluated along an optimal trajectory of the optimal control problem

(11)

We do not specify precise assumptions on the data in the previous theorems and refer to nursaude18; CardaNotes for more details, since the theoretical analysis of (1) and (6) is beyond the scope of the current paper. Nevertheless, these theorems are valid for typical choices in which the kinetic part of the cost is quadratic with a positive-definite matrix, the cost functions are smooth and bounded below, and the initial distribution is a compactly supported absolutely continuous probability measure with a bounded density. In particular, the optimal trajectory is unique for Lebesgue-a.e. initial position, and one can choose it in such a way that it is Borel measurable.

Utilizing the value-function representation (11), we obtain the following saddle-point formulation of (9):

(12)

This saddle-point formulation is the basis of our algorithm in Section 4.

3 Random Features

Random features is a simple yet powerful technique to approximate translation invariant positive definite kernels RahimiR07. The foundation of the method is Bochner’s theorem from harmonic analysis.

Theorem 3.1 (Bochner rudin1962fourier).

A continuous symmetric shift-invariant kernel on Euclidean space is positive definite if and only if it is the Fourier transform of a non-negative measure.

Thus, if the kernel is a continuous symmetric positive definite shift-invariant kernel, there exists a probability distribution of frequencies such that

(13)

Hence, we can approximate the kernel by sampling frequencies iid from this distribution:

(14)

where

(15)

Note that this approximation is also shift-invariant, which is a significant advantage for crowd-motion-type models where agents interact through their relative positions in the state space. Furthermore, the coefficient matrix in (5) becomes the identity matrix, which leads to simple update rules for the nonlocal variables: see Section 4.

The approximation above is viable if one can efficiently sample from the frequency distribution. In this paper, we consider Gaussian no-collision repulsive kernels similar to those in onken2021multi; onken2021neural:

(16)

In this case, one can easily sample from the frequency distribution because it is Gaussian:

(17)
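The random-feature construction above can be sketched in a few lines of Python (our illustration, not the paper's Julia code; the kernel width and sample points are arbitrary). By Bochner's theorem, the frequency distribution of a unit-width Gaussian kernel is itself Gaussian, and the inner product of cosine features approximates the kernel value.

```python
import numpy as np

rng = np.random.default_rng(0)
d, R = 2, 2000        # dimension and number of random features
sigma = 1.0           # kernel width (illustrative)

# Gaussian kernel K(x, y) = exp(-|x - y|^2 / (2 sigma^2)).
def kernel(x, y):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

# By Bochner's theorem, the frequency distribution of this kernel is a
# Gaussian with standard deviation 1/sigma; sample frequencies w_i iid.
W = rng.normal(scale=1.0 / sigma, size=(R, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=R)

# Random features zeta(x) = sqrt(2/R) * cos(W x + b), so that
# K(x, y) ~= zeta(x) . zeta(y) in expectation (Rahimi-Recht).
def features(x):
    return np.sqrt(2.0 / R) * np.cos(W @ x + b)

x = np.array([0.3, -0.1])
y = np.array([-0.5, 0.4])
print(kernel(x, y), features(x) @ features(y))  # the two values are close
```

The approximation error decays like one over the square root of the number of features, which matches the convergence plots reported in the experiments below.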

4 Trajectory Generation

Here, we propose a primal-dual algorithm inspired by nursaude18 to solve (12). Note that the inner part of (12) is a classical optimal control or trajectory optimization problem in which the dual variable acts as a parameter. Thus, we successively optimize trajectories and update the dual variable.

While there exist many trajectory optimization methods applicable to (12) nakamura2019adaptive; parkinson2020model; ruthotto2020machine; onken2021neural; onken2021ot, we use the direct transcription approach for simplicity enright1992discrete. Direct transcription approximates the solution to (12) by discretizing the trajectories over time using, for instance, Euler's method for the ODE and a midpoint rule for the time integral. Consider a uniform time discretization, and denote by the corresponding discretized states and discretized dual variables. The direct transcription approach solves the discretized problem given by

(18)

where the controls are discretized in time, and each discretized trajectory is obtained from its initial condition and controls via the Euler scheme, so the objective evaluates the discretized cost of each trajectory. The initial conditions are samples drawn from the initial distribution. The inner sup problem is over the discretized controls

(19)

where each row represents the controls for the trajectory defined by the corresponding initial condition. The outer inf problem is over the discretized coefficients

(20)

Indeed, any optimization algorithm can be used to solve this problem. While we used an Euler discretization of the dynamics, any other method could also be used, e.g., RK4. As in nursaude18, we use a version of the primal-dual hybrid gradient (PDHG) algorithm champock11 to approximate the solution to (18). Abbreviating the discretized objective, (18) reduces to a saddle-point problem, and the algorithm successively performs the updates

(21)

where the step sizes are suitably chosen, and the initial iterates are chosen randomly.

Remark 4.1.

In the original PDHG of Chambolle and Pock champock11 the coupling between the primal and dual variables is bilinear, the objective is concave in the dual variable, and the gradient ascent step is replaced by a proximal step. Despite these differences, (21) has reliable performance.

The gradient ascent step in the controls is implemented via back-propagation, whereas the proximal step in the coefficients admits a closed-form solution.

Note that the updates of the individual trajectories are decoupled within the control update because the coupling variable is fixed within this update. Furthermore, the random-features approximation yields an identity coefficient matrix, which leads to extremely simple proximal updates for the feature-space coefficients.
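As an illustration of the direct transcription primal step, the following Python toy example (ours, not the paper's Julia implementation; all constants are illustrative, and the mean-field coupling and dual update are omitted) runs plain gradient descent on Euler-discretized controls for a single agent with a quadratic running cost and a terminal penalty.

```python
import numpy as np

# Direct transcription of a single-agent problem: minimize
#   sum_k |u_k|^2 dt + 10 |x_K - x_target|^2
# subject to the Euler dynamics x_{k+1} = x_k + u_k dt.
K, dt = 20, 1.0 / 20
x0 = np.array([0.0, 0.0])
x_target = np.array([1.0, -1.0])
U = np.zeros((K, 2))                 # one control vector per time step

def rollout(U):
    X = [x0]
    for k in range(K):
        X.append(X[-1] + U[k] * dt)  # explicit Euler step
    return np.array(X)

for _ in range(2000):                # plain gradient descent on the controls
    X = rollout(U)
    # each u_k shifts the terminal state x_K by u_k * dt, so the gradient
    # of the terminal penalty w.r.t. every u_k is 20 * (x_K - x_target) * dt
    grad = 2.0 * U * dt + 20.0 * (X[-1] - x_target) * dt
    U -= 0.5 * grad

print(np.round(rollout(U)[-1], 2))   # final state, close to the target
```

In the mean-field algorithm, this control update is performed in parallel for all sampled trajectories, with the frozen feature-space coefficients contributing an additional interaction term to the gradient.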

5 Numerical Experiments

We discuss several numerical examples to demonstrate the efficiency and robustness of our algorithm. The experiments are organized in three groups, A, B, and C, presented in Sections 5.1, 5.2, and 5.3, respectively. In experiments A and B we consider high-dimensional problems with low-dimensional interactions. This setting is realistic in physical applications, e.g., controlling UAV swarms, since one may have a high-dimensional state and control while the interaction occurs only in the spatial dimensions. In experiment C we consider high-dimensional problems with high-dimensional interactions. The experiments are performed in dimensions 2, 50, and 100 with a fixed time horizon.

5.1 Experiment A

We assume that agents are initially distributed according to a mixture of eight Gaussian distributions centered at the vertices of a regular planar octagon. More precisely, we suppose that

(22)

where the constituent Gaussians share the same variance across dimensions.

Furthermore, we assume that the interaction kernel has the form

(23)

where the interaction involves only the first two (spatial) coordinates. Such kernels are repulsive, with an intensity parameter and a repulsion radius. A larger intensity leads to more crowd-averse agents, and the smaller the radius, the more sensitive the agents are to their immediate neighbors. Hence, the radius can also be interpreted as a safety radius for collision-avoidance applications onken2021multi; onken2021neural. For the experiments in A we take repulsion radii 0.2 and 1.25.

The random-features approximation of the kernel is obtained as in (14), with frequencies drawn randomly from the corresponding Gaussian distribution. We plot the convergence of the approximate kernels to the true one in Figures 1(a) and 2(a) for each of the two repulsion radii. This is done by comparing the values generated by the true and approximate kernels in the L2 and L-infinity norms on a 2-dimensional grid centred at the origin. Further, in Figures 1(b), 1(c) and Figures 2(b), 2(c), we visually compare the approximation to the true kernel on this grid. In experiments A, we choose the same number of features for both values of the radius.

(a) Convergence of errors.
(b) True kernel.
(c) Approximate kernel.
Figure 1: Kernel approximation for the first repulsion radius.
(a) Convergence of errors.
(b) True kernel.
(c) Approximate kernel.
Figure 2: Kernel approximation for the second repulsion radius.

We take the Lagrangian to be a quadratic kinetic cost and the terminal cost to penalize the distance to a target location. This choice corresponds to a model where crowd-averse agents travel from their initial positions towards a common target. Finally, we sample the initial positions from the initial distribution (22).

In Figure 3 we plot the projections of agents' trajectories onto the first two dimensions for one of the two repulsion radii, and in Figure 4 for the other. Note that the trajectories split more for the smaller radius, which corresponds to agents that are more sensitive to their immediate neighbors. Additionally, note that the terminal cost function forces agents to reach the destination. The 3D trajectories are plotted in Figure 5.

Figure 3: Agents' trajectories in experiments A for one value of the repulsion radius, plotted on the first two dimensions. Agents move from 8 Gaussian distributions (colored red) to the target point. Each panel shows the trajectories solved in a different dimension: (a) d = 2, (b) d = 50, (c) d = 100.
Figure 4: Agents' trajectories in experiments A for the other value of the repulsion radius, plotted on the first two dimensions. Agents move from 8 Gaussian distributions (colored red) to the target point. Each panel shows the trajectories solved in a different dimension.
(a) 3D view.
(b) XZ view.
(c) YZ view.
(d) Top view.
Figure 5: 3D plots of agents' trajectories in experiments A with low-dimensional interactions (the first two dimensions). The plots show the first three dimensions of the trajectories. Each agent travels from its initial position (colored blue) to its final position (colored yellow). The plots are from four different viewpoints. (a): 3D view; (b), (c): side views (XZ and YZ); (d): top view (XY). The interactions of agents occur only across the first two dimensions. Thus, while the agents spread in the XY plane (see (d) for the top view), they move to the target point almost linearly in the other axes (see (b) and (c) for the side views).

In Table 1 we report the population running cost, the interaction cost, the terminal cost, and the total cost at the equilibrium.

d    Radius  Running  Interaction  Terminal  Total
2    0.2     0.526    0.465        0.0108    1.10
2    1.25    0.621    3.57         0.00997   4.29
50   0.2     0.754    0.454        0.0116    1.32
50   1.25    0.825    3.58         0.0109    4.51
100  0.2     0.992    0.533        0.0140    1.67
100  1.25    1.11     3.26         0.0139    4.51
Table 1: Running, interaction, terminal, and total costs in experiments A.

5.2 Experiment B

In this set of experiments we assume that agents are initially distributed according to a single Gaussian centered at an initial point. Furthermore, we assume that the Lagrangian and terminal cost functions include an obstacle term in addition to the kinetic and target terms.

As before, we consider low-dimensional interactions with a kernel of the form (23). The approximation error and the approximate kernel are plotted in Figure 6. As before, the plots are generated by evaluating the true kernel and the approximate kernel at points on a 2-dimensional grid.

(a) Convergence of errors.
(b) True kernel.
(c) Approximate kernel.
Figure 6: Kernel approximation in experiments B.

Thus, in experiments B we model a crowd-averse population that travels from around an initial point to a target point while avoiding wedge-shaped obstacles. The projections of agents' trajectories onto the first two dimensions are plotted in Figure 7.

Figure 7: Agents' trajectories in experiments B plotted on the first two dimensions. Agents move from the initial distribution to the target point while avoiding the obstacle (colored red). Each panel shows the trajectories solved in a different dimension: (a) d = 2, (b) d = 50, (c) d = 100.

Note that the trajectories split close to the initial and target points, demonstrating the crowd-averse behavior of the agents. On the other hand, the obstacles force the agents to converge at the bottleneck.

We plot the 3D trajectories in Figure 8 and report running, interaction, terminal, and total costs in Table 2.

(a) 3D view.
(b) XZ view.
(c) YZ view.
(d) Top view.
Figure 8: 3D plots of agents' trajectories with low-dimensional interactions in experiments B. The plots show the first three dimensions of the trajectories. Each agent travels from its initial position (colored blue) to its final position (colored yellow) while avoiding the obstacle (see Figure 7). The plots are from four different viewpoints. (a): 3D view (the target point is at the lower-left side of the plot and the initial distribution is at the top-right side); (b), (c): side views (XZ and YZ); (d): top view (XY). The interactions of agents occur only across the first two dimensions. Thus, while the agents spread in the XY plane (see (d) for the top view), they move to the target point almost linearly in the other axes (see (c) for the side view).
d    Running  Interaction  Terminal  Total
2    3.72     12.2         0.388     16.3
50   2.63     15.4         0.533     18.6
100  2.86     14.8         0.567     18.3
Table 2: Running, interaction, terminal, and total costs in experiments B.

5.3 Experiment C

In experiments A, B we consider high-dimensional problems with low-dimensional interactions. Here, we perform experiments similar to A but with full-dimensional interactions to demonstrate the efficiency of our method for higher-dimensional interactions as well.

Thus, we assume that we are in the same setup as in A, with the only difference that the kernel is a full-dimensional interaction of the form (16), with the radius scaled by a dimensionless repulsion radius. Indeed, since the variance of the constituent Gaussians in (22) is the same across dimensions, the average distance between agents near the centers of these Gaussians scales proportionally to the square root of the dimension. Hence, if we used the same repulsion radius across all dimensions, the effective interaction would differ, and it would be hard to interpret the results. By fixing a repulsion radius in the two-dimensional case and scaling it accordingly, we make sure that the effective interaction is the same across all dimensions, and we should obtain similar equilibrium behavior.
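The scaling argument above can be checked numerically. In the following Python sketch (ours; sample sizes are illustrative), the mean pairwise distance in a standard Gaussian cloud grows like the square root of the dimension, so the normalized ratio printed below is roughly constant across dimensions.

```python
import numpy as np

rng = np.random.default_rng(2)
ratios = []
for d in (2, 50, 100):
    X = rng.normal(size=(2000, d))             # unit-variance Gaussian cloud
    i, j = rng.integers(0, 2000, size=(2, 5000))
    mean_dist = np.linalg.norm(X[i] - X[j], axis=1).mean()
    # mean pairwise distance grows like sqrt(d), so this ratio stabilizes
    ratios.append(mean_dist / np.sqrt(d))
    print(d, round(ratios[-1], 2))
```

Dividing the repulsion radius by this scaling factor therefore keeps the fraction of neighbors inside the interaction range comparable across dimensions.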

The results for the two repulsion radii are plotted in Figures 9 and 10, respectively.

Figure 9: Agents' trajectories in experiments C for the first repulsion radius, plotted on the first two dimensions. Agents move from 8 Gaussian distributions (colored red) to the target point. Each panel shows the trajectories solved in a different dimension: (a) d = 50, (b) d = 100.
Figure 10: Agents' trajectories in experiments C for the second repulsion radius, plotted on the first two dimensions. Agents move from 8 Gaussian distributions (colored red) to the target point. Each panel shows the trajectories solved in a different dimension: (a) d = 50, (b) d = 100.

Note that the trajectories are almost straight lines when the repulsion radius is large. In Figures 11, 12, and 13 we plot the original and approximate kernels to explain this phenomenon. More specifically, in Figures 11(a) and 12(a) we plot the true and approximate kernels along a random direction. Furthermore, in Figures 11(b) and 12(b) we plot the decay of the approximation error in the L2 and L-infinity norms, evaluated at points sampled from a Gaussian centered at one of the eight constituent Gaussians of the initial distribution. Finally, we superimpose Figures 11(a) and 12(a) in Figure 13.

As we can see in Figures 12(a) and 13, the interaction kernel is almost flat within the support of the population density. Hence, the interaction cost is approximately the same for all agents, which effectively decouples the agents, leading to individual control problems with a purely quadratic cost. In the latter case, optimal trajectories are straight lines, as follows from the Hopf-Lax theory (evansPDE, Section 3.3).

We plot the 3D trajectories in Figure 14 and report running, interaction, terminal, and total costs in Table 3.

(a) Kernel values along a random direction.
(b) Convergence of errors.
Figure 11: Kernel approximation in experiments C for the first repulsion radius.
(a) Kernel values along a random direction.
(b) Convergence of errors.
Figure 12: Kernel approximation in experiments C for the second repulsion radius.
Figure 13: Kernel values along a random ray for both repulsion radii.
d    Radius  Intensity  Running  Interaction  Terminal  Total
50   0.2     10         1.05     1.96         0.0192    3.20
50   1.25    1          0.674    0.492        0.00340   1.20
100  0.2     10         1.21     2.59         0.0177    3.97
100  1.25    1          0.912    0.495        0.00458   1.45
Table 3: Running, interaction, terminal, and total costs in experiments C.
(a) 3D view.
(b) XZ view.
(c) YZ view.
(d) Top view.
Figure 14: 3D plots of agents' trajectories in experiments C with full-dimensional interactions. The plots show the first three dimensions of the trajectories. Each agent travels from its initial position (colored blue) to its final position (colored yellow). The plots are from four different viewpoints. (a): 3D view; (b), (c): side views (XZ and YZ); (d): top view (XY). Because of the full-dimensional interactions, the spread of the agents' trajectories can be observed from every viewpoint.

6 Conclusion

We propose an efficient solution approach for high-dimensional nonlocal MFG systems utilizing random-feature expansions of interaction kernels. We thus bypass the costly state space discretizations of interaction terms and allow for straightforward extensions of virtually any single-agent trajectory optimization algorithm to the mean-field setting. As an example, we extend the direct transcription approach in optimal control to the mean-field setting. Our numerical results demonstrate the efficiency of our method by solving MFG problems in up to a hundred-dimensional state space. To the best of our knowledge, this is the first instance of solving such high-dimensional problems with non-deep-learning techniques.

Future work involves the extension of our method to affine controls arising in, e.g., quadrotors carrillo2013modeling, as well as alternative trajectory generation methods that involve deep learning ruthotto2020machine; lin2021alternating.

Compact feature space representations of interaction kernels are also valuable for inverse problems. In a forthcoming paper liuetal21, we recover the interaction kernel from data by postulating its feature space expansion.

Furthermore, note that feature-space expansions of the kernel are not tied to the mean-field idealization. Thus, we plan to investigate applications of our method to possibly heterogeneous multi-agent problems where the number of agents is not large enough for the mean-field approximation to be valid onken2021multi; onken2021neural.

Finally, an interesting and challenging question is the convergence analysis of the primal-dual algorithm 21 described in Section 4. We anticipate analysis methods developed in hadi17b to be useful for this question.

Acknowledgments

Wonjun Lee, Levon Nurbekyan, and Samy Wu Fung were partially funded by AFOSR MURI FA9550-18-502, ONR N00014-18-1-2527, N00014-18-20-1-2093, and N00014-20-1-2787.

Appendix

Derivation of (7).

Assume that we are given a smooth time-dependent vector field. For every initial position, denote by the corresponding trajectory the solution of the ODE

(24)

If agents are distributed according to the initial distribution and follow the flow in (24), their distribution satisfies the continuity equation

Now assume that the value function is the solution of (8). From optimal control theory we obtain a lower bound on the cost of any admissible control, where equality holds for the feedback control given by

(25)

Summarizing, we obtain that

(26)

where the equality holds for the feedback control in (25). Applying perturbation analysis for optimization problems (bonnans00, Proposition 4.12) we obtain

(27)

where the trajectory is the solution of (24) for the optimal control.

We are now in a position to prove the equivalence between (6) and (7). We have that

(28)

where

(29)

Therefore, a pair solves (6) if and only if it solves (8) for coefficients satisfying (29). Furthermore, (27) yields that (29) is precisely equivalent to the optimality condition for the coefficients, which leads to (7). ∎

References