The classical Dynamic Programming (DP) approach to optimal control problems is based on the characterization of the value function as the unique viscosity solution of a Hamilton-Jacobi-Bellman (HJB) equation. The DP scheme for the numerical approximation of viscosity solutions of Bellman equations is typically based on a time discretization which is projected on a fixed state-space grid. The time discretization can be done by a one-step scheme for the dynamics and the projection on the grid typically uses a local interpolation. Clearly, the use of a grid is a limitation with respect to possible applications in high-dimensional problems due to the curse of dimensionality. We refer to[5, 14] for theoretical results and numerical methods.
In , the HJB equation has been approximated by means of Shepard’s moving least square approximation providing error estimates and convergence results. Numerical tests for a selected kernel and structured meshes have been provided for the control of low dimensional problems. A Kriging’s interpolation was introduced in  with Halton meshes up to dimension .
Our work is based on  with the aim to extend the method to high dimensional problems, e.g. control of PDEs which discretized dimension is, at least, of order . We have investigated this approach on scattered meshes since the use of structured meshes is impossible for such high dimensional problems, and thus we resort to a grid driven by the dynamics of the problem, similarly to what is done in  where a tree structure has been proposed to approximate the finite horizon optimal control problem and the time dependent HJB equation. The generation of the grid driven by the dynamics will help us to compute feedback controls for a class of initial conditions which is an important novelty in the field.
Within the RBF approximant, the determination of the kernel’s shape parameter is known to be a crucial choice. Several techniques exists in the literature to optimize this parameter (see e.g. Chapter 14 in  for a recent survey), but they are usually based on the minimization of the approximation error. We introduce instead a novel technique based on the residual of the HJB equation. Specifically, even if the reconstruction of the Value function requires the computation of several Shepard approximant, a unique parameter is globally optimized based on the residual of the Value iteration scheme.
The use of the residual in HJB has been introduced in  and also used in the context of RBF and HJB to obtain an adaptive grid . Here, the residual is used as quantity of interest to select the shape parameter. We summarize the contributions of our paper and the novelty of our approach:
generation of an unstructured mesh driven by the dynamics that helps us to localize the problem.
An optimized way to select the shape parameter minimizing the residual of the Value Iteration method.
The possibility to compute feedback control of PDEs for a class of initial conditions.
An error estimate of the reconstruction process that is valid along trajectories of the dynamics.
Let us briefly comment on related literature on the control of PDEs using a dynamic programming approach. In the last two decades there has been a tremendous effort to mitigate the curse of dimensionality with the goal to control PDEs. It is straightforward to see that the discretization of a PDE leads to a large system of ODEs which is hard to control using a DP approach.
The first approach is related to the combination of model order reduction techniques and dynamic programming. The idea is to use a projection technique, e.g. Proper Orthogonal Decomposition () to reduce the dimension of the dynamical systems. Then, we can approximate with standard Value Iteration algorithm the correspondent reduced Hamilton-Jacobi-Bellman equation. This turns out to be efficient if the reduction is up to dimension or . We refer to the pioneer work  and to  for error estimates of the method. A different way to reduce the dimension of the dynamical system is given by pseudo-spectral methods as shown in . Recently, the solution of Hamilton-Jacobi-Bellman on a tree structure has been proposed in  and its coupling with model order reduction in .
, tensor train decomposition11, 12] and max-plus algebra [26, 25]. All these approaches are important contributions on the mitigation of the curse of dimensionality.
The outline of this paper is as follows. In Section 2 we recall the background of the dynamic programming approach. Section 3 concerns the radial basis functions interpolation and Shepard’s approximation. In Section 4, we propose the novel method for the coupling between RBF and DP. We will discuss and comment the details of our approach. Numerical simulations for the control of two different PDEs are shown in Section 5. Finally, conclusions and future works are driven in Section 6.
2 Dynamic Programming Equations
Let the dynamical system described by
where is the state variable, and the control is such that with the set of admissible controls and is a compact set. The function is Lipschitz continuous with respect to the first variable with constant . Under such hypothesis, the existence of a unique solution to system (2.1) holds true (see ).
To select the unknown control , we define the following cost functional
The function is the running cost and it is assumed to be bounded and Lipschitz continuous in the first variable. The constant is a discount factor and the term guarantees the convergence of the integral for bounded.
Our optimal control problem reads:
with being a trajectory that solves (2.1) corresponding to the initial point and control . We will use the index in our notations, to stress the dependence on the initial condition. We aim at obtaining the control in a feedback form and, for this reason, we define the Value function as follows:
One can characterize the value function in terms of the Dynamical Programming Principle (DPP):
From (2.5), we derive the Hamilton-Jacobi-Bellman (HJB) equations corresponding to the infinite horizon problem:
where is the gradient of . The HJB is a further characterization of the value function by means of a degenerate PDEs whose solution has to be thought in the viscous sense . Thus, if one is able to solve (2.6), it is possible to obtain the optimal feedback control :
3 RBF and Shepard’s method
This section presents a brief explanation of Radial Basis Functions (RBFs) and the Shepard’s approximation method. Most of the following material is based on the book  for the general RBF theory, and on the paper  for the Shepard method, to which we refer for further details.
An RBF is a radially invariant function , which is usually required to be positive definite. Several instances of these exists (see e.g. ). In this paper we will consider only radially compactly supported RBFs. A significant example are the Wendland RBFs, which are a family of strictly positive definite RBFs of compact support and of different smoothness (depending on a parameter) (see ). The radial nature of these bases can be used to tune their spread by means of a shape parameter , i.e., usually one works with a basis . Figure 1 shows the Wendland RBF
with on the left panel and on the right panel, and we can see how the parameters influence the shape of the basis functions and makes them flat (left) or spiky (right). The RBF is scaled so that
RBFs can be used as a tool in interpolation and approximations methods in a mesh-free environment. For this we consider a continuous function with and a set of pairwise distinct interpolation points (or nodes) and the corresponding function evaluations. In this paper we use RBFs in a moving-least squares mode within a Shepard approximation scheme (see e.g. Chapter 23 in ). In this case, the RBF bases are used to form weights
and the Shepard approximant is formed as
Observe that each is compactly supported in and non negative, and the weights form a partition of unity, i.e., for all with
This implies that
is actually a convex combination of the function values. Moreover, the compact support of the weights leads to a computational advantage and a localization. In particular, the Shepard weights are evaluated by constructing a distance vectorwith by computing only the entries such that . This operation can be implemented by a range search.
An additional advantage of the Shepard method is that the the construction of the approximant (3.3) can be directly obtained from the function values and the evaluation of the weights, without solving any linear system.
As RBF-based methods work with unstructured meshes, to obtain error estimates in this context it is common to consider the fill distance and the separation distance
The fill distance replaces the mesh size and it is the radius of the largest ball in which does not contain any point from , and it gives a quantification of the well spread of the approximation nodes in the domain. On the other hand, the separation distance quantifies the minimal separation between different approximation points. We remark that for any sequence of points it holds , but the inverse inequality ,
, does not hold unless the points are asymptotically uniformly distributed.
4 The coupling between DP and RBF
To overcome the difficulty of solving analitically equation (2.6), in this section we introduce the core approach of our work. We first recall the semi-Lagrange discretization of (2.6) by means of Shepard interpolation, then we propose a method to generate unstructured meshes, we define a strategy to select the shape parameter, derive error estimates of the overall approximation method and, finally, we summarize everything in one algorithm.
We remark that for the purpose of numerical computation we consider only finite horizons with a given large enough to simulate the infinite horizon problem. We also assume that the dynamics evolve for each initial value and control parameter within a compact set .
Semi-Lagrangian scheme for (2.6)
We first chose a temporal step size and build a grid in time such that with . We will discuss in the following how to define a spatial discretization, and for now we just denote it as . Furthermore, the set is also discretized by setting the control for constant in the considered time interval. To introduce the approximation of the value function, we represent the Shepard approximant as an operator
where as in (3.2). We remark that the Shepard approximation uses as interpolation nodes the same points defined above. For now we make no special assumption on its value, while its selection will be discussed later in the section.
We aim at the reconstruction of the vector where is the approximate value for for each . The full discretization of equation (2.6) is obtained starting from a classical approach (see e.g. ), but replacing as in  the local linear interpolation operator on a structured grid with the Shepard approximation operator. This discretization reads
To compute the set is discretized in a finite number of points , and the minimum is computed by comparison. The full approximation scheme of the Value function is known as the Value Iteration (VI) method, and it is obtained by iteration of (4.2), i.e.,
To ensure convergence of the scheme it is necessary that is a contraction. In this context, the Shepard operator offers another striking benefit in comparison with plain RBF interpolation. Indeed, in (4.1) has unit norm, and this implies that the right hand side of (4.2) is a contraction if (see  for the details, and especially Lemma 2). Therefore, the convergence of the value iteration scheme is guaranteed.
As soon as we obtain an approximation of the value function, we can compute an approximation of the feedback control as
with . Thus, we are able to perform a reconstruction of an optimal trajectory and optimal control .
Under the assumption that the fill distance decays to zero and that the shape parameter scales as , in  it is proven that this approximation scheme converges. More precisely, under suitable assumptions on and Theorem 3 in  guarantees that , where depends on the dynamics but not on the discretization.
Despite these convincing theoretical guarantees, the requirement that decays to zero is too restrictive in our setting, since a filling of the entire may be out of reach for high dimensional problems. Moreover, as already mentioned in Section 3, Shepard method perform approximations in high dimensions and unstructured grids, while in  the authors focused on a given configuration for the shape parameter and an equidistant grid. In the next paragraphs we will explain how to select the shape parameter and to generate unstructured meshes to solve high dimensional problems.
On the Scattered Mesh
Different possibilities are available for the definition of the discretization of the spatial domain. A standard choice is to use an equi-distributed grid, which covers the entire space and usually provides accurate results for interpolation problems. Unfortunately, for higher dimensional problems it is impossible to think to work on equi-distributed grid, as their size grows exponentially. This is a particular limitation in our case, since our goal is to control PDEs, whose discretization leads to high dimensional problems e.g. . On the other hand, a random set of points is computationally efficient to generate and to use, but in this case additional care should be taken because the distribution of points can be irregular (some regions can be more densely populated than others) and the fill distance may decrease only very slowly when increasing the number of points.
In general terms, there is a tradeoff between keeping the grid at a reasonable size and the need to cover the relevant part of the computational domain. In particular, it is well known that the fill distance for any sequence of points can at most decrease as in for a suitable constant depending only on the geometry of the domain. Observe that uniform points have precisely this asymptotic decay of the fill distance. Thus, an exponentially growing number of points is required to obtain a good covering of as increases.
The key point to overcome these limitations is to observe that the evolution of the system provides itself an indication of the regions of interest within the domain. Following this idea, we propose a discretization method driven by the dynamics of the control problem (2.1). Observe that a similar idea has been used in  to compute the value function along the trajectories of open loop control problems. Also in  the grid has been generated by points of the dynamics leading to the solution of the HJB equation on a tree structure.
To define our dynamics-dependent grid we fix a time step , a maximum number of discrete times and, for , some initial conditions of interest and a discretization of the control space, i.e.,
Observe that all these parameters do not need to coincide with the ones used in the solution of the value iteration (4.2), but are rather only used to construct the grid. Moreover, in general we use and , i.e., the discretization used to construct the mesh is coarser than the one use to solve the control problem.
For a given pair of initial condition and control we solve numerically (2.1) to obtain trajectories
such that is an approximation of the state variable with initial condition , constant control , at time . For each pair we obtain the set containing the discrete trajectory, and our mesh is defined as
This choice of the grid is particularly well suited for the problem under consideration, as it does not aim at filling the space , but instead it provides points along trajectories of interest. In this view, the values of should be chosen so that contains points that are suitably close to the points of interest for the solution of the control problem. In the following proposition we provide a quantitative version of this idea, that will be the base of our error estimate in Theorem 4.5.
Let be the dynamics-dependent mesh of (4.6), and assume that is uniformly bounded i.e., there exist such that
Then for each , and it holds
Assume furthermore that is uniformly Lipschitz continuous in both variables, i.e., there exist such that
Then, if is a point on a discrete trajectory with initial point , control , timestep , and time instant , , it holds
If we simply have
which gives the bound (4.7) using only the boundedness of .
To prove (4.8) we need to work explicitly with the initial points and the control values. By assumption we have
and since , using the definition (4.6) we can choose it as
for some , . It follows that
and thus adding and subtracting gives
Now adding and subtracting in the sum, and using the Lipschitz continuity of , we get
Applying the discrete Grönwall lemma to this inequality gives
and since and are free, we can choose them as and . Finally, bounding by gives (4.8). ∎
Selection of the shape parameter
The quality of Shepard approximation strongly depends from the choice of shape parameter , both in general for RBF approximation  and in the special case of the solution of control problems . As mentioned in the introduction, several techniques exists to tune the shape parameter in the RBF literature, such as cross validation and maximum likelihood estimation (see e.g. Chapter 14 in ), but they are designed to optimize the value of in a fixed approximation setting. In our case, on the other hand, we need to construct an approximant at each iteration within the value iteration (4.2). This makes the existing methods computationally expensive and difficult to adapt to the target of minimizing the error in the iterative method.
For these reasons, we propose here a new method to select the shape parameter based on the minimization of a problem-specific indicator, namely the residual . Assuming that the value iteration with parameter has been stopped at iteration giving the solution , we define the residual as
and we choose the shape parameter that minimizes this quantity with respect to .
To get a suitable scale for the values of , we parametrize it in terms of the grid , similarly to what is done in  by setting for a given . Since is difficult to compute or even to estimate in high dimensional problems, we resort to setting and we optimize the value of . Observe that the separation distance is an easily computable quantity that depends only on , and it is thus actually feasible to use this parametrization even in high dimension.
Choosing an admissible set of parameters , the parameter is thus chosen by solving the optimization problem
This problem can be solved by using a comparison method or e.g. an inexact gradient method. The former means to discretize the set as and to compute all the value functions for all , The latter considers a projected gradient method where the parameter space is continuous and the derivative is computed numerically as
for some fixed . In the numerical tests, we will compare both minimization strategies in the low dimensional case, while we will concentrate on the comparison method in high dimensional one.
We adapt the classical convergence theory that is used to prove rates of convergence for the value iteration when linear interpolation is used. In particular, the following argumentation follows the discussion in [14, Section 8.4.1].
The idea is to estimate the time and space discretizations separately. Since the time discretization is independent of the interpolation scheme used in the space discretization, we just recall the following result from [14, Section 8.4.1]. Although the original result holds for the supremum norm over , we formulate it here for a general compact subset in order to deal with the space discretization later. We use the notation .
Let be the exact value function, and let be the solution of the value iteration (4.2) without space discretization, i.e.,
If is Lipschitz continuous, for each compact subset there exists a constant such that
Assume additionally that the following hold:
is uniformly bounded, is convex, is linear in .
is Lipschitz continuous and is convex.
There exists an optimal control .
Then there exists such that
It remains now to quantify the error that is committed by introducing a space discretization, i.e., the error associated to the interpolation scheme in (4.2). This first requires a bound on the error of Shepard interpolation, and we report in the next proposition a result obtained in . In this case, the key idea is to scale the shape parameter of the RBF basis according to the fill distance of the mesh. According to Proposition 4.1, we can control the fill distance of the mesh within the set obtained as the collection of the different trajectories of the discrete dynamics. In other words, we set
For this set, equation (4.8) gives
We can now recall from the error estimate for the Shepard approximation.
Let be the Lipschitz constant of . Let be the Shepard approximation of on obtained using a kernel with support contained in , and let for a positive constant . Then we have the bound
With these tools, we can finally prove the following theorem.
We consider the control that is optimal for , and we define for . This implies in particular that is sub-optimal for , and thus by the definition (4.2) for each it holds
Combining this inequality with the definition (4.11) of , we have
Now we add and subtract the interpolation of evaluated at , and use the bound (4.16) and the fact that is a contraction to obtain
We can now repeat the same reasoning using instead the control that is optimal for , and in this way we can obtain a bound on the opposite quantity . These two inequalities, when combined, give the desired bound:
The algorithm is summarized in Algorithm 1:
5 Numerical experiments
In this section we present three numerical tests to illustrate the proposed algorithm. The first test is a two dimensional minimum time problem with well-known analytical solution. In this test we analyse results using regular and scattered grids. The second and third test deal with a advection equation and a nonlinear heat equation, respectively. We discretize in space both PDEs using finite differences. The dimension of the semi-discrete problem will be for the advection equation and for the parabolic problem. For each example we provide examples of feedback and optimal controls reconstructed for different initial conditions that may not belong to the grid. In the parabolic case we present the effectiveness of feedback control under disturbances of the system.
In every experiment we define an admissible interval to solve the minimization problem by comparison in Algorithm 1 as follows: we start with a large interval and we coarsely discretize it. Then, we run Algorithm 1 and obtain . Later, we choose a set such that . Using and a finer refinement, the Algorithm provides . We iterate this procedure, and finally we set .
In our test we use the Wendland RBF defined in (3.1). The numerical simulations reported in this paper are performed on a laptop with one CPU Intel Core i7-2.2 GHz and 16GB RAM. The codes are written in MATLAB.
Test 1: Eikonal equation in 2D
We consider a two dimensional minimum time problem in with the following dynamics and control space
The cost functional to be minimized is with
being the time of arrival to the target for each .
The analytical solution for this problem is the Kruzkov transform of the distance to the target (see e.g. ):
with for each .
We tested Algorithm 1 for two different cases. The first one considers an unstructured grid generated by random points, whereas the second case studies an unstructured grid generated by the problem dynamics as mentioned in Section 4. Due to the randomness of our grid we compute an average error. In the second case we will average over tests whereas on the third one over . We do not discuss in detail the case with a regular grid since it has been extensively analyzed in . However, in Example 3, we show optimal trajectories using also the regular grid with linear interpolation and the Shepard’s approximant. The relative error reads:
where is the discrete value function obtained with the shape parameter and is the exact solution. We will denote by the parameter selected in order to minimize the relative error from (5.3):
Clearly, will be a lower bound for . In the following three cases we set and is discretized with equidistant controls.
Random Unstructured Grid
The first test with Eikonal equation is performed using an unstructured grid generated by random points. In order to obtain a grid which densely covers our numerical domain, a set of 40000 randomly distributed points in is clustered using the k-means algorithm, where is the number of desired points in the grid. Examples of this type of grid are shown in Figure 2.
We found a suitable parameter space discretized with as step size. In the HJB equation we set . In the left panel of Figure 3 we see an example of residual when the unstructured grid is formed by nodes. The residual is minimized between and . The average value after tests is as can be seen in the fourth column of Table 1. In the middle panel of Figure 3 we see a plot of for different number of points in the grid; the values are in the fifth column of Table 1. The right panel of Figure 3 shows the behavior of and decreasing the fill distance . The error decays as does.
Table 1 shows the quality of our results. The first column presents the fill distance and the relative number of points is shown in the second column. The third column presents the CPU time (in seconds). The fourth column presents the values , outputs of Algorithm 1 using the comparison method in the minimization procedure. The fifth column presents the values of . The sixth and seventh columns present the values of and .
We see the average decay of the fill distance when increasing the number of nodes. Accordingly, the average CPU time increases. The parameters and assume values close to each other, with . The errors and reduce according to .
Figure 4 presents in the left panel the exact solution evaluated on the scattered grid with points and . The middle picture is the solution obtained by value iteration algorithm with Shepard approximation and the last picture is the absolute error between the two solutions. The error has an erratic behavior always below .