# A tree structure algorithm for optimal control problems with state constraints

We present a tree structure algorithm for optimal control problems with state constraints. We prove a convergence result for a discrete time approximation of the value function based on a novel formulation of the constrained problem. The Dynamic Programming approach is then developed by a discretization in time, which leads to a tree structure in space derived from the controlled dynamics. In this construction the state constraints are taken into account to cut several branches of the tree. Moreover, an additional pruning reduces the tree complexity, as in the case without state constraints. Since the method does not use an a priori space grid, no interpolation is needed for the reconstruction of the value function and the accuracy essentially relies on the time step $h$. These features permit a reduction in CPU time and in memory allocations. The synthesis of optimal feedback controls is based on the values on the tree; an interpolation of these values is necessary if a different discretization of the control space is adopted, e.g. to improve the accuracy of the method in the reconstruction of the optimal trajectories. Several examples show how this algorithm can be applied to problems in low dimension and compare it to a classical DP method on a grid.


## 1. Introduction

We deal with the following optimal control problem with state constraints.

Let $\Omega$ be an open bounded subset of $\mathbb{R}^d$. We consider the following system of controlled differential equations

$$
\begin{cases}
\dot y(s) = f(y(s),u(s)), & s\ge 0,\\
y(0) = x.
\end{cases}
\tag{1.1}
$$

Here $y(s)\in\mathbb{R}^d$ is the state and the control $u$ belongs to the set of admissible control functions $\mathcal{U}$, typically the set of measurable control functions with values in $U$, a compact subset of $\mathbb{R}^m$. We impose a state constraint on (1.1) requiring that the state remains in $\overline{\Omega}$ for all $t\ge 0$. As a consequence, we will consider admissible (with respect to the state constraint) only control functions guaranteeing that the corresponding trajectory never leaves $\overline{\Omega}$. We will denote by $\mathcal{U}_x$ this subset of $\mathcal{U}$; then for any $x\in\overline{\Omega}$

$$
\mathcal{U}_x = \{u(\cdot)\in\mathcal{U} : y_x(t)\in\overline{\Omega},\ \forall t\ge 0\}
\tag{1.2}
$$

where $y_x(t)$ denotes the solution trajectory starting at $x$.
Given a cost functional $J(x,u)$, the problem is to determine the value function

$$
v(x) = \inf_{u\in\mathcal{U}_x} J(x,u),
\tag{1.3}
$$

and possibly an optimal control (at least approximate). We will use the notion of viscosity solution to the Hamilton-Jacobi-Bellman equation, introduced by Crandall and Lions in [15] (see also [22]), and in particular its extension to the notion of constrained viscosity solution given by Soner [28] in order to treat problems with state constraints. This definition combines the standard definition on $\Omega$ with an appropriate inequality to be satisfied on $\partial\Omega$ (see also [13] for further developments of this notion). This condition can be applied to other Hamiltonians coming from various optimal control problems.

In the first part of this paper we will consider for simplicity only convex constraints for the infinite horizon problem. As we will see later, similar arguments can be applied to other control problems and non convex constraints although our result does not cover this case. Dealing with the infinite horizon problem, Soner has shown that, whenever the value function is continuous, it is the unique constrained viscosity solution of the following Hamilton-Jacobi-Bellman equation

$$
\lambda v(x) = \inf_{u\in U}\{f(x,u)\cdot\nabla v(x) + \ell(x,u)\}
\tag{1.4}
$$

where $\lambda$ is a positive real parameter, the discount rate.

We should also mention that several results have been obtained for the existence of trajectories of (1.1) satisfying the state constraints (the so-called viable trajectories) using the theory of multivalued differential inclusions (see Aubin–Cellina [6]). Essentially, we know that a viable solution exists if for any $x\in\overline{\Omega}$ there exists at least one control such that the corresponding velocity belongs to the tangent cone to $\overline{\Omega}$ at $x$ (see Section 2 for a precise result in the convex case due to Haddad [20]). We recall that several extensions have been proposed for more general constraints using appropriate definitions of tangent cones (see [7] for an extensive presentation of this theory). These results give necessary and sufficient conditions for the existence of viable trajectories, so that one can determine the minimum set of assumptions guaranteeing that the optimal control problem can have a solution.

Several papers have been written on optimal control problems with state constraints starting from the seminal paper [28]. We can mention the interesting contributions by Ishii-Koike [21], Bokanowski-Zidani and co-authors [8, 4] and Motta [26] for different ways to deal with state constraints still having a well posed problem. We also mention the recent contribution by Kim, Tran and Tu [23] dealing with constrained problems on nested domains.
From the point of view of the numerical approximation a classical grid approach has been developed by Camilli-Falcone [10] and Bokanowski-Forcadel-Zidani [8].
In this respect they represent an extension to constrained problems of the numerical approximation developed by Capuzzo Dolcetta [11], Falcone [16] (see also the survey paper [12] and the book [18] for other numerical schemes related to optimal control problems via the Dynamic Programming approach). We end this short presentation mentioning that also viability tools have been applied to construct numerical methods for optimal control problems with state constraints, see e.g. [14].

Although convergence results are available for every dimension, numerical methods based on fixed space grids are difficult to apply to high-dimensional problems since they suffer from the well-known "curse of dimensionality". This is why a renewed effort has been made in recent years to find other methods which can tackle high-dimensional optimal control problems. A list of references for other approaches dealing with high-dimensional problems is presented and discussed in [1].

In the first part of this paper we propose a novel formulation of the time discrete infinite horizon problem that is close to the formulation presented in [21] for the continuous problem, and we prove a convergence result for the value function for a convex constraint. The proof is based on a mixture of tools coming from multivalued analysis and viscosity solutions. As we said, we want to develop a fast approximation scheme for the value function using the characterization in terms of the Hamilton-Jacobi-Bellman equation. To this end we will also use some tools of viability theory to establish a precise convergence result (see Sections 2 and 3). The scheme is built having in mind a "heuristic" representation of the value function which comes from coupling the viability results with standard dynamic programming arguments. Although we present our convergence result for the infinite horizon problem, focusing on the treatment of boundary conditions for the stationary problem, similar arguments can be applied also to other optimal control problems such as the finite horizon and the optimal stopping problem (see Remark 3.1).

The second part of the paper is devoted to the construction of an efficient algorithm for a time discrete approximation of the value function that avoids the construction of a fixed grid in space and allows one to apply the dynamic programming principle on a Tree Structure (TS). The main results on this approach have been presented in [1, 2, 3]. Our contribution here is the extension of the TS Algorithm (TSA) to problems with state constraints and the feedback reconstruction using scattered data interpolation.
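The cutting and pruning ideas described above can be illustrated with a minimal Python sketch. It is not the authors' TSA implementation: the function name `build_tree`, the tolerance-based merge rule and the toy one-dimensional dynamics $f(x,u)=u$ are assumptions made only for illustration.

```python
import numpy as np

def build_tree(x0, f, controls, h, n_steps, in_constraint, tol=1e-12):
    """Build the levels of a tree of Euler steps x + h*f(x,u).

    Hypothetical helper: branches leaving the constraint are cut,
    and children closer than `tol` to an existing child are merged.
    """
    levels = [np.atleast_2d(np.asarray(x0, dtype=float))]
    dim = levels[0].shape[1]
    for _ in range(n_steps):
        children = []
        for x in levels[-1]:
            for u in controls:
                y = x + h * f(x, u)
                if not in_constraint(y):
                    continue  # state constraint violated: cut this branch
                if any(np.linalg.norm(y - z) <= tol for z in children):
                    continue  # (almost) duplicate node: prune it
                children.append(y)
        levels.append(np.array(children).reshape(-1, dim))
    return levels

# Toy example (assumed data): f(x,u) = u, three controls, constraint |x| <= 0.5.
f = lambda x, u: u * np.ones_like(x)
levels = build_tree([0.0], f, [-1.0, 0.0, 1.0], h=0.25, n_steps=3,
                    in_constraint=lambda y: abs(y[0]) <= 0.5)
sizes = [len(level) for level in levels]
```

Without cutting and pruning the number of nodes at level $n$ would be $3^n$; in this toy run the levels stay small because branches leaving the constraint are dropped and coinciding nodes are merged.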

The outline of our paper is the following.

In Section 2 we introduce our basic assumptions and state some previous results about the characterization of the value function in terms of the Hamilton-Jacobi-Bellman equation. We present some results of viability theory that are useful for the problem at hand and discuss a different way to write the equation. We then introduce our time discretization and prove some properties of the discrete value function, showing that the discretized equation (2.27) has a unique solution $v_h$. We establish our main convergence result for the infinite horizon problem in Section 3, proving that $v_h$ converges to the value function $v$ uniformly on the constraint $\overline{\Omega}$, provided the state constraint is convex. In Section 4 we introduce the TSA for the finite horizon problem with state constraints and discuss some of its features. Finally, the last section is devoted to numerical experiments where we show that the TSA is faster than the classical grid approximation. Moreover, some of the tests show that the method can also solve problems with non convex state constraints, overcoming the limits of our convergence result.

## 2. The infinite horizon problem with state constraints

We will denote by $y_x(t,u)$ the position at time $t$ of the solution trajectory of (1.1) corresponding to the control $u$. Whenever this is possible without ambiguity we adopt the simplified notations $y_x(t)$ or $y(t)$ instead of $y_x(t,u)$. The cost functional related to the infinite horizon problem is given by

$$
J(x,u) \equiv \int_0^{+\infty} \ell(y(t),u(t))\,e^{-\lambda t}\,dt,
\tag{2.1}
$$

where $\ell$ is the running cost. As we said in the introduction we want to minimize $J$ with respect to the controls in $\mathcal{U}_x$, so we need at least the assumption that

$$
\mathcal{U}_x \neq \emptyset \quad \text{for any } x\in\overline{\Omega}.
\tag{2.2}
$$

It is important to note that in general $v$ is not continuous on $\overline{\Omega}$ even when (2.2) is satisfied. This is due to the structure of the multivalued map $x\to\mathcal{U}_x$.

Soner has shown in [28] that the value function is continuous (and then uniformly continuous) on $\overline{\Omega}$ if the following boundary condition on the vector field is satisfied

$$
\exists\,\gamma>0 : \forall x\in\partial\Omega\ \ \exists\,u\in U \text{ such that } f(x,u)\cdot\eta(x)\le-\gamma<0
\tag{2.3}
$$

where $\eta(x)$ is the outward normal to $\Omega$ at the point $x$.

We will make the following assumptions:
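A0. $\Omega$ is a bounded, open convex subset of $\mathbb{R}^d$;
A1. $U\subset\mathbb{R}^m$, compact;
A2. $f$ is continuous and $|f(x,u)-f(y,u)|\le L_f|x-y|$ for any $x$, $y$ and $u\in U$;
A3. $\ell$ is continuous and $|\ell(x,u)-\ell(y,u)|\le L_\ell|x-y|$ for any $x$, $y$ and $u\in U$.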

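Clearly, there exist two positive constants $M_f$, $M_\ell$ such that

$$
\sup_{u\in U}|f(x,u)|\le M_f \quad\text{and}\quad \sup_{u\in U}|\ell(x,u)|\le M_\ell
\tag{2.4}
$$

for any $x\in\overline{\Omega}$. Notice that under the above assumptions the value function is bounded in $\overline{\Omega}$ by $M_\ell/\lambda$, as can be easily checked.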

Using the Dynamic Programming Principle, Soner has shown that $v$ is the unique viscosity solution of (1.4). This means that $u=v$ satisfies

$$
H(x,u(x),\nabla u(x))\le 0 \quad\text{for } x\in\Omega
\tag{2.5}
$$

$$
H(x,u(x),\nabla u(x))\ge 0 \quad\text{for } x\in\overline{\Omega}
\tag{2.6}
$$

where

$$
H(x,u(x),\nabla u(x)) \equiv \lambda u(x) + \max_{a\in U}\{-f(x,a)\cdot\nabla u(x) - \ell(x,a)\}
\tag{2.7}
$$

and the above inequalities should be understood in the viscosity sense (see [28] for the precise definition). A function satisfying (2.5) (respectively (2.6)) is called a constrained viscosity subsolution (respectively supersolution) of .

###### Theorem 2.1.

Let (2.2), (A0)-(A3) be satisfied and let us assume that $v\in C(\overline{\Omega})$. Then, $v$ is the unique viscosity solution of (1.4) on $\overline{\Omega}$.

###### Remark 2.1.

Necessary and sufficient conditions.
Condition (2.3) is known to be only a sufficient condition for the existence of trajectories living in $\overline{\Omega}$. However, necessary and sufficient conditions for the existence of solutions in $\overline{\Omega}$ have been extensively studied in viability theory (see [7]).
Let $\Omega$ be an open convex subset of $\mathbb{R}^d$. A trajectory $y$ is called viable when

$$
y(t)\in\overline{\Omega},\quad\forall t\ge 0.
\tag{2.8}
$$

Let $F$ be a multivalued map which is lower semicontinuous and has compact convex images (we refer to [6] for the theory and the definitions related to multivalued maps). Let us define the tangent cone to a compact convex set $K$ at the point $x\in K$ as

$$
T_K(x) \equiv \mathrm{cl}\Big(\bigcup_{h>0}\frac{1}{h}(K-x)\Big).
\tag{2.9}
$$

By a result of Haddad [20], the condition

$$
F(x)\cap T_{\overline{\Omega}}(x)\neq\emptyset,\quad\forall x\in\overline{\Omega},
\tag{2.10}
$$

is necessary and sufficient to have viable trajectories in $\overline{\Omega}$ for the multivalued Cauchy problem

$$
\begin{cases}
\dot y(t)\in F(y(t)), & t\ge 0,\\
y(0)=x\in\overline{\Omega}.
\end{cases}
\tag{2.11}
$$

This result has also been extended to more general sets (also non convex) by introducing more general tangent cones (see [7] for a general presentation of viability theory).

### 2.1. The time-discrete scheme for the constrained problem

In order to build a discretization of (1.4) we start from the standard discretization in time of (1.1), (2.1). We fix a positive parameter $h$, the time step, and consider the following approximation scheme for (1.1) and (2.1)

$$
\begin{cases}
y_{n+1} = y_n + h f(y_n,u_n), & n\in\mathbb{N},\\
y_0 = x,
\end{cases}
\tag{2.12}
$$

$$
J_h(x,\{u_n\}) = h\sum_{n=0}^{+\infty}\ell(y_n,u_n)\,\beta^n,
\tag{2.13}
$$

where $u_n\in U$ and $\beta\equiv 1-\lambda h$.
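As a quick sanity check of (2.12)-(2.13), the following Python sketch evaluates a truncated version of the discrete cost for a fixed control sequence. The helper name `discrete_cost`, the test dynamics $f(y,u)=uy$ with constant control $u=-1$, the cost $\ell(y,u)=y^2$ and the choice $\beta=1-\lambda h$ (read off from the identity used in the proof of Theorem 3.2) are assumptions for illustration only.

```python
def discrete_cost(x, control_seq, f, ell, h, lam):
    """Truncated sum h * sum_n ell(y_n, u_n) * beta**n along the Euler
    trajectory y_{n+1} = y_n + h f(y_n, u_n), y_0 = x (hypothetical helper)."""
    beta = 1.0 - lam * h          # assumed discrete discount factor
    y, total, discount = x, 0.0, 1.0
    for u in control_seq:
        total += h * ell(y, u) * discount
        y = y + h * f(y, u)
        discount *= beta
    return total

# Assumed toy data: y' = u*y with u = -1, ell = y^2, lambda = 1, x = 1.
h, lam = 0.01, 1.0
J_h = discrete_cost(1.0, [-1.0] * 5000, lambda y, u: u * y,
                    lambda y, u: y * y, h, lam)

# For this example y_n = (1-h)^n, so the infinite sum is geometric.
J_h_closed_form = 1.0 / (3.0 - 3.0 * h + h * h)
# The continuous cost is int_0^inf e^{-2t} e^{-t} dt = 1/3.
J_continuous = 1.0 / 3.0
```

In this example the discrete cost differs from the continuous one by a term of order $h$, which is the behaviour one expects from a first order time discretization.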
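For every $x\in\overline{\Omega}$ the corresponding value function is

$$
v_h(x) = \inf_{\{u_n\}\in\mathcal{U}^h_x} J_h(x,\{u_n\}),
\tag{2.14}
$$

where

$$
\mathcal{U}^h_x = \{\{u_n\} : u_n\in U \text{ and } y_n\in\Omega,\ \forall n\in\mathbb{N}\}.
\tag{2.15}
$$

The above definition is meaningful only provided there exists a step $h$ such that $\mathcal{U}^h_x\neq\emptyset$. We look for conditions guaranteeing the existence of viable discrete trajectories. Let us introduce the multivalued map

$$
U_h(x) \equiv \{u\in U : x+hf(x,u)\in\Omega\},
\tag{2.16}
$$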

representing the subset of admissible (i.e. satisfying the constraint) controls for the discrete dynamics. Clearly $\mathcal{U}^h_x\neq\emptyset$ if and only if $U_h(x)\neq\emptyset$ for any $x\in\overline{\Omega}$. Due to the regularity assumptions on $f$, $U_h(x)$ is open (relative to $U$) and is bounded since it is always contained in $U$.
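For a finite sample of the control set, the map (2.16) can be evaluated directly. The sketch below is only an illustration under stated assumptions: the helper `admissible_controls`, the domain $\Omega=(-1,1)$, the dynamics $f(x,u)=u$ and the three sampled controls are not taken from the paper.

```python
def admissible_controls(x, controls, f, h, in_omega):
    """Sampled version of U_h(x): keep the controls u such that
    the Euler step x + h*f(x,u) lands in the open set Omega."""
    return [u for u in controls if in_omega(x + h * f(x, u))]

# Assumed toy data: Omega = (-1, 1), f(x,u) = u, U sampled by {-1, 0, 1}.
f = lambda x, u: u
in_omega = lambda y: -1.0 < y < 1.0
controls = [-1.0, 0.0, 1.0]

at_boundary = admissible_controls(1.0, controls, f, 0.1, in_omega)
interior = admissible_controls(0.0, controls, f, 0.1, in_omega)
coarse_step = admissible_controls(0.95, controls, f, 0.1, in_omega)
fine_step = admissible_controls(0.95, controls, f, 0.04, in_omega)
```

At the boundary point only the inward control survives, while in the interior every sampled control is admissible. Near the boundary, reducing the step enlarges the admissible set.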

###### Remark 2.2.

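Note that

$$
\text{if } u\in U_h(x), \text{ then } f(x,u)\in\mathrm{int}(T_{\overline{\Omega}}(x))
\tag{2.17}
$$

where $\mathrm{int}(T_{\overline{\Omega}}(x))$ is the interior of the tangent cone to $\overline{\Omega}$ at $x$, i.e.

$$
\mathrm{int}(T_{\overline{\Omega}}(x)) = \bigcup_{h>0}\frac{1}{h}(\Omega-x).
\tag{2.18}
$$

In fact, if $u\in U_h(x)$, then $x+hf(x,u)\in\Omega$, which implies $f(x,u)\in\frac{1}{h}(\Omega-x)$. Note that $\mathrm{int}(T_{\overline{\Omega}}(x))$ is not empty since $\Omega$ is a nonempty open set.

The dependence of $U_h(x)$ on $h$ is such that

$$
U_h(x)\subset U_t(x)\quad\forall t\in(0,h],\ \forall x\in\overline{\Omega}.
\tag{2.19}
$$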

In fact, if $u\in U_h(x)$ then $x+hf(x,u)\in\Omega$ and (2.19) follows by the convexity of $\Omega$.
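The convexity argument can be spelled out as follows (a short derivation added for the reader, using only the definitions above). Take $x\in\overline{\Omega}$, $u\in U_h(x)$ and $t\in(0,h]$. Then

$$
x+tf(x,u) = \Big(1-\frac{t}{h}\Big)\,x + \frac{t}{h}\,\big(x+hf(x,u)\big),
\qquad \frac{t}{h}\in(0,1].
$$

The right-hand side is a convex combination of a point of $\overline{\Omega}$ and a point of the open convex set $\Omega$, with a strictly positive weight on the latter, so it belongs to $\Omega$. Hence $u\in U_t(x)$.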

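The following proposition gives necessary and sufficient conditions for the existence of a time step $h$ such that $U_h(x)\neq\emptyset$ for any $x\in\overline{\Omega}$, and therefore guarantees $\mathcal{U}^h_x\neq\emptyset$.

###### Proposition 2.1.

Let $\Omega$ be an open bounded convex subset of $\mathbb{R}^d$. Assume that $f$ is continuous. Then, there exists $h>0$ such that

$$
U_h(x)\neq\emptyset \quad\text{for any } x\in\overline{\Omega}
\tag{2.20}
$$

if and only if the following assumption holds,

$$
\forall x\in\partial\Omega,\ \exists\,u\in U : f(x,u)\in\mathrm{int}(T_{\overline{\Omega}}(x)).
\tag{2.21}
$$

###### Proof.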

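If such an $h$ exists, (2.21) is satisfied by Remark 2.2.
Now let us consider an $x\in\partial\Omega$ and let $u$ be a control satisfying (2.21). Since $f(x,u)\in\mathrm{int}(T_{\overline{\Omega}}(x))$ and $\Omega$ is bounded there exists an $h_{x,u}>0$ such that

$$
x+h_{x,u}f(x,u)\in\Omega.
\tag{2.22}
$$

Moreover, (2.22) is also valid for every positive $h\le h_{x,u}$ by the convexity of $\Omega$. Since $U$ is bounded, (2.22) is satisfied for any $h\le h_x$ and $h_x$ will not depend on $u$. By the continuity of $f$ there will be an $h_x$ and a neighbourhood $I(x)$ of $x$ such that

$$
\forall y\in I(x)\cap\overline{\Omega},\quad y+hf(y,u)\in\Omega
$$

at least for $h\le h_x$.

We define

$$
O_h \equiv \{x\in\overline{\Omega}\ |\ \exists\,u\in U : x+hf(x,u)\in\Omega \text{ for an } h>0\}.
$$

Note that when $x\in\Omega$ all the directions are allowed provided $h$ is sufficiently small, and the restrictions apply only for $x\in\partial\Omega$. The family $\{O_h\}$ is an open covering of $\overline{\Omega}$ from which we can extract a finite covering $\{O_{h_j}\}_{j=1,\dots,m}$. We will then have $U_h(x)\neq\emptyset$ for any $x\in\overline{\Omega}$ by setting $h=\min_j h_j$. ∎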

###### Corollary 2.1.

Under the same assumptions of Proposition 2.1 there exists $h>0$ such that

$$
U_t(x)\neq\emptyset\quad\forall t\in(0,h],\ \forall x\in\overline{\Omega}.
\tag{2.23}
$$

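Let us remark that condition (2.21) is more general than the boundary condition (2.3) since it does not require the regularity of $\partial\Omega$. In fact for a closed convex subset $K$, we can define the normal cone to $K$ at $x$ as

$$
N_K(x) \equiv \{y\in\mathbb{R}^d : \langle y,z\rangle\le 0,\ \forall z\in T_K(x)\}.
\tag{2.24}
$$

When $x\in\mathrm{int}(K)$ the tangent cone will be the whole space and the normal cone will be reduced to $\{0\}$. For $x\in\partial K$ these are real convex cones. Now assume that $\Omega$ has a regular boundary: the tangent cone is a half-space and the normal cone is reduced to $\lambda\eta(x)$, $\lambda>0$. Then (2.3) implies that $\forall x\in\partial\Omega$ $\exists\,u=u(x)\in U$ such that

$$
\langle f(x,u),v\rangle = \langle f(x,u),\lambda\eta(x)\rangle \le -\lambda\gamma<0\quad\forall v\in N_{\overline{\Omega}}(x)
\tag{2.25}
$$

hence

$$
f(x,u)\in\mathrm{int}(T_{\overline{\Omega}}(x)).
$$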

In the sequel we will use condition (2.21) instead of (2.3).
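Condition (2.21) can be tested numerically at points where no normal vector is available, such as a corner. The Python sketch below uses the characterization (2.18): a direction lies in the interior of the tangent cone when a small step along it enters $\Omega$. The open unit square, the sampled step sizes and the helper names are assumptions for illustration; testing finitely many steps is a heuristic, not a proof.

```python
import numpy as np

def in_open_square(p):
    """Membership test for the assumed domain Omega = (0,1) x (0,1)."""
    return bool(np.all(p > 0.0) and np.all(p < 1.0))

def points_strictly_inward(x, v, in_omega, steps=(1e-1, 1e-2, 1e-3)):
    """Heuristic check of v in int(T(x)) via (2.18): x + h*v must lie in
    Omega for some h > 0; only a few sample steps are tried."""
    x, v = np.asarray(x, dtype=float), np.asarray(v, dtype=float)
    return any(in_omega(x + h * v) for h in steps)

corner = [0.0, 0.0]
diagonal = points_strictly_inward(corner, [1.0, 1.0], in_open_square)
along_edge = points_strictly_inward(corner, [1.0, 0.0], in_open_square)
outward = points_strictly_inward(corner, [-1.0, 0.5], in_open_square)

# Sampled version of (2.21) at the corner for an assumed control set.
sampled_velocities = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
condition_221_at_corner = any(
    points_strictly_inward(corner, v, in_open_square) for v in sampled_velocities
)
```

At the corner the diagonal direction is accepted, while a direction sliding along an edge belongs to the tangent cone but not to its interior, so it is rejected.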

The proof of the following result can be obtained by standard arguments so it will not be given here (see [9] for details).

###### Proposition 2.2.
$$
v_h(x) = \inf_{\{u_n\}\in\mathcal{U}^h_x}\Big(h\sum_{k=0}^{p-1}\ell(y_k,u_k)\,\beta^k + \beta^p v_h(y_p)\Big),
\tag{2.26}
$$

for any $x\in\overline{\Omega}$ and $p\ge 1$, where $y_k$ is the trajectory of (2.12) corresponding to the control sequence $\{u_k\}$.

We will refer to (2.26) as the Discrete Dynamic Programming Principle (DDPP). For $p=1$, it gives the following discrete version of (1.4)

$$
v(x) = \inf_{u\in U_h(x)}\{\beta v(x+hf(x,u)) + h\ell(x,u)\},\quad x\in\overline{\Omega}.
\tag{2.27}
$$

Note that for the constrained problem the infimum is taken on the variable control set $U_h(x)$. In the next section we will see how to handle this dependency.

###### Theorem 2.2.

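Let $\lambda>L_f$. Then, for any sufficiently small $h>0$ there exists a unique solution $v_h$ of (2.27). Moreover, the following estimates hold true:

$$
\omega_{v_h}(\delta)\le\frac{L_\ell}{\lambda-L_f}\,\delta,\quad\delta>0
\tag{2.28}
$$

$$
\|v_h\|_\infty\le\frac{M_\ell}{\lambda},
\tag{2.29}
$$

where $\omega_{v_h}$ is the modulus of continuity of $v_h$.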

###### Proof.

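The solution of (2.27) is the fixed point of the operator

$$
Tv(x) = \inf_{u\in U_h(x)}\{\beta v(x+hf(x,u)) + h\ell(x,u)\},\quad x\in\overline{\Omega}.
\tag{2.30}
$$

Let $u$, $v$ be bounded functions on $\overline{\Omega}$ and $x\in\overline{\Omega}$. By (3.16) for any $\varepsilon>0$, there exists $u^\varepsilon\in U_h(x)$ such that

$$
Tv(x)+\varepsilon\ge\beta v(x+hf(x,u^\varepsilon)) + h\ell(x,u^\varepsilon),
\tag{2.31}
$$

then

$$
\begin{aligned}
Tu(x)-Tv(x) &\le \beta\,[u(x+hf(x,u^\varepsilon)) - v(x+hf(x,u^\varepsilon))] + h\,[\ell(x,u^\varepsilon)-\ell(x,u^\varepsilon)] + \varepsilon\\
&\le \beta\|u-v\|_\infty + \varepsilon,
\end{aligned}
$$

which implies

$$
Tu(x)-Tv(x)\le\beta\|u-v\|_\infty.
$$

Reversing the roles of $u$ and $v$ we get

$$
\|Tu-Tv\|_\infty\le\beta\|u-v\|_\infty.
\tag{2.33}
$$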

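Note that if $v$ is such that $\|v\|_\infty\le M$, we have

$$
|Tv(x)|\le\beta\|v\|_\infty + hM_\ell\le\beta M + hM_\ell.
$$

Then, recalling the definition of $\beta$, the choice $M=M_\ell/\lambda$ implies

$$
\|Tv\|_\infty\le\frac{M_\ell}{\lambda}.
\tag{2.34}
$$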

We can conclude that, for any admissible $h$, $T$ is a contraction mapping on the bounded functions on $\overline{\Omega}$, so that there will be a unique bounded solution $v_h$ of (2.27).
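The contraction property suggests a fixed point iteration. The Python sketch below applies the operator (2.30) on a fixed one-dimensional grid with linear interpolation. This is the classical grid-based approach, not the tree structure algorithm of this paper, and all data ($\Omega=(-1,1)$, $f(x,u)=u$, $\ell(x,u)=x^2$, three sampled controls, $\beta=1-\lambda h$) are assumptions chosen only to illustrate (2.30), (2.33) and (2.34).

```python
import numpy as np

lam, h = 1.0, 0.05
beta = 1.0 - lam * h                      # assumed discrete discount factor
xs = np.linspace(-1.0, 1.0, 81)           # grid on the closure of Omega
controls = np.array([-1.0, 0.0, 1.0])     # sampled control set

def T(v):
    """Grid version of (2.30): minimize over the controls whose Euler
    step x + h*u stays in the open interval (-1, 1)."""
    best = np.full_like(xs, np.inf)
    for u in controls:
        nxt = xs + h * u
        admissible = (nxt > -1.0) & (nxt < 1.0)
        candidate = beta * np.interp(nxt, xs, v) + h * xs**2
        best = np.where(admissible, np.minimum(best, candidate), best)
    return best

# Fixed point iteration v_{n+1} = T v_n starting from v_0 = 0.
v_h = np.zeros_like(xs)
for _ in range(10000):
    v_new = T(v_h)
    converged = np.max(np.abs(v_new - v_h)) < 1e-10
    v_h = v_new
    if converged:
        break

# Numerical check of the contraction estimate on two random functions.
rng = np.random.default_rng(0)
a, b = rng.normal(size=xs.size), rng.normal(size=xs.size)
ratio = np.max(np.abs(T(a) - T(b))) / np.max(np.abs(a - b))
```

Here `ratio` should not exceed `beta`, because linear interpolation does not increase the sup norm, and the computed `v_h` should respect the bound $M_\ell/\lambda$ with $M_\ell=1$ for this example.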

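Now we prove that $v_h\in C(\overline{\Omega})$. We show first that if $v\in C(\overline{\Omega})$ then $Tv\in C(\overline{\Omega})$. Let $x\in\overline{\Omega}$; for any $\varepsilon>0$ there exists $u^\varepsilon\in U_h(x)$ which satisfies (2.31). Since $\Omega$ is open and $f$ is continuous, there will be a neighbourhood $I(x)$ of $x$ such that

$$
\forall y\in I(x)\cap\overline{\Omega},\quad y+hf(y,u^\varepsilon)\in\Omega,
$$

then $u^\varepsilon\in U_h(y)$ and we have

$$
Tv(y)\le\beta v(y+hf(y,u^\varepsilon)) + h\ell(y,u^\varepsilon).
\tag{2.35}
$$

By (2.31) and (2.35) we get

$$
\begin{aligned}
Tv(y)-Tv(x) &\le \beta\,[v(y+hf(y,u^\varepsilon)) - v(x+hf(x,u^\varepsilon))] + h\,[\ell(y,u^\varepsilon)-\ell(x,u^\varepsilon)] + \varepsilon\\
&\le \beta\,\omega_v((1+hL_f)|x-y|) + hL_\ell|x-y| + \varepsilon
\end{aligned}
$$

where we used

$$
|y+hf(y,u^\varepsilon)-x-hf(x,u^\varepsilon)|\le(1+hL_f)|x-y|.
$$

By the arbitrariness of $\varepsilon$, we conclude

$$
Tv(y)-Tv(x)\le\beta\,\omega_v((1+hL_f)|x-y|) + hL_\ell|x-y|.
$$

Since $x$ and $y$ are arbitrary, we can determine $\delta$ such that

$$
|Tv(y)-Tv(x)|\le\beta\,\omega_v((1+hL_f)|x-y|) + hL_\ell|x-y|
\tag{2.36}
$$

whenever $|x-y|<\delta$. By (2.36) we get

$$
\omega_{Tv}(\delta)\le\beta\,\omega_v((1+hL_f)\delta) + hL_\ell\delta
$$

and by the uniform continuity of $v$

$$
\lim_{\delta\to0^+}\omega_{Tv}(\delta)=0,
$$

then $Tv\in C(\overline{\Omega})$.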

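Since $\lambda>L_f$, the constant $C_h\equiv\dfrac{hL_\ell}{1-\beta(1+hL_f)}$ is strictly positive and one can easily check that

$$
\omega_{Tv_0}(\delta)\le C_h\delta,
$$

for any $v_0$ such that $\omega_{v_0}(\delta)\le C_h\delta$. Then the recursion sequence

$$
v_1 = Tv_0,\qquad v_n = Tv_{n-1},\quad n=2,3,\dots
$$

starting at a $v_0$ satisfying the above bounds converges to the unique solution $v_h$ of (2.27). By (3.19) $v_h$ satisfies (3.15). Since $C_h$ is decreasing in $h$, we get

$$
\omega_{v_h}(\delta)\le C_h\delta\le\max_{h>0}\frac{hL_\ell}{1-\beta(1+hL_f)}\,\delta = \frac{L_\ell}{\lambda-L_f}\,\delta,
$$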

and we can conclude the proof of the theorem. ∎
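The last bound can be checked by a direct computation (a worked step added for the reader, assuming $\beta=1-\lambda h$, which is the identity used in the proof of Theorem 3.2). If $\omega_v(\delta)\le C\delta$, the estimate $\omega_{Tv}(\delta)\le\beta\,\omega_v((1+hL_f)\delta)+hL_\ell\delta$ gives $\omega_{Tv}(\delta)\le\big(\beta(1+hL_f)C+hL_\ell\big)\delta$, and the bound is preserved exactly when $C=\frac{hL_\ell}{1-\beta(1+hL_f)}$. Moreover

$$
1-\beta(1+hL_f) = 1-(1-\lambda h)(1+hL_f) = h\,(\lambda-L_f) + \lambda L_f h^2,
$$

so that

$$
\frac{hL_\ell}{1-\beta(1+hL_f)} = \frac{L_\ell}{\lambda-L_f+\lambda L_f h} \le \frac{L_\ell}{\lambda-L_f}\qquad\text{for } \lambda>L_f,\ h>0.
$$

The middle expression is decreasing in $h$, and the bound on the right is its limit as $h\to0^+$.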

## 3. A convergence result

The main result of this section is that the solution of the discrete–time equation converges to . In order to prove this convergence we need some preliminary lemmas on the regularity of with respect to .

###### Proposition 3.1.

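For any fixed $h$, the multivalued map $x\to U_h(x)$, $x\in\overline{\Omega}$, is lower semicontinuous in the sense of multivalued maps.

###### Proof.

Let $x\in\overline{\Omega}$ and $u_x\in U_h(x)$. Recalling the definition of l.s.c. maps (see [6]), we have to show that for any $\varepsilon>0$ there exists a neighborhood $I(x)$ of $x$ such that

$$
\forall y\in I(x)\ \ \exists\,u_y\in U_h(y)\cap(u_x+\varepsilon B),
\tag{3.1}
$$

where $B$ is the unit ball. Since $\Omega$ is open and $f$ is continuous, we can determine $\delta_1$ and $\varepsilon_1$ such that

$$
\forall y\in(x+\delta_1B)\cap\overline{\Omega},\ \forall u\in(u_x+\varepsilon_1B)\cap U,\quad y+hf(y,u)\in\Omega,
\tag{3.2}
$$

then $(u_x+\varepsilon_1B)\cap U\subset U_h(y)$. Then we take $\delta_1$ and $\varepsilon_1\le\varepsilon$ such that (3.2) holds and we get (3.1) setting $I(x)=x+\delta_1B$. ∎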

###### Theorem 3.1.

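Let $x\in\overline{\Omega}$ and consider the sequence of sets $U_{h_p}(x)$, $p\in\mathbb{N}$. Let $h_p\to0$ for $p\to+\infty$, then

$$
U\subset\underline{\mathrm{Lim}}\,\{U_{h_p}(x)\}\quad\text{for } p\to+\infty.
\tag{3.3}
$$

###### Proof.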

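Let $u\in U$, we have to prove that $u\in\underline{\mathrm{Lim}}\,\{U_{h_p}(x)\}$, i.e. that for any $\varepsilon>0$, there exists an index $\bar p$ such that

$$
\forall p\ge\bar p,\quad U_{h_p}(x)\cap(u+\varepsilon B)\neq\emptyset.
\tag{3.4}
$$

Since $f(x,u)\in\mathrm{int}(T_{\overline{\Omega}}(x))$ and $\Omega$ is bounded, there exists $h_{x,u}$ such that

$$
x+h_{x,u}f(x,u)\in\Omega.
$$

By a compactness argument we can choose $h_x$ independently of $u$. The continuity of $f$ then implies that there exists $\delta$ such that

$$
\forall u'\in(u+\delta B),\quad x+h_xf(x,u')\in\Omega.
\tag{3.5}
$$

Moreover, there exists an index $\bar p$ such that

$$
\forall p\ge\bar p,\quad 0<h_p\le h_x,
$$

then by the convexity of $\Omega$ also

$$
x+h_pf(x,u')\in\Omega,
$$

so $u'\in U_{h_p}(x)$. To end the proof it suffices to choose $\delta\le\varepsilon$ such that (3.5) holds. ∎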

Using the above propositions, we can prove our main convergence result.

###### Theorem 3.2.

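Let $\lambda>L_f$, then $v_h\to v$ uniformly in $\overline{\Omega}$, for $h\to0^+$.

###### Proof.

Since $v_h$ is uniformly bounded and equicontinuous, by the Ascoli–Arzelà theorem, there exist $h_p\to0$ for $p\to+\infty$ and a function $v\in C(\overline{\Omega})$ such that

$$
v_{h_p}\to v\quad\text{for } p\to+\infty \text{ uniformly on } \overline{\Omega}.
\tag{3.6}
$$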

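We will show that $v$ is the constrained viscosity solution of (1.4) in $\overline{\Omega}$.
a) Let us prove first that $v$ is a subsolution of (1.4) in $\Omega$.
Let $\phi\in C^1(\overline{\Omega})$ and let $x_0$ be a strict local maximum point for $v-\phi$ in $\Omega$, we have then

$$
(v-\phi)(x_0)>(v-\phi)(x)\quad\forall x\in B(x_0,r)\subset\Omega
$$

for $r$ sufficiently small. Then, for $p$ large enough, there exists $x_0^{h_p}\in B(x_0,r)$ such that $v_{h_p}-\phi$ has a local maximum point at $x_0^{h_p}$ and $x_0^{h_p}$ converges to $x_0$. Note that for any control $u\in U_{h_p}(x_0^{h_p})$ the point $x_0^{h_p}+h_pf(x_0^{h_p},u)$ belongs to $\Omega$, and for $p$ large enough it belongs to $B(x_0,r)$. The above remarks imply

$$
v_{h_p}(x_0^{h_p})-\phi(x_0^{h_p})\ge v_{h_p}(x_0^{h_p}+h_pf(x_0^{h_p},u))-\phi(x_0^{h_p}+h_pf(x_0^{h_p},u)).
\tag{3.7}
$$

By (3.7) and (2.27) we get

$$
\begin{aligned}
0 &= v_{h_p}(x_0^{h_p}) + \sup_{u\in U_{h_p}(x_0^{h_p})}\big\{-(1-\lambda h_p)\,v_{h_p}(x_0^{h_p}+h_pf(x_0^{h_p},u)) - h_p\ell(x_0^{h_p},u)\big\}\\
&\ge \sup_{u\in U_{h_p}(x_0^{h_p})}\big\{\phi(x_0^{h_p})-\phi(x_0^{h_p}+h_pf(x_0^{h_p},u)) + \lambda h_p\,v_{h_p}(x_0^{h_p}+h_pf(x_0^{h_p},u)) - h_p\ell(x_0^{h_p},u)\big\}.
\end{aligned}
$$

Since $\phi\in C^1$, it follows that there exists $\theta\in[0,1]$ such that, dividing the above inequality by $h_p$, we get

$$
0\ge\sup_{u\in U_{h_p}(x_0^{h_p})}\Big\{-\sum_{i=1}^{d}\frac{\partial\phi}{\partial x_i}\big(x_0^{h_p}+\theta h_pf(x_0^{h_p},u)\big)\,f_i(x_0^{h_p},u) + \lambda v_{h_p}\big(x_0^{h_p}+h_pf(x_0^{h_p},u)\big) - \ell(x_0^{h_p},u)\Big\}.
$$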

Let be such that and . We can choose such that for any

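$$
\emptyset\neq U_{h_{\bar p}}(x)\subset U_{h_p}(x),\quad\forall x\in\overline{\Omega},\ \text{for } p\ge p',
\tag{3.9}
$$

and by (3) we have, for $p\ge p'$,

$$
0\ge\sup_{u\in U_{h_p}(x_0^{h_p})}\Big\{-\sum_{i=1}^{d}\frac{\partial\phi}{\partial x_i}\big(x_0^{h_p}+\theta h_pf(x_0^{h_p},u)\big)\,f_i(x_0^{h_p},u) + \lambda v_{h_p}\big(x_0^{h_p}+h_pf(x_0^{h_p},u)\big) - \ell(x_0^{h_p},u)\Big\}.
\tag{3.10–3.11}
$$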

Let and , we define the real function ,

 (3.12)

where (note that is continuous in both variables). Let us define

$$
W(x)\equiv\sup_{u\in U_{h_{\bar p}}(x)}W(x,u).
\tag{3.13}
$$

By Proposition 3.1 is l.s.c. at , then by a standard result on multivalued map (see [6]) is l.s.c. at . Since

$$
W(x_0,u)\to\{-\nabla\phi(x_0)\cdot f(x_0,u)+\lambda v(x_0)-\ell(x_0,u)\}\quad\text{for } p\to+\infty,
\tag{3.14}
$$

and converges to , for any , there exists such that (4.11) and (4.12) hold true. By the lower semicontinuity of and the arbitrariness of we get

$$
0\ge\sup_{u\in U_{h_{\bar p}}(x_0)}\{-\nabla\phi(x_0)\cdot f(x_0,u)+\lambda v(x_0)-\ell(x_0,u)\}.
\tag{3.15}
$$

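The inequality (3.15) is verified for any $\bar p$. We show that

$$
0\ge\sup_{u\in\underline{\mathrm{Lim}}\{U_{h_{\bar p}}(x_0)\}}G(u)\quad\text{for } \bar p\to+\infty,
$$

where

$$
G(u)\equiv-\nabla\phi(x_0)\cdot f(x_0,u)+\lambda v(x_0)-\ell(x_0,u).
$$

It suffices to prove that

$$
0\ge G(u)\quad\text{for any } u\in\underline{\mathrm{Lim}}\{U_{h_{\bar p}}(x_0)\}.
$$

In fact, for any $u\in\underline{\mathrm{Lim}}\{U_{h_{\bar p}}(x_0)\}$, we can find a sequence $u_{h_{\bar p}}\in U_{h_{\bar p}}(x_0)$, such that $u_{h_{\bar p}}\to u$ for $\bar p\to+\infty$ and

$$
0\ge G(u_{h_{\bar p}}),
$$

then passing to the limit for $\bar p\to+\infty$, by the continuity of $G$ we have

$$
0\ge G(u).
$$

Theorem 3.1 implies that

$$
U\subset\underline{\mathrm{Lim}}\{U_{h_{\bar p}}(x_0)\}\quad\text{for } \bar p\to+\infty
$$

so that

$$
0\ge\sup_{u\in\underline{\mathrm{Lim}}\{U_{h_{\bar p}}(x_0)\}}G(u)\ge\sup_{u\in U}G(u).
$$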

b) Now we prove that is a viscosity supersolution of (1.4) in .
Let and , be a strict maximum point for in . We can use the same arguments that we used for (4.9) in the first part of this theorem (just replace by ), so we get

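$$
0\le\sup_{u\in U_{h_p}(x_0^{h_p})}\Big\{-\sum_{i=1}^{d}\frac{\partial\phi}{\partial x_i}\big(x_0^{h_p}+\theta h_pf(x_0^{h_p},u)\big)\,f_i(x_0^{h_p},u) + \lambda v_{h_p}\big(x_0^{h_p}+h_pf(x_0^{h_p},u)\big) - \ell(x_0^{h_p},u)\Big\},
\tag{3.16}
$$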

where .

By (3.16) for any there exists such that

 (3.17) 0 ≤supu∈Uhp(xh