We consider finite horizon feedback steering of an ensemble of trajectories subject to a controlled stochastic differential equation (SDE) with endpoint joint state probability density function (PDF) constraints – a topic of growing interest in the systems-control literature. Motivating applications include belief space motion planning for vehicular autonomy, and the steering of robotic or biological swarms via decentralized feedback. While early contributions focused on the covariance control [hotz1987covariance, skelton1989covariance, grigoriadis1997minimum], more recent papers [chen2015optimal1, chen2015optimal2, chen2016optimal] addressed the optimal feedback synthesis for steering an arbitrary prescribed initial joint state PDF to another prescribed terminal joint state PDF subject to controlled linear dynamics, and revealed the connections between the associated stochastic optimal control problem, the theory of optimal mass transport [villani2003topics], and the Schrödinger bridge [schrodinger1931umkehrung, schrodinger1932theorie]. Follow up works have accounted terminal cost [halder2016finite], input constraints [bakolas2018finite, okamoto2019input], output feedback [bakolas2017covariance], and some nonlinear dynamics [caluya2019finite, caluya2019finiteMulti, caluya2019wasserstein]. As for the state or path constraints, prior work [okamoto2018optimal] incorporated the same in soft probabilistic sense. The contribution of the present paper is to account hard deterministic path constraints in the problem of minimum effort finite horizon PDF steering via feedback synthesis. This can be intuitively phrased as the “hard safety with soft endpoint” problem.
The main idea underlying the ensuing development is to modify the unconstrained Itô SDEs to the “reflected Itô SDEs” [ikeda1961construction, watanabe1971stochastic, lions1984stochastic, harrison1987multidimensional], i.e., the controlled sample paths in the state space (in addition to the control-affine deterministic drift) are driven by two stochastic processes: a Wiener process, and a local time stochastic process. The latter enforces the sample paths in the state space to satisfy the deterministic non-strict111There is no loss of generality in allowing the sample paths to satisfy non-strict path containment in given since strict containment can be enforced by reflecting them from -inner boundary layer of for small enough. path containment constraints at all times. These considerations engender a Schrödinger bridge-like formulation–referred hereafter as the Reflected Schrödinger Bridge Problem (RSBP)–which unlike its classical counterpart, has extra boundary conditions involving the gradients of the so-called Schrödinger factors. We show how recent developments in contraction mapping w.r.t. the Hilbert metric, and the proximal recursion over the Schrödinger factors can be harnessed to solve the RSBP.
Ii Reflected Schrödinger Bridge Problem
Consider a connected, smooth222More precisely, there exists such that with boundary . and bounded domain . Let denote the closure of . For time , consider the stochastic control problem
where is the standard Wiener process in , the controlled state , and the endpoint joint state PDFs are prescribed333The notation means that the random vector has joint PDF . such that their supports are in
, both are everywhere nonnegative, have finite second moments, and. The parameter is referred to as the thermodynamic temperature, and the expectation operator in (1a) is w.r.t. the law of the controlled state . The set consists of all admissible feedback policies , given by
. We assume that the prior drift vector fieldis bounded Borel measurable in , and Lipschitz continuous w.r.t. . The vector field is set to be the inward unit normal to the boundary , and gives the direction of reflection. Furthermore, for , is minimal local time: a continuous, non-negative and non-decreasing stochastic process [glynn2018rate, harrison2013brownian, pilipenko2014introduction] that restricts to the domain , with . Specifically, letting denote the indicator function of the subscripted set, we have
which is to say that the process only increases at times when hits the boundary, i.e., when . Thus, (1b) is a controlled reflected SDE, and the tuple solves the Skorokhod problem [skorokhod1961stochastic, skorokhod1962stochastic, kruk2007explicit]. We point the readers to [lions1984stochastic] for the proof of existence and uniqueness of solutions to (1b) under the stated regularity assumptions.
To formalize the probabilistic setting of the problem at hand, let be the space of continuous functions . We view as a complete separable metric space endowed with the topology of uniform convergence on compact time intervals. With , we associate the -algebra . Consider the complete filtered probability space with filtration wherein “complete” means that contains all -null sets, and is right continuous. The processes , (for a given feedback policy ) and are -adapted (i.e., non-anticipating) for . In (1c), the random vectors and are respectively -measurable and -measurable.
Denote the Euclidean gradient operator as , the inner product as , and the Laplacian as . Letting
the law of the sample path of (1b) can be characterized [stroock1971diffusion] as follows: for each , there is a unique probability measure on such that (i) , (ii) for any whose inner normal derivative on is nonnegative,
is -submartingale, and (iii) there is a continuous, nonnegative, nondecreasing stochastic process satisfying (2). As a consequence [stroock1971diffusion, p. 196] of this characterization it follows that the process is Feller continuous and strongly Markov. In particular, the measure-valued trajectory comprises of absolutely continuous measures w.r.t. Lebesgue measure.
The objective in problem (1) is to perform the minimum control effort steering of the given initial state PDF at to the given terminal state PDF at subject to the controlled sample path dynamics (1b). In other words, the data of the problem consists of the domain , the prior dynamics data , and the two endpoint PDFs .
Formally, we can transcribe (1) into the following variational problem [benamou2000computational]:
where a PDF-valued curve if for each , the PDF is supported on , and has finite second moment. In this paper, we will not focus on the rather technical direction of establishing the existence of minimizer for (3), which can be pursued along the lines of [villani2003topics, p. 243–245]. Instead, we will formally derive the conditions of optimality, convert them to the so-called Schrödinger system, and argue the existence-uniqueness of solutions for the same.
Ii-B Necessary Conditions of Optimality
The following result summarizes how the optimal pair for problem (3) can be obtained.
Theorem 1 (Optimal control and optimal state PDF)
A pair solving the variational problem (3) must satisfy the system of coupled nonlinear PDEs:
subject to the boundary conditions
The PDE (4a) is a controlled Fokker-Planck-Kolmogorov (FPK) equation, and (4b) is a Hamilton-Jacobi-Bellman (HJB) equation. Because the equations (4a)-(4b) have one way coupling, and the boundary conditions (6a)-(6c) are atypical, solving (4) is a challenging task in general. In the following, we show that it is possible to transform the coupled nonlinear system (4) into a boundary coupled linear system of PDEs which we refer to as the Schrödinger system. We will see that the resulting system paves way to a computational pipeline for solving the density steering problem with path constraints.
Ii-C Schrödinger System
We now apply the Hopf-Cole transform [cole1951quasi, hopf1950partial] to the system of nonlinear PDEs (4).
Theorem 2 (Schrödinger system)
From (7), both are nonnegative by definition, and strictly positive if is bounded and is positive.
Under the regularity assumptions on and stated in Section II-A, the process satisfying the uncontrolled reflected Itô SDE
is a Feller continuous strongly Markov process. Therefore, the theory of semigroups applies and the transition density of (11) satisfies Kolmogorov’s equations. Notice that the transition density or Green’s function will depend on the domain . In particular, we point out that (8a) is the backward Kolmogorov equation in unkonwn with the corresponding Neumann boundary condition in (9b). On the other hand, (8b) is the forward Kolmogorov equation in unkonwn with the corresponding Robin boundary condition in (9b). These “backward Kolmogorov with Neumann” and “forward Kolmogorov with Robin” system of PDE boundary value problems are coupled via the atypical boundary conditions (9a).
Theorem 2 reduces finding the optimal pair for the RSBP to that of finding the pair444We refer to as the Schrödinger factors. associated with the uncontrolled SDE (11). To do so, we need to compute the terminal-initial condition pair , which can be obtained by first making an initial guess for and then performing time update by integrating the system (8)-(9b). Using (9a), this then sets up a fixed point recursion over the pair (see Fig. 1). If this recursion converges to a unique pair, then the converged pair can be used to compute the transient factors , and we can recover via (10). This computational pipeline will be pursued in this paper.
Since the PDEs in (8) are linear, and the boundary couplings in (9a) are in product form, the nonnegative function pair can only be unique in the projective sense, i.e., if is a solution then so is for any . In [chen2016entropic], it was shown that the aforesaid fixed point recursion is in fact contractive on a suitable cone in Hilbert’s projective metric, and hence guaranteed to converge to a unique pair , provided that the transition density for (11) is positive and continuous555Under the regularity assumptions on and stated in Section II-A, the transition density for (11) indeed satisfies these conditions. on for all , and are supported on compact subsets of .
Iii Case Study: RSBP in 1D without Prior Drift
To illustrate the ideas presented thus far, we now consider a simple instance of problem (3) over the state space , and with the prior drift . That is to say, we consider the finite horizon density steering subject to the controlled two-sided reflected Brownian motion. Using some properties of the associated Markov kernel, we will show that the Schrödinger system (8)-(9) corresponding to this particular RSBP has a unique solution which can be obtained by the kind of fixed point recursion mentioned toward the end of Section II-C.
is the Markov kernel or transition density [linetsky2005transition, Sec. 4.1],[bhattacharya2009stochastic, p. 410-411] associated with the uncontrolled reflected SDE
Clearly, solving (16) is equivalent to solving (12). The pair can be solved from (16) iteratively as a fixed point recursion with guaranteed convergence established through contraction mapping in Hilbert’s projective metric; see [chen2016entropic]. The Lemma 1 stated next will be used in the Proposition 1 that follows, showing the existence and uniqueness of the pair in (16) as well as the fact that the aforesaid fixed point recursion is guaranteed to converge to that pair.
For , , consider the transition probability density in (III). Then,
is continuous on the set .
for all .
where the supports of (17) are restricted to , and the proportionality constants are determined accordingly. For state feedback synthesis enabling this unimodal to bimodal steering over , we performed the fixed point recursion over the pair using (16) with as in (17), and given by (III). For numerical implementation, we truncated the infinite sum in (III) after the first 100 terms. Fig. 4 shows the convergence of this fixed point recursion w.r.t. Hilbert’s projective metric. The converged pair is used to compute the transient Schrödinger factors via (13), and then the pair via (10). Fig. 5 depicts the evolution of the optimal controlled transient joint state PDFs as well as 100 sample paths of the optimal closed-loop reflected SDE. These sample paths were computed by applying the Euler-Maruyama scheme with time-step size . Notice from Fig. 5 that (i) the closed-loop sample paths satisfy for all , and (ii) in the absence of feedback, the terminal constraint (given by (17b)) cannot be satisfied.
Iv RSBP with Prior Drift
For generic , , there is no closed-form expression of the Markov Kernel associated with (8)-(9b). Hence, unlike the situation in Section III, we cannot explicitly set up coupled integral equations of the form (16), thus preventing the numerical implementation of the fixed point recursion (Fig. 1) via direct matrix-vector recursion. In this Section, we will show that if is gradient of a potential, then we can reformulate (8)-(9) in a way that leads to a variational recursion which in turn enables us to implement the fixed point recursion (Fig. 1) in an implicit manner.
Iv-a Reformulation of the Schrödinger System
The idea now is to exploit the structural nonlinearities in (18) to design an algorithm that allows computing the Schrödinger factors . To that end, the following is a crucial step.
Given , and , consider in (18). Let , and define the mappings given by , . Then solves the PDE initial boundary value problem:
From (20a)-(20b), and satisfy the exact same FPK PDE with different initial conditions and integrated in different time coordinates and . From (20d), and satisfy the same Robin boundary condition. Therefore, a single FPK initial boundary value problem solver can be used to set up the fixed point recursion to solve for , and hence . From , we can recover as
Iv-B Computation via Wasserstein Proximal Recursion
Building on our previous works [caluya2019proximal, caluya2019gradient, caluya2019wasserstein], we propose proximal recursions to numerically time march the solutions of the PDE initial boundary value problems (20) by exploiting certain infinite dimensional gradient descent structure. This enables us to perform the computation associated with the horizontal arrows in Fig. 1, and hence the fixed point recursions to solve for , and consequently for . We give here a brief outline of the ideas behind these proximal recursions.
w.r.t. the distance metric referred to as the (quadratic) Wassertein metric [villani2003topics] on . For chosen time-steps , this allows us to set up a variational recursion over the discrete time pair as
wherein the Wasserstein proximal operator
In the numerical example provided next, we solved (22) using the algorithm developed in [caluya2019gradient].
Iv-C Numerical Example
We consider an instance of the RSBP with , , . For
the optimal controlled joint state PDFs are shown in Fig. 6. The corresponding uncontrolled joint state PDFs are shown in Fig. 7. These results were obtained by solving (22) via [caluya2019gradient, Sec. III.B] with to perform the fixed point recursion (Fig. 1) applied to (20).
In this paper, we introduced the Reflected Schrödinger Bridge Problem (RSBP) – a stochastic optimal control problem for minimum energy feedback steering of a given joint PDF to another over finite horizon subject to reflecting boundary conditions on the controlled state trajectories. Combining our prior work on Wasserstein proximal recursions with some recent results on contraction mapping associated with the Schrödinger system, we provide a computational pipeline for optimal feedback synthesis. Numerical examples are given to highlight the proposed framework.
-a Proof of Theorem 1
The necessary conditions for optimality (4) can be deduced using the Lagrange multiplier theorem in Banach spaces; see [zeidler1995applied, Ch.4.14, Proposition 1]. This theorem allows us set up an augmented Lagrangian associated with (3) and perform pointwise minimization to derive (4).
To apply this in our context, define the function spaces
where denotes the time interval, and denotes the space of square integrable functions. The notation stands for the Sobolev space of functions having first order weak derivatives w.r.t. , and finite norms w.r.t. . Furthermore, , wherein denotes the dual space of the Sobolev space . We denote the dual space of as . In the definition of , the notation stands for the space of all linear functionals on . Then, in (3a), the objective functional , and is given by
The constraint is a mapping given by
where we used (3c) so that the boundary terms vanish in the integration by parts. Following [albi2017mean, p. 112-114], one can show that and (where denotes derivative w.r.t. the subscripted variable) are surjective, and hence by [zeidler1995applied, Ch.4.14, Proposition 1], there exists . This result allows us to perform pointwise minimization of the augmented Lagrangian
By performing integration by parts in , term becomes