Belief Space Planning: A Covariance Steering Approach

A new belief space planning algorithm, called covariance steering Belief RoadMap (CS-BRM), is introduced, which is a multi-query algorithm for motion planning of dynamical systems under simultaneous motion and observation uncertainties. CS-BRM extends the probabilistic roadmap (PRM) approach to belief spaces and is based on the recently developed theory of covariance steering (CS) that enables guaranteed satisfaction of terminal belief constraints in finite-time. The nodes in the CS-BRM are sampled in belief space and represent distributions of the system states. A covariance steering controller steers the system from one BRM node to another, thus acting as an edge controller of the corresponding belief graph that ensures belief constraint satisfaction. After the edge controller is computed, a specific edge cost is assigned to that edge. The CS-BRM algorithm allows the sampling of non-stationary belief nodes, and thus is able to explore the velocity space and find efficient motion plans. The performance of CS-BRM is evaluated and compared to a previous belief space planning method, demonstrating the benefits of the proposed approach.



There are no comments yet.


page 7


Batch Belief Trees for Motion Planning Under Uncertainty

In this work, we develop the Batch Belief Trees (BBT) algorithm for moti...

Approximate Planning for Factored POMDPs using Belief State Simplification

We are interested in the problem of planning for factored POMDPs. Buildi...

Constrained Iterative LQG for Real-Time Chance-Constrained Gaussian Belief Space Planning

Motion planning under uncertainty is of significant importance for safet...

Learning Configuration Space Belief Model from Collision Checks for Motion Planning

For motion planning in high dimensional configuration spaces, a signific...

Estimating Configuration Space Belief from Collision Checks for Motion Planning

For motion planning in high dimensional configuration spaces, a signific...

Motion Planning under Partial Observability using Game-Based Abstraction

We study motion planning problems where agents move inside environments ...

General Purpose Incremental Covariance Update and Efficient Belief Space Planning via Factor-Graph Propagation Action Tree

Fast covariance calculation is required both for SLAM (e.g. in order to ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Motion uncertainty and measurement noise arise in all real-world robotic applications. When evaluating the safety of a robot under motion and estimation uncertainties, it is no longer sufficient to rely only on deterministic indicators of performance, such as whether the robot is in collision-free or in-collision status. Instead, the state of the robot is best characterized by a probability distribution function (pdf) over all possible states, which is commonly referred to as the

belief or information state [8, 30, 32]

. Explicitly taking into account the motion and observation uncertainties thus requires planning in the belief space, which allows one to compute the collision probability and thus make more informed decisions. Planning under motion and observation uncertainties is referred to as belief space planning, which can be formulated as a partially observable Markov decision process (POMDP) problem

[16]. Solving POMDP in general domains, however, is very challenging, despite some recent progress in terms of more efficient POMDP solvers [20, 26, 27, 28, 29]. Planning in infinite-dimensional distributional (e.g., belief) spaces can become more tractable by the use of roadmaps, that is, graphs constructed by sampling. Since their introduction [4, 23], such belief roadmaps (BRMs) have increased in popularity owing to their simplicity and their ability to avoid local minima.

Sampling-based motion planning algorithms such as probabilistic roadmaps (PRM) [18] and rapidly exploring random trees (RRTs) [17] can be used to solve planning problems in high-dimensional continuous state spaces, by building a roadmap or a tree incrementally through sampling the search space. However, traditional PRM-based methods only address deterministic systems. PRM methods have been extended to belief space planning using belief space roadmaps (BRMs) [23, 3, 31]. One of the main challenges of belief space roadmap (BRM) methods is that the costs of different edges depend on each other, resulting in the “curse of history” problem for POMDPs [22, 3]

. This dependence between edges breaks the optimal substructure property, which is required for search algorithms such as Dijkstra’s algorithm or A*. This problem arises from the unreachability of the belief nodes; even if the robot has full control of its mean, it is difficult to reach higher order moments (e.g., a specified covariance). Since the nodes in BRM are sampled in the belief space, the edges in BRM should ideally steer the robot from one distribution to another. If reachability of BRM nodes is not achieved, an edge in the BRM depends on all preceding edges along the path.

In general, in the literature, the belief

may be classified into

(i) the estimation belief (e-belief), describing the output of the estimator, i.e., the pdf over the error between the estimated value and true value of the state; (ii) control belief (c-belief), which refers to the pdf over “separated control” error, i.e., the error between the estimated state and the desired state; and (iii) the full belief (f-belief), that is, the belief over the true state. We discuss these different beliefs in greater detail in Section III. We just note here that e-belief planning tries to obtain better state estimates and finds a path with minimum estimation uncertainty. The f-belief planning aims at minimizing the full uncertainty that takes into account the impact of the controller as well. In this paper, we consider the f-belief planning problem and will call it belief planning for simplicity.

Planning in e-belief space was studied in [23, 9], where the goal was to find the minimum estimation uncertainty path for a robot from a starting position to a goal position. In [31], a linear-quadratic Gaussian (LQG) controller was used for motion planning within the BRM context. By taking into account the controller and the sensors used, [31] computes the true a priori probability distribution of the state of the robot. Thus, [31] studies the f-belief planning problem. However, the independence between edges is not satisfied. Similarly, [10] studies the f-belief planning problem using a tree.

The state-of-the-art in terms of BRM methods is probably [3], which tackles the “curse of history” problem. The proposed SLQG-FIRM method achieves node reachability using a stationary LQG controller. One limitation of this method is that the nodes have to be stationary. That is, the nodes in the BRM graph need to be sampled in the equilibrium space of the robot, which usually means zero velocity. Thus, this method cannot explore the velocity space and the resulting paths are suboptimal. Secondly, a converging process is required at every node. That is, the robot will have to “wait” at each node, which will increase the time required for the robot to reach the goal. Some remedies are introduced in [2] for systems (e.g., a fixed-wing aircraft) that cannot reach zero velocity by using periodic trajectories and periodic controllers. The periodic controller is applied repeatedly until the trajectory of the vehicle converges to the periodic trajectory. Thus, a “waiting” procedure is still required in these approaches. Online replanning in belief space is studied in [1]. The method in [1] improves the online phase by recomputing the local plans, which includes adding a virtual belief node and local belief edges to the FIRM graph, and solving a dynamic program at every time-step. The offline roadmap construction phase is the same as FIRM [3].

Recent developments in explicitly controlling the covariance of a linear system [13, 14]

provide an appealing approach to construct the BRM with guarantees of node reachability. In particular, for a discrete-time linear stochastic system, covariance steering theory designs a controller that steers the system from an initial Gaussian distribution to a terminal Gaussian distribution in finite-time

[5, 15, 19]. Reference [13] formulated the finite-horizon covariance control problem as a stochastic optimal control problem. In [5], the covariance steering problem is formulated as a convex program. Additional state chance constraints are considered in [19] and nonlinear systems are considered in [24] using iterative linearization. The covariance steering problem with output feedback has also been studied in [25, 12, 6].

In this paper, we propose the CS-BRM algorithm, which uses covariance steering as the edge controller of a BRM to ensure a priori node reachability. Since the goal of covariance steering is to reach a given distribution of the state, it is well-suited for reaching a belief node. In addition, covariance steering avoids the limitation of sampling in the equilibrium space, and thus the proposed CS-BRM method allows sampling of non-stationary belief nodes. Our method allows searching in the velocity space and thus finds paths with lower cost, as demonstrated in the numerical examples in Section VI.

Extending belief planning to nonlinear systems requires a nominal trajectory for each edge [3]. In the SLQG-FIRM framework, these nominal trajectories are either assumed to be given or being approximated by simple straight lines [31, 3]. However, the nominal trajectory has to be dynamically feasible in order to apply a time-varying LQG controller. Also, the nominal trajectory must be optimal if one wishes to generate optimal motion plans. Finding an optimal nominal trajectory requires solving a two-point boundary value problem (TPBVP), which for general nonlinear systems can be computationally expensive [7]. In this paper, we develop a simple, yet efficient, algorithm to find suitable nominal trajectories between the nodes of the BRM graph to steer the mean states in an optimal fashion.

The contributions of the paper are summarized as follows.

  • A new belief space planning method, called CS-BRM, is developed, to construct a roadmap in belief space by using the recently developed theory of finite-time covariance control. CS-BRM achieves finite-time belief node reachability using covariance steering and overcomes the limitation of sampling stationary nodes.

  • The concept of compatible nominal trajectory is introduced, which aims to improve the performance of linearization-based control methods to control nonlinear systems. An efficient algorithm, called the CNT algorithm, is proposed to compute nominal feasible trajectories for nonlinear systems.

The paper is organized as follows. The statement of the problem is given in Section II. In Section III, the output-feedback covariance steering theory for linear systems is outlined. In Section IV, the method of computing nominal trajectory for nonlinear systems is introduced. The main algorithm, CS-BRM, is given in Section V. The numerical implementation of the proposed algorithm is presented in Section VI. Finally, Section VII concludes the paper.

Ii Problem Statement

We consider the problem of planning for a nonholonomic robot in an uncertain environment which contains obstacles. The uncertainty in the problem stems from model uncertainty, as well from sensor noise that corrupts the measurements. We model such a system by a stochastic difference equation of the form


where are the discrete time-steps, is the state, and is the control input. The steps of the noise process

are i.i.d standard Gaussian random vectors. The measurements are given by the noisy and partial sensing model


where is the measurement at time step , and the steps of the process are i.i.d standard Gaussian random vectors. We assume that and are independent.

The objective is to steer the system (1) from some initial state to some final state within time steps while avoiding obstacles and, at the same time, minimize a given performance index. The controller at time step is allowed to depend on the whole history of measurements up to time but not on any future measurements.

This is a difficult problem to solve in its full generality. Here we use graph-based methods to build a roadmap in the space of distributions of the states (e.g., the belief space) for multi-query motion planning. Planning in the belief space accounts for the uncertainty inherent in system (1) and (2) owing to the noises and , as well as uncertainty owing to the distribution of the initial states. Each node sampled in the belief space is a distribution over the state. The edges in a BRM steer the state from one distribution to another. In the next section, we describe a methodology to design BRM edge controllers that allow to steer from one node (i.e., distribution) to another. Similar to previous works [23, 3, 31], we consider Gaussian distributions where the belief is given by the state mean and the state covariance.

Iii Covariance Steering

In this section, we give a introduction of covariance steering and outline some key results. Some of these results are explicitly used in subsequent sections. The derivation in this section follows closely [25, 19].

Given a nominal trajectory , where , we can construct a linear approximation of (1)-(2) around via linearization as follows


where is the drift term, , , and are system matrices, and and are observation model matrices.

We define the covariance steering problem as follows.

Problem 1: Find the control sequence such that the system given by (3) and (4), starting from the initial state distribution , reaches the final distribution where , while minimizing the cost functional


where is a given reference trajectory of the states, is the mean of the state , and are the covariance matrices of and , respectively, the terminal covariance is upper bounded by a given covariance matrix , and the matrices and are given.

Iii-a Separation of Observation and Control

Similarly to [25], we assume that the control input at time-step is an affine function of the measurement data. It follows that the state will be Gaussian distributed over the entire horizon of the problem. We assume the system (3)-(4

) is observable. To solve Problem 1, we use a Kalman filter to estimate the state. Specifically, let the prior initial state estimate be

and distributed according to , and let the prior initial estimation error be and let its distribution be given by . The estimated state at time , denoted as , with denoting the filtration generated by , is computed from [11]




where is the Kalman gain and where the covariances of , and are denoted as , and , respectively.

It can be shown that the cost functional (5) can be written as


where the last summation is deterministic and does not depend on the control and thus it can be discarded.

By defining the innovation process as


and noting that , the estimated state dynamics in equation (6) can be rewritten as


with . We can then restate Problem 1 as follows.

Problem 2: Find the control sequence , such that the system (10) starting from the initial distribution reaches the final distribution , where , while minimizing the cost functional


To summarize, the covariance steering problem of the state with output feedback has been transformed to a covariance steering problem of the estimated state , where is computed using the Kalman filter. This transformation relies on the separation principle of estimation and control. The solution to this problem is given in the next section.

Iii-B Separation of Mean Control and Covariance Control

In this section, we will separate Problem 2 into a mean control problem and a covariance control problem.

By defining the augmented vectors and We can compute as follows


where , , , and , , , and are block matrices constructed using the system matrices in (10). Defining , , , , and , and using (12), it follows that


The cost functional in (11) can be rewritten as


where , , .

From (13), (14), and (15), the covariance steering problem of the estimated state (Problem 2) can thus be divided into the following mean control and covariance control problems. The mean control problem is given by


where is a matrix defined such that . The covariance control problem is given by


where , .

Iii-C Solutions of the Mean Control and Covariance Control Problems

Assuming that the system (3) is controllable, the solution for the mean control problem (16) to obtain the mean trajectory and is given by [19]


where , and .

The solution to the covariance control problem (17) is given by the controller of the form


equivalently, by , where the control gain matrix is lower block diagonal of the form


and is computed from where is obtained by solving the following convex optimization problem [25]


subject to


where is the covariance of and is the covariance of the innovation process.

Iv Computing the nominal Trajectory

In this section, we develop an efficient algorithm to find nominal trajectories for nonlinear systems using the mean controller (18). A deterministic nonlinear dynamic model, i.e., with the noise set to zero, is considered when designing the nominal trajectory. To this end, consider the deterministic nonlinear system


Let and be a nominal trajectory for this system. The linearized model (3) along this nominal trajectory will be used by the mean controller (18) to compute a control sequence and the corresponding state sequence .

Definition 1: A nominal trajectory is called a compatible nominal trajectory for system (23) if and for .

Linearizing along a compatible nominal trajectory ensures that the nonlinear system is linearized at the “correct” points. When applying the designed control to the linearized system, the resulting trajectory is the same as the nominal trajectory, which means that the system reaches the exact points where the linearization is performed. On the other hand, there will be extra errors caused by the linearization if the system is linearized along a non-compatible nominal trajectory.

The iterative algorithm to find a compatible nominal trajectory is given in Algorithm 1.

1 Initialize and ;
2 while NotConverged do
3       Linearize (23) along and ;
4       Compute the mean optimal control using (18) to obtain the control sequence ;
5       Compute the the controlled state trajectory using (13) ;
6       Set and ;
return and ;
Algorithm 1 Compatible Nominal Trajectory (CNT)

When the algorithm converges, the nominal trajectory is the same as the mean optimal trajectory, which, by definition, is a compatible nominal trajectory. Note that gives the analytical solution of the mean control, which can be quickly computed. Each iteration of the algorithm has a low computational load and the overall algorithm may be solved efficiently. As seen later in Section VI, the algorithm typically convergences within a few iterations.

Fig. 1: Construction of the CS-BRM. Gray shapes denote obstacles. (a) Node sampling. Each node is represented by its mean, a state covariance, and a state estimation error covariance, all of which are sampled in proper sample spaces; (b) Mean trajectories are computed for all neighboring node pairs using the mean controller. Only mean trajectories that are collision-free are preserved. Those mean trajectories indicate possible connections/edges in the CS-BRM; (c) Kalman filter updates are simulated for each possible edge; (d) Covariance control is applied for each edge to execute the transition between the nodes.

V The CS-BRM Algorithm

The nodes in the CS-BRM are sampled in the belief space. In a partially observable environment, the belief at time-step

is given by the conditional probability distribution of the state

, conditioned on the history of observations and the history of control inputs , that is, . In a Gaussian belief space, can be equivalently represented by the estimated state, , and the estimation error covariance, , that is, [10, 3]. The state estimate is Guassian, and is given by . Hence, the Gaussian belief can be also written as . The main idea of CS-BRM is to use covariance steering theory to design the edge controller to achieve node reachability.

1 ;
2 , ;
3 for  do
4       ;
5       foreach  do
6             ;
7             if  then
8                   ;
9                   if  then
10                         ;
11                         ;
12                         ;
15             ;
16             if  then
17                   Repeat line 8-13 with and swapped
19       ;
21 ;
return CS-BRM;
Algorithm 2 Constructing CS-BRM

An illustration of the steps for building the CS-BRM is shown in Fig. 1. The algorithm for constructing the CS-BRM is given in Algorithm 2. The following procedures are used in the algorithm.
Sample Nodes: The function SampleNodes samples CS-BRM nodes. A node in the CS-BRM is represented by the tuple , where is the state mean, is the prior estimated state covariance, and is the prior state estimation error covariance. Since , the node can also be equivalently represented by . Note that we can compute the a posteriori convariances and using and . For constructing node , the algorithm starts by sampling the free state space, which provides the mean of the distribution. Then, an estimated state covariance is sampled from the space of symmetric and positive definite matrix space. Similarly, a state estimation error covariance is also sampled at node . is the initial condition of the Kalman filter update for the edges that are coming out of node . In Fig. 1(a), for each node, is shown as a black dot, is shown as a solid ellipse, and is shown as a dashed ellipse.
Neighbor: The function Neighbor finds all the nodes in that are within a given distance to node , where is a node set containing all nodes in the CS-BRM.
Mean Trajectory: Given two nodes and , MTraj uses the mean controller (18) and Algorithm 1 to find the compatible nominal trajectory, which is also the mean trajectory from to . The function returns the mean control , mean trajectory , and the cost of the mean control .
Obstacle Checking: The function ObstacleFree returns true if the mean trajectory is collision free.
Kalman Filter: Given two nodes and , KF returns the prior estimation error covariance at the last time step of the trajectory from to .
Covariance Control: CovControl solves the covariance control problem from to . It returns the control and the cost .
Monte Carlo: We use Monte Carlo simulations to calculate the probability of collision of the edges. For edge , the initial state and initial state estimate are sampled from their corresponding distributions. The state trajectory is simulated using the mean control and covariance control . Then, collision checking is performed on the simulated state trajectory. By repeating this process, we approximate the probability of collision of this edge. In our implementation the collision cost is taken to be proportional to the probability of collision along that edge.

Algorithm 2 starts by sampling nodes in the belief space using SampleNodes (Line 1). Lines 3-17 are the steps to add CS-BRM edges. Given two neighboring nodes and , Lines 6-13 try to construct the edge and Lines 14-16 try to construct the edge .

Every edge in the CS-BRM is constructed by solving a covariance steering problem. Each node tries to connect to its neighboring nodes if possible. First, for each edge, we use Algorithm 1 to find the mean trajectory of that edge (Line 6). Next, we check if the mean trajectory is collision-free. Only if the mean trajectory is collision-free do we proceed to the next step of edge construction. Then, the Kalman filter updates are simulated, which give the state estimation error covariance at every time step along that edge. The state estimation error covariances of the edges and are shown as red dash ellipses in Figure 1(c). Take the edge as an example. The initial condition of the Kalman filter is . The prior estimation error covariance at the final time-step, , is compared with the state estimation error covariance of node , . If (Line 9), the algorithm proceeds to solve the covariance control problem and this edge is added to the graph. For the covariance control problem of edge , the initial constraint and the terminal constraint are given by and . The covariance is computed using and . The final edge controller is the combination of the mean controller, covariance controller, and the Kalman filter.

In addition to the edge controller, the edge cost is computed for each edge. The edge cost is a weighted sum of , , and . With covariance steering serving as the lower-level controller, the higher-level motion planning problem using the roadmap is a graph search problem similarly to a PRM. Thus, the covariance steering approach transforms the belief space roadmap into traditional PRM with specific edge costs.

In CS-BRM, each edge is an independent covariance steering problem and the planned path using CS-BRM consists of a concatenation of edges. Since the terminal estimated state covariance satisfies an inequality constraint (17) and the terminal state estimation error covariance satisfies an inequality relation by construction (Line 9), it is important to verify that the covariance constraints at all nodes are still satisfied by concatenating the edges.

To this end, consider the covariance steering problem from node to node . For the edge , we have the initial constraints , , and the terminal covariance constraint . Note that the constraint is satisfied for (Line 9, Algorithm 2). Suppose that the solution of the covariance control problem of edge is the feedback gain . By concatenating the edges, the system may not start exactly at node . Instead, it will start at some node that satisfies and . Next, we show that by applying the pre-computed feedback gain , the terminal covariance still satisfies for the new covariances and . Before verifying this result, we provide a property of the covariance of the Kalman filter. Given an initial estimation error covariance , we obtain a sequence of error covariances using the Kalman filter updates (7). Similarly, given the new initial error covariance which satisfies , we obtain a new sequence of error covariances . We have the following lemma.

Lemma 1

If , then , for all .

The proof of this lemma is omitted. A proof of a similar result can be found in [10]. From Lemma 1, it is straightforward to show that, if , we also have for all .

Proposition 2

Consider a path on the CS-BRM roadmap, where the initial node of the path is denoted by and the final node of the path is denoted by . Starting from the initial node and following this path by applying the sequence of edge controllers, the robot will arrive at a belief node such that , , and .

The proof of this proposition is straightforward and is omitted. Proposition 2 guarantees that, when planning on the CS-BRM roadmap, the covariance at each arrived node is always smaller than the assigned fixed covariance at the corresponding node of the CS-BRM roadmap.

Vi Numerical Examples

In this section, we first illustrate our theoretical results for the motion planning problem of a 2-D double integrator. We compare our method with the SLQG-FIRM algorithm [3], and we show that our method overcomes some of the limitations of SLQG-FIRM. Subsequently, the problem of a fixed-wing aerial vehicle in a 3-D environment is studied.

Vi-a 2-D Double Integrator

A 2-D double integrator is a linear system with a 4-dimensional state space given by its position and velocity, and a 2-dimensional control space given by its acceleration.

The environment considered is shown in Figure 2. There are landmarks placed in the environment, which are represented by black stars. The black polygonal shapes represent the obstacles. The agent observes all landmarks and obtains estimates of its position at all time steps. The agent achieves better position estimates when it is closer to the landmarks. Let the Euclidean distance between the position of the 2-D double integrator and the landmarks be given by . Then, the position measurement corresponding to landmark is


where is a parameter related to the intensity of the position measurement noise that is set to , and is a two-dimensional standard Gaussian random vector. The velocity measurement is given by


where is a parameter related to the intensity of the velocity measurement noise that is set to , and is a two-dimensional standard Gaussian random vector. Thus, the total measurement vector is a dimensional vector, composed of position measurements and one velocity measurement.

The sampled CS-BRM nodes are shown in Figure 2(a). The velocities are restricted to be zero for the purpose of comparison with SLQG-FIRM. The gray dots are the state means of the nodes. The red dot shows the position of the starting node and the blue square shows the position of the goal node. The black ellipses around the state represent the confidence intervals of the position covariances. The red dash ellipses correspond to the prior estimation error covariances. Following Algorithm 2, the edge controllers are computed using the covariance steering controller for each edge and the collision cost is computed using Monte Carlo simulations. The final CS-BRM is given in Figure 2(b).

Fig. 2: (a) Sampled CS-BRM nodes. (b) CS-BRM of the 2-D double integrator. The green lines are the mean trajectories between the nodes. The gray ellipses are the confidence intervals of the covariances of the positions.

After the CS-BRM is built, the path planning problem on CS-BRM is the same as the problem of planning using a PRM, which can be easily solved using a graph search algorithm. The difference between the CS-BRM and PRM is that the edge cost in CS-BRM is specifically designed to deal with dynamical system models and uncertainties, and the transition between two nodes of CS-BRM is achieved using covariance steering as the edge controller.

Fig. 3: Results of sampling stationary belief nodes. (a) Path planned using CS-BRM; (b) Path planned using SLQG-FIRM.

The planning results of CS-BRM and SLQG-FIRM are given in Figure 3. The paths goes through the same set of belief nodes for this example. In Figure 3(b), the red ellipses are the state covariances at the last step of the time-varying LQG controller. As we see, each one of the ellipses is larger than the corresponding state covariance of the intermediate belief node (bold black ellipses). Thus, a converging step is required at every intermediate node. The time-varying LQG controller is switched to the SLQG controller until the state covariance converges to a value that is smaller than the state covariance of the belief nodes (bold black ellipses).

The CS-BRM that samples nonstationary nodes is shown in Figure 4(a). For clarity, only the mean trajectories of the edges are shown (green lines). The environment, the sampled mean positions, and the sampled covariances are all the same as those in Figure 3(a) and 3(b). The only difference is that multiple velocities are sampled at each position in Figure 4(a). The planned path is shown in Figure 4(b). The cost of the planned path is , which is much lower than the cost of the paths in Figure 3(a) and Figure 3(b), which are and respectively. The decrease in the path cost is due to the decrease of the mean control cost. In Figure 3(b), the robot has to stop (zero velocity) at every intermediate node. With the proposed method, the robot goes through each intermediate node smoothly, which results in a more efficient path.

Fig. 4: (a) CS-BRM results of sampling nonstationary nodes. The green lines are the mean trajectories of the edges. (b) Planning results. By exploring the velocity space, the robot does not need to stop at every node. The path cost is much smaller compared to Figures 3(a) and 3(b).

Vi-B Fixed-Wing Aircraft

In this section, we apply the proposed method on a nonlinear example, namely, planning for a fixed-wing aerial vehicle [21]. The discrete-time system model is given by


where the 4-dimensional state space is given by , where is the 3-D position of the vehicle, and is the heading angle. The 3-D control input space is given by , where is the air speed, is the flight-path angle, and is the bank angle, is the time step size, ,

, are standard Gaussian random variables,

, , , and are multipliers correspond to the magnitude of the noise. Their values are all set to be .

Similar to the 2-D double integrator example, several landmarks are placed in the environment. Let the Euclidean distance between the 3D position of the vehicle and the landmarks be given by . Then, the position measurement corresponding to landmark is


where is a parameter related to the intensity of the noise of the measurement and is set to , and is a 4-dimensional standard Gaussian random vector. Thus, the total measurement vector is a -dimensional vector, where is the number of landmarks.

Algorithm 1 was used to find a compatible nominal trajectory for the fixed-wing vehicle. One example is given in Figure 5. The nominal trajectory and the reference trajectory are initialized as straight lines in all four state dimensions and the three control input dimensions. The algorithm converged in eight iterations.

Fig. 5: Generation of a compatible nominal trajectory for the fixed-wing vehicle. The nominal trajectory and the reference trajectory are initialized as straight lines.

The planning environment is shown in Figure 6. The four spheres in the middle of the environment are the obstacles. The three black stars represent the landmarks. The black circles represent the state mean (position) of the nodes. The red dot shows the position of the starting node and the blue square shows the position of the goal node. The state means are deterministically chosen to discretize the environment. The covariances of the nodes are not showed.

Fig. 6: Planning results using the CS-BRM for the fixed-wing vehicle. The gray lines are the trajectories from the Monte-Carlo simulations. The algorithm finds the optimal path that trades off between control effort and collision cost.

The planned path using this CS-BRM is also shown in Figure 6. Similarly to the example in Section VI-A, instead of flying through the narrow passage between the four obstacles, which resulting a high probability of collision, the algorithm choose a path that trades off between control cost and collision cost while minimizing the total edge costs.

Vii Conclusion

A belief space roadmap (BRM) algorithm is developed in this paper. The nodes in BRM represent distributions of the state of the system and are sampled in the belief space. The main idea is to use covariance steering to design the edge controllers of the BRM graph to steer the system from one distribution to another. Compared to [3], the proposed method allows sampling nonstationary belief nodes, which has the advantage of more complete exploration of the belief space. For covariance steering of nonlinear systems, we introduce the concept of compatible nominal trajectories, which aim to better approximate the nonlinear dynamics through successive linearization. We also propose an efficient algorithm to compute compatible nominal trajectories. Compared to the standard PRM, the additional computation load comes from the computation of the edge controllers and edge cost evaluations, which, however, are done offline. By explicitly incorporating motion and observation uncertainties, we show that the proposed CS-BRM algorithm generates efficient motion plans that take into account both the control effort and collision probability.


This work has been supported by NSF awards IIS-1617630 and IIS-2008695, NASA NSTRF Fellowship 80NSSC17K0093, and ARL under CRA DCIST W911NF-17-2-0181. The authors would also like to thank Dipankar Maity for many insightful discussions regarding the output-feedback covariance steering problem.


  • [1] A. A. Agha-mohammadi, S. Agarwal, S. K. Kim, S. Chakravorty, and N. M. Amato (2018) SLAP: simultaneous localization and planning under uncertainty via dynamic replanning in belief space. IEEE Transactions on Robotics 34 (5), pp. 1195–1214. Cited by: §I.
  • [2] A. A. Agha-Mohammadi, S. Chakravorty, and N. M. Amato (2012) Periodic feedback motion planning in belief space for non-holonomic and/or nonstoppable robots. Technical Report: TR12-003, Parasol Lab., CSE Dept., Texas A&M University, pp. 1–23. Cited by: §I.
  • [3] A. A. Agha-Mohammadi, S. Chakravorty, and N. M. Amato (2014) FIRM: sampling-based feedback motion-planning under motion uncertainty and imperfect measurements. The International Journal of Robotics Research 33 (2), pp. 268–304. Cited by: §I, §I, §I, §II, §V, §VI, §VII.
  • [4] R. Alterovitz, T. Siméon, and K. Y. Goldberg (2007) The stochastic motion roadmap: a sampling framework for planning with Markov motion uncertainty.. In Robotics: Science and systems, pp. 233–241. Cited by: §I.
  • [5] E. Bakolas (2016-12) Optimal covariance control for discrete-time stochastic linear systems subject to constraints. In IEEE Conference on Decision and Control, Las Vegas, NV, pp. 1153–1158. Cited by: §I.
  • [6] E. Bakolas (2017-05) Covariance control for discrete-time stochastic linear systems with incomplete state information. In American Control Conference, Seattle, WA, pp. 432–437. Cited by: §I.
  • [7] J. T. Betts (1998) Survey of numerical methods for trajectory optimization. Journal of Guidance, Control, and Dynamics 21 (2), pp. 193–207. Cited by: §I.
  • [8] B. Bonet and H. Geffner (2000-04)

    Planning with incomplete information as heuristic search in belief space


    International Conference on Artificial Intelligence Planning Systems

    Breckenridge, CO, pp. 52–61. Cited by: §I.
  • [9] S. D. Bopardikar, B. Englot, A. Speranzon, and J. van den Berg (2016)

    Robust belief space planning under intermittent sensing via a maximum eigenvalue-based bound

    The International Journal of Robotics Research 35 (13), pp. 1609–1626. Cited by: §I.
  • [10] A. Bry and N. Roy (2011) Rapidly-exploring random belief trees for motion planning under uncertainty. In 2011 IEEE international conference on robotics and automation, pp. 723–730. Cited by: §I, §V, §V.
  • [11] A. E. Bryson and Y. C. Ho (1975) Applied optimal control: optimization, estimation and control. CRC Press. Cited by: §III-A.
  • [12] Y. Chen, T. Georgiou, and M. Pavon (2015-12) Steering state statistics with output feedback. In IEEE Conference on Decision and Control, Osaka, Japan, pp. 6502–6507. Cited by: §I.
  • [13] Y. Chen, T. T. Georgiou, and M. Pavon (2015) Optimal steering of a linear stochastic system to a final probability distribution, Part I. IEEE Transactions on Automatic Control 61, pp. 1158–1169. Cited by: §I.
  • [14] Y. Chen, T. T. Georgiou, and M. Pavon (2015) Optimal steering of a linear stochastic system to a final probability distribution, part ii. IEEE Transactions on Automatic Control 61 (5), pp. 1170–1180. Cited by: §I.
  • [15] M. Goldshtein and P. Tsiotras (2017-12) Finite-horizon covariance control of linear time-varying systems. In IEEE Conference on Decision and Control, Melbourne, Australia, pp. 3606–3611. Cited by: §I.
  • [16] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra (1998) Planning and acting in partially observable stochastic domains. Artificial Intelligence 101, pp. 99–134. Cited by: §I.
  • [17] S. Karaman and E. Frazzoli (2011-06) Sampling-based algorithms for optimal motion planning. The International Journal of Robotics Research 30 (7), pp. 846–894. Cited by: §I.
  • [18] L. E. Kavraki, P. Svestka, J. C. Latombe, and M. H. Overmars (1996) Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation 12 (4), pp. 566–580. Cited by: §I.
  • [19] K. Okamoto, M. Goldshtein, and P. Tsiotras (2018) Optimal covariance control for stochastic systems under chance constraints. IEEE Control Systems Letters 2 (2), pp. 266–271. Cited by: §I, §III-C, §III.
  • [20] S. C. Ong, S. W. Png, D. Hsu, and W. S. Lee (2010) Planning under uncertainty for robotic tasks with mixed observability. The International Journal of Robotics Research 29 (8), pp. 1053–1068. Cited by: §I.
  • [21] M. Owen, R.W. Beard, and T.W. McLain (2015) Implementing Dubins airplane paths on fixed-wing UAVs. In Handbook of Unmanned Aerial Vehicles, K. Valavanis and G. Vachtsevanos (Eds.), pp. 1677–1701. Cited by: §VI-B.
  • [22] J. Pineau, G. Gordon, and S. Thrun (2003-08) Point-based value iteration: an anytime algorithm for POMDPs. In International joint conference on artificial intelligence, Vol. 3, Acapulco, Mexico, pp. 1025–1032. Cited by: §I.
  • [23] S. Prentice and N. Roy (2009) The belief roadmap: efficient planning in belief space by factoring the covariance. The International Journal of Robotics Research 28, pp. 1448–1465. Cited by: §I, §I, §I, §II.
  • [24] J. Ridderhof, K. Okamoto, and P. Tsiotras (2019-12) Nonlinear uncertainty control with iterative covariance steering. In IEEE Conference on Decision and Control, Nice, France, pp. 3484–3490. Cited by: §I.
  • [25] J. Ridderhof, K. Okamoto, and P. Tsiotras (2020-Dec. 8–11) Chance constrained covariance control for linear stochastic systems with output feedback. In IEEE Conference on Decision and Control, Jeju Island, South Korea, pp. 1153–1158. Cited by: §I, §III-A, §III-C, §III.
  • [26] S. Ross, J. Pineau, S. Paquet, and B. Chaib-Draa (2008) Online planning algorithms for POMDP. Journal of Artificial Intelligence Research 32, pp. 663–704. Cited by: §I.
  • [27] G. Shani, J. Pineau, and R. Kaplow (2013) A survey of point-based POMDP solvers. Autonomous Agents and Multi-Agent Systems 27 (1), pp. 1–51. Cited by: §I.
  • [28] A. Somani, N. Ye, D. Hsu, and W. S. Lee (2013) DESPOT: online POMDP planning with regularization. Advances in neural information processing systems 26, pp. 1772–1780. Cited by: