I Introduction
Motion planning is an indispensable skill for robots that aspire to navigate through an environment without collisions. Motion planning algorithms attempt to generate trajectories through the robot’s configuration space that are both feasible and optimal based on some performance criterion dependent on the task, robot or environment. Algorithms that can be executed in real time are highly desirable, mostly because they allow fast replanning in response to environment changes. The majority of methods in the domain of high-dimensional motion planning can be roughly divided into two categories: sampling-based approaches and trajectory optimization approaches.
The central tenet of sampling-based approaches [1, 2, 3] is the idea of connecting points randomly sampled from the free configuration space. Due to the underlying random sampling, these approaches exhibit probabilistic completeness and fast exploration of the environment. However, sampling-based planners can be computationally inefficient for high-dimensional problems with challenging constraints and often require a post-processing step to smooth and shorten the computed trajectories. Furthermore, considerable computational effort is spent on exploring portions of the configuration space that might not be relevant to the task.
A significant amount of recent work has focused on trajectory optimization and related problems. Trajectory optimization methods start with an initial trajectory and then minimize an objective function in order to optimize it. Covariant Hamiltonian optimization for motion planning (CHOMP) [4, 5] is the seminal work in modern trajectory optimization. It utilizes a precomputed signed distance field for fast collision checking and uses covariant gradient descent to minimize obstacle and smoothness costs. The stochastic trajectory optimization for motion planning (STOMP) algorithm [6] samples a series of noisy trajectories to explore the space around an initial trajectory, which are then combined to produce an updated trajectory with lower cost. The key trait of STOMP is its ability to optimize non-differentiable constraints. An important shortcoming of CHOMP and STOMP is the need for many trajectory states for reasoning about fine-resolution obstacle representations and for finding feasible solutions when there are many constraints. The TrajOpt algorithm [7, 8] formulates motion planning as sequential quadratic programming. The key feature of TrajOpt is the ability to solve complex motion planning problems with few states, since swept volumes are considered to ensure continuous-time safety. However, if smoothness is required in the output trajectory, either a densely parametrized trajectory or post-processing of the trajectory might still be needed, thus increasing computation time.
The Gaussian process (GP) motion planning family of algorithms [9, 10, 11, 12] employs a continuous-time trajectory representation in order to overcome the computational cost incurred by using a large number of states. The GPMP algorithm [9] parametrizes the trajectory with a few support states and then uses GP interpolation to query the trajectory at any time of interest. The GPMP2 algorithm [10] represents trajectories as samples from a continuous-time GP and formulates the planning problem as probabilistic inference. It exploits the sparsity of the underlying system by using preexisting optimization tools developed by the simultaneous localization and mapping (SLAM) community [13] to generate fast solutions.

Although trajectory optimization methods generate fast solutions in high-dimensional spaces, they have limited exploration ability and in complex environments often converge to infeasible local minima. In this paper, we propose a gradient-free stochastic optimization method for trajectory planning with continuous-time GP trajectory representations. We consider a trajectory as a sample from a GP and introduce heteroscedasticity to generate powerful trajectory priors better suited for collision avoidance in motion planning problems. The proposed optimization method relies on importance sampling and is a derivative of the cross-entropy optimization method [14]. While our method belongs to the trajectory optimization approaches, it relies on random trajectory samples, which raises a connection to sampling-based planning. The proposed method is thus an example of bridging the gap between sampling-based and trajectory optimization approaches in order to generate fast solutions in high-dimensional spaces while retaining the ability to thoroughly explore the environment. We evaluated our method in simulations and compared it to GPMP2, a state-of-the-art gradient-based (in contrast to the proposed gradient-free) trajectory optimization method. The results show that the proposed method yields a higher success rate in complex environments with comparable execution time.
II Heteroscedastic Gaussian Processes for Motion Planning
II-A The Gaussian Process Trajectory Representations
Consider a continuous-time trajectory as a sample from a vector-valued continuous-time Gaussian process (GP)

\theta(t) \sim \mathcal{GP}(\mu(t), \mathcal{K}(t, t'))   (1)
that is parameterized with support states \theta_i = \theta(t_i) at discrete time instants t_i, i = 0, \dots, N, where \theta_i \in \mathbb{R}^d and d is the state dimensionality. We employ a structured kernel belonging to a special class of GP priors generated by a linear time-varying stochastic differential equation (LTV-SDE)

\dot{\theta}(t) = A(t)\theta(t) + u(t) + F(t)w(t)   (2)
where A(t) and F(t) are the system matrices and u(t) is a known exogenous input. The white noise process w(t) is itself a GP with zero mean value

w(t) \sim \mathcal{GP}(0, Q_C(t)\,\delta(t - t'))   (3)
where Q_C(t) is an isotropic time-varying power-spectral density matrix. A similar dynamical system has been utilized in estimation [15, 16], calibration [17] and planning [12, 18]. However, the crucial difference in our approach is that the covariance Q_C(t) is time-varying and consequently generates a heteroscedastic GP [19]. We discuss the benefits of this approach in Section II-E.

II-B GP Prior for Motion Planning
Due to the Markov property of the LTV-SDE in (2), the inverse kernel matrix \mathcal{K}^{-1} is exactly sparse block-tridiagonal [15]:

\mathcal{K}^{-1} = B^\top Q^{-1} B   (6)

where

B = \begin{bmatrix} I & & & \\ -\Phi(t_1, t_0) & I & & \\ & \ddots & \ddots & \\ & & -\Phi(t_N, t_{N-1}) & I \end{bmatrix}   (7)

and

Q^{-1} = \mathrm{diag}\left(\mathcal{K}_0^{-1}, Q_{0,1}^{-1}, \dots, Q_{N-1,N}^{-1}\right)   (8)

with

Q_{a,b} = \int_{t_a}^{t_b} \Phi(t_b, s) F(s) Q_C(s) F(s)^\top \Phi(t_b, s)^\top \, ds   (9)

where \Phi(t, s) denotes the state transition matrix of (2).
The GP defined by the mean and covariance in (4) and (5) is well suited for estimation problems. However, in motion planning problems there exists a desired fixed goal state. Given that, we need to condition this GP with a fictitious observation on the goal state with mean \mu_g and covariance \mathcal{K}_g. This can be accomplished while still preserving the sparsity of the kernel matrix [12]

\tilde{\mu} = \mu + \mathcal{K} H^\top (H \mathcal{K} H^\top + \mathcal{K}_g)^{-1} (\mu_g - H\mu), \quad \tilde{\mathcal{K}} = \mathcal{K} - \mathcal{K} H^\top (H \mathcal{K} H^\top + \mathcal{K}_g)^{-1} H \mathcal{K}   (10)

where H selects the goal support state.
The conditioned mean vector and kernel matrix given in (10) fully determine the continuous-time trajectory defined by (1) that we employ for motion planning.
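As a numerical sanity check, the conditioning on a fictitious goal observation can be sketched with standard Gaussian conditioning; the kernel below is a generic symmetric positive-definite stand-in for the LTV-SDE kernel, and all names and values are illustrative rather than the paper's:

```python
import numpy as np

# Sketch of conditioning a GP prior on a fictitious goal observation.
N = 5                                   # support states (scalar state for brevity)
mu = np.zeros(N)                        # prior mean
idx = np.arange(N, dtype=float)
K = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / 2.0) ** 2) + 1e-9 * np.eye(N)

goal, goal_var = 2.0, 1e-10             # goal observation mean and (tiny) covariance
H = np.zeros((1, N)); H[0, -1] = 1.0    # observation picks the last support state

S = H @ K @ H.T + goal_var              # innovation covariance (1 x 1)
G = K @ H.T @ np.linalg.inv(S)          # gain
mu_cond = mu + (G @ (np.array([goal]) - H @ mu)).ravel()
K_cond = K - G @ H @ K                  # conditioned kernel, goal variance collapses
```

With a near-zero observation covariance, the conditioned mean is pinned to the goal at the last support state while earlier states are pulled toward it only in proportion to their prior correlation; sparsity-preserving forms of this operation are given in [12].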
II-C Fast GP interpolation
A major benefit of modelling a continuous-time trajectory in motion planning with GPs is the possibility to query the planned state at any time of interest \tau, and not only at the discrete time instants t_i. The kernel matrix defined in (10) allows for computationally efficient, structure-exploiting GP interpolation with O(1) complexity. The state \theta(\tau), t_i < \tau < t_{i+1}, is a function only of its neighboring states [10]

\theta(\tau) = \tilde{\mu}(\tau) + \Lambda(\tau)\left(\theta_i - \tilde{\mu}_i\right) + \Psi(\tau)\left(\theta_{i+1} - \tilde{\mu}_{i+1}\right)   (11)

\Lambda(\tau) = \Phi(\tau, t_i) - \Psi(\tau)\Phi(t_{i+1}, t_i)   (12)

\Psi(\tau) = Q_{i,\tau} \Phi(t_{i+1}, \tau)^\top Q_{i,i+1}^{-1}   (13)
where Q_{a,b} is given in (9). Efficient GP interpolation can be exploited for reasoning about small obstacles while keeping a relatively small number of support states, which reduces the incurred computational burden. It can also be utilized for providing a dense output trajectory that a robot can execute without any post-processing.
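A minimal sketch of this interpolation for the constant-velocity model of Section II-D, in one configuration dimension (Phi and Q are the closed-form transition and covariance blocks of that model; the scalar power-spectral density qc is illustrative):

```python
import numpy as np

def Phi(dt):
    # State transition matrix of the constant-velocity (double integrator) model.
    return np.array([[1.0, dt], [0.0, 1.0]])

def Q(dt, qc):
    # Covariance accumulated over an interval of length dt, cf. (9).
    return qc * np.array([[dt**3 / 3.0, dt**2 / 2.0],
                          [dt**2 / 2.0, dt]])

def interpolate(theta_i, theta_j, t_i, t_j, tau, qc):
    """Query the (zero-mean) GP at tau in [t_i, t_j] from its two neighbors."""
    Psi = Q(tau - t_i, qc) @ Phi(t_j - tau).T @ np.linalg.inv(Q(t_j - t_i, qc))
    Lam = Phi(tau - t_i) - Psi @ Phi(t_j - t_i)
    return Lam @ theta_i + Psi @ theta_j
```

At tau = t_i the interpolation reduces to theta_i (Psi vanishes, Lam is the identity), at tau = t_j it reduces to theta_j, and for states consistent with constant velocity the interpolated position moves linearly in between.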
II-D Constant-velocity motion model
Robot dynamics are represented with the double integrator linear system with white noise injected in acceleration. The trajectory is thus generated by the LTV-SDE (2), where the Markovian state consists of position and velocity in the configuration space, with the following system matrices

A(t) = \begin{bmatrix} 0 & I \\ 0 & 0 \end{bmatrix}, \quad u(t) = 0, \quad F(t) = \begin{bmatrix} 0 \\ I \end{bmatrix}   (14)
This formulation generates a constant-velocity GP prior which is centered around a zero-acceleration trajectory. Applying this motion model minimizes actuator acceleration in the configuration space, thus minimizing energy consumption and providing a physical notion of smoothness [12].
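For a single configuration dimension, the transition matrix and accumulated covariance of this model have simple closed forms that can be checked against the defining integral in (9); the scalar power-spectral density qc below is an assumed illustrative value:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])      # double integrator, state = (position, velocity)
F = np.array([[0.0],
              [1.0]])           # white noise enters through the acceleration

def Phi(dt):
    # expm(A * dt) = I + A * dt, since A is nilpotent (A @ A == 0).
    return np.eye(2) + A * dt

def Q_closed(dt, qc):
    # Closed-form covariance accumulated over an interval of length dt.
    return qc * np.array([[dt**3 / 3.0, dt**2 / 2.0],
                          [dt**2 / 2.0, dt]])

def Q_numeric(dt, qc, steps=20000):
    # Midpoint-rule evaluation of int_0^dt Phi(dt-s) F qc F^T Phi(dt-s)^T ds.
    s = (np.arange(steps) + 0.5) * dt / steps
    v = np.vstack([dt - s, np.ones_like(s)])   # Phi(dt - s) @ F, one column per s
    return qc * (v @ v.T) * dt / steps
```

The transition matrix satisfies the semigroup property Phi(a) Phi(b) = Phi(a + b), and the numeric integral matches the closed form to quadrature accuracy.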
II-E Benefits of heteroscedasticity
State-of-the-art methods based on GPs [10, 11, 12] minimize the sum of two costs: an obstacle cost and a smoothness cost which measures the deviation of the trajectory from the GP prior mean. Those methods use the covariance Q_C as a parameter in optimization, with smaller values of Q_C penalizing the deviation of the trajectory from the prior mean more. For the therein employed homoscedastic GPs, small constant values of Q_C allow high variance of states near the midpoint of the trajectory, while states close to the trajectory start or goal have relatively small variance. If the prior mean of those states passes through an obstacle, the Levenberg-Marquardt optimization technique used in [10, 11, 12] will have difficulty escaping the collision, since the incurred smoothness cost can become too large compared with the obstacle cost. Figure (a) shows that sample trajectories drawn from the homoscedastic GP do not have large deviation from the mean near the first and the last state. With larger constant values of Q_C this problem diminishes; however, states near the trajectory midpoint then have high variance and trajectories lose the desirable smoothness property.

By introducing heteroscedasticity of the underlying white noise process, we can design GPs that are better suited for motion planning and help alleviate the problem of local minima. Ideally, the GP should be able to generate trajectories that maneuver around the obstacles near the start and goal states, while retaining the smoothness property in the middle. With careful selection of the proposed time-varying white noise power-spectral density matrix Q_C(t), we are able to model GPs that achieve the stated goal. In our experience, modeling Q_C(t) as a parabola with high values at the beginning and end, and the lowest value at the temporal midpoint of the trajectory, leads to GPs that have these desirable features. Note that the GP covariance is obtained by propagating Q_C(t) with the underlying motion model, as defined in (6), and thus a low (or even zero) value of Q_C(t) at a particular time instant does not imply small GP covariance. An example of such a Q_C(t) is depicted in Figure (c) and it leads to the heteroscedastic GP shown in Figure (b). Notably, sample trajectories drawn from the example heteroscedastic GP have large deviation from the mean near the first and the last state. The example GP would be able to thoroughly explore the environment and generate trajectories that bypass obstacles near the first and the last state. Note that any nonnegative function can be used for Q_C(t), depending on the specific context of the motion planning problem.
For example, one could model Q_C(t) as a monotonically decreasing function, resembling the aim of exploring more at the beginning and less towards the end.
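A sketch of one such parabolic profile (the endpoint and midpoint values, and the 4 s timespan, are illustrative rather than the paper's exact parameters):

```python
import numpy as np

def parabolic_qc(t, t_total, qc_min, qc_max):
    # Parabola with its vertex (minimum qc_min) at the temporal midpoint
    # of the trajectory and value qc_max at the start and end.
    mid = t_total / 2.0
    return qc_min + (qc_max - qc_min) * ((t - mid) / mid) ** 2

t = np.linspace(0.0, 4.0, 101)          # assumed trajectory timespan of 4 s
qc = parabolic_qc(t, 4.0, 0.1, 5.0)     # illustrative min/max spectral densities
```

The profile is symmetric about the midpoint, maximal at the endpoints (encouraging exploration around start and goal) and minimal in the middle (preserving smoothness there).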
III Proposed Stochastic Trajectory Optimization
Formally, the goal of trajectory optimization is to find a smooth, collision-free trajectory through the configuration space between two end points. Prior work in this area models the cost of a trajectory using two terms: a prior term, which usually encodes smoothness by minimizing higher-order derivatives of the robot states, and an arbitrary state-dependent term, which usually measures the cost of being near obstacles. However, our optimization criterion consists solely of an arbitrary state-dependent cost term. We reason that our trajectories carry an inherent property of smoothness since we model them as samples from the GP defined in (1). Therefore, our method starts with the following optimization problem:
\theta^* = \arg\min_{\theta} \; c(\theta)   (15)

subject to \; \theta(t_0) = \theta_s, \quad \theta(t_N) = \theta_g

where c(\theta) is the state-dependent cost and \theta_s and \theta_g are the start and goal states.
The state-dependent cost term can include any cost function corresponding to the desired trajectory properties, e.g., collision avoidance, task-space constraints, torques [6] and manipulability [20]. In this work, we consider only collision avoidance and use a precomputed signed distance field for collision checking, similarly to [10, 12].
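The collision cost can be sketched as a hinge loss over a precomputed signed distance field; the circular obstacle, grid resolution, and safety margin eps below are all illustrative assumptions:

```python
import numpy as np

# Precompute a signed distance field on a regular grid for a single
# circular obstacle (center (0.3, 0.2), radius 0.25 -- illustrative).
res = 0.01
xs = np.arange(-1.0, 1.0, res)
X, Y = np.meshgrid(xs, xs, indexing='ij')
sdf = np.hypot(X - 0.3, Y - 0.2) - 0.25

def hinge_cost(p, eps=0.1):
    # Nearest-cell lookup into the precomputed field, then hinge loss:
    # zero cost once the state is at least eps away from the obstacle surface.
    i = int(round((p[0] - xs[0]) / res))
    j = int(round((p[1] - xs[0]) / res))
    return max(eps - sdf[i, j], 0.0)
```

States inside the obstacle incur a cost growing with penetration depth, states within the margin incur a small cost, and states farther than eps from the surface are free.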
Most state-of-the-art approaches use gradient-based methods which find locally optimal trajectories. In this work, we instead optimize using a derivative-free stochastic optimization method. This allows for better exploration while being less prone to local minima. It also enables optimization of arbitrary costs which are non-differentiable or non-smooth. To solve (15), we employ a stochastic optimization approach stemming from the cross-entropy method [14] and bearing similarities to the estimation-of-distribution algorithm [21].
Our method starts by drawing sample trajectories from the GP defined in (1), where the mean is initialized as a constant-velocity straight line in the configuration space and the covariance matrix arises from the kernel matrix defined in (10). A sample trajectory is generated using
\theta = \tilde{\mu} + L z   (16)

where L is a lower triangular matrix obtained by the Cholesky decomposition of the covariance matrix, \tilde{\mathcal{K}} = L L^\top, and z is a vector of standard normal variables, z_i \sim \mathcal{N}(0, 1).
Subsequently, we evaluate the cost c(\theta) for each trajectory using a hinge loss with a precomputed signed distance field. From the evaluated trajectories we pick the best ones according to the optimization criterion, i.e., the ones with the lowest cost c(\theta). We then take the cost-weighted average (weighted arithmetic mean) of those best trajectories in order to form the GP mean for the next iteration:
\tilde{\mu} \leftarrow \sum_{m=1}^{M_e} w_m \theta_m   (17)

where

w_m = \frac{\exp\left(-c(\theta_m)/\lambda\right)}{\sum_{j=1}^{M_e} \exp\left(-c(\theta_j)/\lambda\right)}   (18)

with M_e denoting the number of selected best trajectories.
This process is repeated until a collision-free trajectory is found. The described method is summarized in Algorithm 1.
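The whole loop can be sketched on a toy 2-D problem. For brevity, the prior below uses a generic smooth kernel pinned at the endpoints rather than the paper's conditioned LTV-SDE kernel, the cost is an analytic hinge loss on a single circular obstacle, and the sample count, elite count, and weighting temperature are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20
t = np.linspace(0.0, 1.0, N)
mu = np.linspace([0.0, 0.0], [1.0, 0.0], N)        # straight-line mean, (N, 2)

# Smooth prior pinned at start and goal (stand-in for the conditioned kernel).
K = 0.09 * np.exp(-0.5 * ((t[:, None] - t[None, :]) / 0.2) ** 2)
K[0, :] = K[:, 0] = 0.0
K[-1, :] = K[:, -1] = 0.0
K += 1e-9 * np.eye(N)
L = np.linalg.cholesky(K)

center, radius, eps = np.array([0.5, 0.0]), 0.15, 0.05

def cost(traj):
    # Hinge loss on the distance to the circular obstacle, summed over states.
    d = np.linalg.norm(traj - center, axis=1) - radius
    return np.maximum(eps - d, 0.0).sum()

M, M_elite, lam = 100, 5, 1.0
for _ in range(50):
    z = rng.standard_normal((M, N, 2))
    samples = mu[None] + np.einsum('ij,kjd->kid', L, z)   # draw M trajectories
    costs = np.array([cost(s) for s in samples])
    if costs.min() == 0.0:                                # collision-free sample found
        mu = samples[costs.argmin()]
        break
    elite = np.argsort(costs)[:M_elite]                   # keep the best samples
    w = np.exp(-costs[elite] / lam)
    w /= w.sum()
    mu = np.einsum('k,kid->id', w, samples[elite])        # cost-weighted mean update
```

The straight-line initialization passes through the obstacle, and the loop shifts the mean until a collision-free sample appears, with the endpoints staying pinned by the near-zero prior variance there.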
Contrary to a generic cross-entropy optimization algorithm, in our approach the covariance matrix remains unchanged through iterations. This is done because computing a new covariance matrix of the same form would significantly increase the computational burden. Furthermore, the unchanged covariance matrix allows for exhaustive exploration around the mean in each iteration. Changing only the mean while keeping the covariance matrix unchanged is permitted in the GP framework described in Section II, as the change in the mean can be attributed to some implicitly imposed exogenous input u(t), which does not impact the covariance.
III-A Computational Efficiency Remarks
In order to thoroughly explore the environment, our approach requires cost evaluation for a relatively large number of drawn trajectory samples, which naturally leads to slower computation than state-of-the-art gradient-based methods. However, because the cost evaluation for each trajectory is independent, the inner for loop of the proposed Algorithm 1 can be parallelized, with computational efficiency scaling linearly with the number of processing cores. In our implementation, we exploit this property and parallelize the inner loop on 4 processing cores. A GPU implementation presents an interesting possibility that would allow sampling and cost evaluation for a huge number of trajectories, leading to fast environment exploration and discovery of optimal trajectories, allowing real-time replanning in dynamic environments [22].
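The independence of per-trajectory cost evaluations can be sketched with a thread pool; the cost function below is a stand-in, and in CPython a process pool or GPU kernel would be needed for true parallel speedup:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def trajectory_cost(traj):
    # Stand-in state-dependent cost; each call reads only its own trajectory,
    # so evaluations are independent and safe to run concurrently.
    d = np.linalg.norm(traj, axis=1)
    return float(np.maximum(0.1 - d, 0.0).sum())

rng = np.random.default_rng(0)
samples = list(rng.standard_normal((8, 10, 2)))    # 8 trajectories, 10 states each

serial = [trajectory_cost(s) for s in samples]
with ThreadPoolExecutor(max_workers=4) as pool:    # 4 workers
    parallel = list(pool.map(trajectory_cost, samples))
```

Because `pool.map` preserves input order and each evaluation is side-effect-free, the parallel results match the serial ones exactly.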
We use GP interpolation for dense collision checking, similarly to [10]. Since the trajectory support states are temporally equidistant and each sampled trajectory is drawn from the same GP, the matrices \Lambda(\tau) and \Psi(\tau) in (12) and (13) can be precomputed instead of being computed each time interpolation is needed. This provides another significant increase in the computational efficiency of the proposed method.
IV Test Results
We tested the proposed method on two simulation benchmarks and compared it with the state-of-the-art trajectory optimization technique GPMP2 [10]. In Section IV-A, we quantitatively demonstrate the improvement of the proposed method over GPMP2 with random restarts in solving a 2D maze, which is a good benchmark for an optimization-based planner's effectiveness at finding a collision-free solution in a haystack of local minima. This experiment aims to show the benefits of the proposed stochastic method, which allows for better exploration in comparison to gradient-based methods. In Section IV-B, we demonstrate the improvement of the proposed method over prior techniques in finding a collision-free trajectory for a robot manipulator in a cluttered environment. This experiment aims to show the benefits of the proposed heteroscedastic prior, since the environment was set up so that obstacles are placed near the start and goal states.
In both benchmarks, our method was always initialized with a constant-velocity straight-line trajectory in the configuration space. For GPMP2, we used a straight-line initialization as a baseline, and in our experiments we designate this model as line. Since GPMP2 always converges very quickly, but often fails in cluttered environments due to infeasible local minima, we also employed random restarts, which is a commonly used method to tackle local minima problems in gradient-based trajectory optimization methods [4]. In this technique, the optimizer is first initialized with a straight line and, on failure, reinitialized with a random trajectory. Our implementation samples the random restart trajectory from a homoscedastic GP, similarly to [11]. We designate this model as rr.
We used the GPMP2 C++ library [10, 23] and its respective MATLAB toolbox based on the GTSAM C++ library [13]. Experiments were performed on a system with a 2.8 GHz Intel Core i7-7700HQ processor and 16 GB of RAM.
IV-A The Maze Benchmark
The maze benchmark, appropriate for quantitative evaluation, consisted of 1000 synthetic environments created by Wilson’s algorithm [24], which generates uniformly sampled mazes with a single solution (i.e., perfect mazes). Mazes were generated on grids of three different sizes and afterwards inflated to realistic dimensions. While the 2D maze problem is generally suitable for grid-based or sampling-based motion planning approaches, which achieve a high success rate, it can be used to measure an optimization-based planner's effectiveness at finding the unique collision-free solution in a cluttered environment.
For each maze environment, we plan motion for a 2D holonomic circular robot of fixed radius. For our method, we chose a large number of sampled trajectories, while the number of best trajectories chosen for the weighted mean in each iteration was set to a small value. Although these parameters may seem disproportionate, a large number of samples ensures exhaustive exploration, while a small number of selected trajectories induces drastic changes in the GP mean between iterations, which helps in finding the solution faster in complex environments. We set the total trajectory time (i.e., the timespan in which the robot moves from the start to the goal state) to a fixed value, while the time-varying covariance of the white noise governing the heteroscedastic GP was chosen as a parabola with its vertex at the midpoint of the trajectory. For GPMP2, we used the parameter set from the Matlab toolbox 2D example. For both methods, the trajectory was parametrized with a fixed number of support states and interpolation steps in-between for which the trajectory cost is evaluated. We set a maximum runtime for our algorithm and for random restarts, with one exception where we allowed a longer runtime in order to investigate the ability of our algorithm to find solutions given more time. We measure the number of mazes solved (success rate) and the execution time. The results of the experiment are shown in Table I.
While GPMP2 without random restarts has an order of magnitude faster execution time, it has the worst success rate for every maze complexity level. For the least complex mazes, random restarts outperformed our algorithm, achieving a similarly high success rate with significantly faster computation. However, for the more complex mazes our algorithm outperformed random restarts, having a notably higher success rate with similar reported times. Note that the reported execution time for our algorithm is the actual time it took to compute, and not the sum over all cores. An example of a maze is shown in Figure 5. The most complex mazes demonstrated the inability of gradient-based methods to find solutions in complex environments plagued with a multitude of local minima.
IV-B The Robot Arm Planning Benchmark
The robot arm planning benchmark consisted of a simulated WAM robotic arm in an environment featuring a table and a drawer. We conducted a set of unique experiments, all with different start and goal states, with starting points under the table and end states inside the drawer. This set of problems is not particularly difficult since most of the states are initially collision-free; however, it was set up to accentuate the proneness of homoscedastic GP planning methods to getting stuck in local minima near the start or goal states.
For our method, we again chose a large number of sampled trajectories and a small number of best trajectories for the weighted mean in each iteration. In this benchmark, we tested our optimization method with both heteroscedastic and homoscedastic GP priors in order to demonstrate the benefits of heteroscedasticity. We fixed the total trajectory time. For the homoscedastic case we chose a constant Q_C, while for the heteroscedastic GP we calculated Q_C(t) similarly to the maze benchmark. For GPMP2, we used the default parameters from the Matlab toolbox WAM planner example. For both methods, the trajectory was parametrized with equidistant support states and interpolation steps in-between for which the trajectory cost was evaluated. We again set a fixed time budget for our algorithm and random restarts, and again measured the success rate and the execution time. The results of the experiment are shown in Table I, where we can see that while the baseline GPMP2 fails in most cases, the stochasticity introduced by random restarts helps in achieving higher success rates. The proposed method achieved a perfect score within the fixed time budget, thus demonstrating the advantage of the proposed heteroscedastic prior.
V Conclusion
In this paper we have presented a stochastic trajectory optimization method for motion planning. We considered each trajectory as a sample from a continuous-time GP generated by a linear time-varying stochastic differential equation. By introducing heteroscedasticity of the underlying GP, we were able to generate trajectory priors better suited for collision avoidance in motion planning problems. We proposed a cross-entropy-based derivative-free optimization method in order to contend with the local minima problem present in trajectory optimization methods. Through simulated experiments we demonstrated that the proposed approach ameliorates the local minima problem present in trajectory optimization approaches while having comparable execution time.
In future work, it would be interesting to exploit the parallelization capability of our algorithm with a GPU implementation. Furthermore, the strong exploration capability could be used for finding homotopy classes in the environment.
References
 [1] L. E. Kavraki, P. Svestka, J.-C. Latombe, and M. H. Overmars, “Probabilistic roadmaps for path planning in high-dimensional configuration spaces,” IEEE Transactions on Robotics and Automation, vol. 12, no. 4, pp. 566–580, 1996.
 [2] S. M. LaValle, Planning Algorithms. Cambridge University Press, 2006.
 [3] S. Karaman and E. Frazzoli, “Sampling-based algorithms for optimal motion planning,” The International Journal of Robotics Research, vol. 30, no. 7, pp. 846–894, 2011.
 [4] N. Ratliff, M. Zucker, J. A. Bagnell, and S. Srinivasa, “CHOMP: Gradient optimization techniques for efficient motion planning,” in IEEE Int. Conf. on Robotics and Automation (ICRA), 2009, pp. 489–494.
 [5] M. Zucker, N. Ratliff, A. D. Dragan, M. Pivtoraiko, M. Klingensmith, C. M. Dellin, J. A. Bagnell, and S. S. Srinivasa, “CHOMP: Covariant Hamiltonian optimization for motion planning,” The International Journal of Robotics Research, vol. 32, no. 9-10, pp. 1164–1193, 2013.
 [6] M. Kalakrishnan, S. Chitta, E. Theodorou, P. Pastor, and S. Schaal, “STOMP: Stochastic trajectory optimization for motion planning,” in IEEE Int. Conf. on Robotics and Automation (ICRA), 2011, pp. 4569–4574.
 [7] J. Schulman, J. Ho, A. X. Lee, I. Awwal, H. Bradlow, and P. Abbeel, “Finding locally optimal, collision-free trajectories with sequential convex optimization,” in Robotics: Science and Systems, vol. 9, 2013.
 [8] J. Schulman, Y. Duan, J. Ho, A. Lee, I. Awwal, H. Bradlow, J. Pan, S. Patil, K. Goldberg, and P. Abbeel, “Motion planning with sequential convex optimization and convex collision checking,” The International Journal of Robotics Research, vol. 33, no. 9, pp. 1251–1270, 2014.
 [9] M. Mukadam, X. Yan, and B. Boots, “Gaussian process motion planning,” in IEEE Int. Conf. on Robotics and Automation (ICRA), 2016, pp. 9–15.
 [10] J. Dong, M. Mukadam, F. Dellaert, and B. Boots, “Motion planning as probabilistic inference using Gaussian processes and factor graphs,” in Robotics: Science and Systems, vol. 12, 2016.
 [11] E. Huang, M. Mukadam, Z. Liu, and B. Boots, “Motion planning with graph-based trajectories and Gaussian process inference,” in IEEE Int. Conf. on Robotics and Automation (ICRA), 2017, pp. 5591–5598.
 [12] M. Mukadam, J. Dong, X. Yan, F. Dellaert, and B. Boots, “Continuous-time Gaussian process motion planning via probabilistic inference,” arXiv preprint arXiv:1707.07383, 2017.
 [13] F. Dellaert, “Factor graphs and GTSAM: A hands-on introduction,” Georgia Institute of Technology, Tech. Rep., 2012.
 [14] R. Y. Rubinstein and D. P. Kroese, The cross-entropy method: a unified approach to combinatorial optimization, Monte-Carlo simulation and machine learning. Springer Science & Business Media, 2013.
 [15] T. D. Barfoot, C. H. Tong, and S. Särkkä, “Batch continuous-time trajectory estimation as exactly sparse Gaussian process regression,” in Robotics: Science and Systems, vol. 10, 2014.
 [16] S. Anderson, T. D. Barfoot, C. H. Tong, and S. Särkkä, “Batch nonlinear continuous-time trajectory estimation as exactly sparse Gaussian process regression,” Autonomous Robots, vol. 39, no. 3, pp. 221–238, 2015.
 [17] J. Peršić, L. Petrović, I. Marković, and I. Petrović, “Spatio-temporal multisensor calibration based on Gaussian processes moving object tracking,” arXiv preprint arXiv:1904.04187, 2019.
 [18] M. Mukadam, J. Dong, F. Dellaert, and B. Boots, “Simultaneous trajectory estimation and planning via probabilistic inference,” in Robotics: Science and Systems, 2017.
 [19] M. A. Rana, M. Mukadam, S. R. Ahmadzadeh, S. Chernova, and B. Boots, “Towards robust skill generalization: Unifying learning from demonstration and motion planning,” in Conference on Robot Learning, 2017, pp. 109–118.
 [20] F. Marić, O. Limoyo, L. Petrović, I. Petrović, and J. Kelly, “Manipulability maximization using continuous-time Gaussian processes,” arXiv preprint arXiv:1803.09493, 2018.
 [21] P. Larrañaga and J. A. Lozano, Estimation of distribution algorithms: A new tool for evolutionary computation. Springer Science & Business Media, 2001, vol. 2.
 [22] C. Park, J. Pan, and D. Manocha, “Real-time optimization-based planning in dynamic environments using GPUs,” in IEEE Int. Conf. on Robotics and Automation (ICRA), 2013, pp. 4090–4097.
 [23] J. Dong, B. Boots, and F. Dellaert, “Sparse Gaussian processes for continuous-time trajectory estimation on matrix Lie groups,” arXiv preprint arXiv:1705.06020, 2017.
 [24] D. B. Wilson, “Generating random spanning trees more quickly than the cover time,” in STOC, vol. 96. Citeseer, 1996, pp. 296–303.