. SMPs ensure probabilistic completeness, which implies that a probability of finding a feasible path solution, if one exists, approaches to one as the limit of the number of randomly drawn samples from an obstacle-free space increases to infinity. However, despite their ability to compute motion plans irrespective of the obstacles geometry, these methods exhibit slow convergence to computing path solutions due to their reliance on the extensive exploration of a given obstacle-free configuration space  . Recent research shows that biasing a sample distribution towards the region with high probability of finding a path solution can considerably enhance the performance of classical single-query SMPs such as RRT and RRT* . To the best of our knowledge, there does not exist any effective and reliable solution that uses the knowledge from the past planning problems to bias the sample distributions towards the region of the configuration space containing an optimal path solution.
In this paper, we propose a neural network-based adaptive sampler that generates samples in particular regions of a configuration space where there is likely to exist an optimal path solution. Our method consists of two neural models, i.e., an obstacle-space encoder and random samples generator. We use a Contractive AutoEncoder (CAE)  for the encoding of an obstacle-space into an invariant, robust feature space. A samples generator that comprises a Dropout-based 
stochastic Deep Neural Network (DNN) that takes the obstacle-space encoding, start and goal configuration as an input, and generates samples distributing over the region of configuration space containing the path solutions. We evaluate our method on various complex motion planning tasks such as planning of a rigid-body (piano-mover problem) and 6 degree-of-freedom (DOF) robotic arm (UR6), and planning through narrow passages. We also benchmark our method against existing biased-sampling based state-of-the-art SMPs including Informed-RRT* and Batch Informed Trees (BIT*) . The results show that our algorithm generates samples that enable unbiased SMPs such as RRT* to compute near-optimal paths in a considerably lesser computational time than BIT* and Informed-RRT*.
Ii Related Work
Many biased-sampling heuristics have been proposed to enhance the computational speed of RRT  and its variants. For instance, Rickert et al.  used gradient information to balance exploration and exploitation. Urmson and Simmons method  heuristically biased samples in RRT while Ferguson and Stentz  presented the anytime RRT algorithm by using multiple independent RRTs. Although these methods are useful, they lack asymptotic optimality due to the underlying RRT algorithm.
RRT*  extends RRTs to guarantee asymptotic optimality by incrementally rewiring the RRT graph connections such that the shortest path is asymptotically guaranteed . However, to determine an -near optimal path in dimensions, roughly samples are required, which makes RRT* no better than grid search methods . Likewise, experiments in   also confirmed that RRT* exhibits slow convergence rates to optimal path solution in higher-dimensional spaces. The following sections discusses various existing biased/adaptive sampling methods to speed up the convergence rate of SMPs to compute optimal/near-optimal path solution.
Ii-a Adaptive Sampling Methods
Gammell et al.  proposed the Informed-RRT* algorithm which takes an initial solution from RRT* algorithm to define an ellipsoidal region from which new samples are drawn to minimize the initial solution for a given cost function. Although Informed-RRT* demonstrated enhanced convergence towards an optimal solution, this method suffers in situations where finding an initial path solution takes most of the computation time. To address this limitation, Gammell et al. proposed Batch Informed Trees (BIT*) . BIT* is an incremental graph search technique where an ellipsoidal subset, containing configurations to update the graph, is incrementally enlarged. BIT* is shown empirically to outperform prior methods such as RRT* and Informed-RRT*. However, confining a graph search to ellipsoidal region slows down the performance of an algorithm in maze-like scenarios especially where the start and goal configurations are very close to each other, but the path among them traverses a complicated maze stretching waypoints far away from the goal. Furthermore, such a method would not translate to non-stationary environments or unseen environments.
Ii-B Learning-based Search Methods
Many approaches exist that use learning to improve classical SMPs computationally. A recent method called a Lightning Framework  stored paths into a lookup table and used a learned heuristic to write new paths as well as to read and repair old paths. Another similar framework by Coleman et al.  is an experience-based strategy to cache experiences in a graph instead of individual trajectories. Although these approaches exhibit superior performance in higher-dimensional spaces when compared to conventional planning methods, lookup tables are memory inefficient and incapable of generalizing well to new planning problems. Zucker et al. 
proposed a reinforcement learning-based method to bias samples in discretized workspaces. However, reinforcement learning-based approaches are known for their slow convergence as they require a large number of interactive experiences.
Iii Problem Definition
This section presents the notations we will be using in this paper, along with the definitions of fundamental motion planning problems addressed by our work.
Let be a list of finite length then is a mapping from a given index to an element of at -th index. For algorithms described in our paper, and corresponds to the initial and last elements of a list, respectively. Let a given state space be denoted as , where denotes the dimension of a state space. The collision and collision-free state spaces are denoted as and , respectively. Let the initial state and goal region be represented as and , respectively. Let a trajectory be denoted as a non-empty finite-length list . For a given path planning problem, a trajectory is said to be feasible if it connects and , i.e. and , and a path formed by connecting all consecutive states in lies entirely in the obstacle-free space i.e.,
Problem 1 (Feasible Path Planning) Given a triplet , an initial state and a goal region , find a path such that and .
Let a cost function computes a cost of a given path in terms of a summation of Euclidean distances between all the consecutive states in . Let a set of all feasible path solutions to a given planning problem be denoted as . The optimality problem of motion planning is then to find the optimal, feasible, path solution that has a minimum cost among all other feasible path solutions i.e.,
Problem 2 (Optimal Path Planning) Assuming that multiple solutions to Problem 1 exists, find a path such that .
Let be a potential region containing optimal/near-optimal path solution. The problem of adaptive sampling, also known as biased sampling, is to generate collision-free samples such that SMPs compute the optimal path in a least possible time . The problem of adaptive sampling is formalized as follow.
Problem 3 (Adaptive Sampling) Given a planning problem , generate samples , where , such that the sampling-based motion planning methods compute optimal path solution in a least-possible time .
Iv Informed Neural Sampler
This section presents our novel informed neural sampling algorithm called DeepSMP111Supplementary material is available at sites.google.com/view/deepsmp. It comprises two neural modules. The first module is an autoencoder which learns an invariant and robust feature space to embed a point cloud data from obstacle space. The second module is a stochastic DNN which takes obstacles encoding, start and goal configurations to generate samples incrementally for SMPs during online execution. Note that any SMP can utilize these informed samples for rapid convergence to the optimal solution and that the method works for unseen environments via the obstacle space encoding. The following sections describe both neural modules, online sample generation heuristic called DeepSMP, dataset collection, and hyper-parameters initialization.
Iv-a Obstacle Encoding
A Contractive AutoEncoder (CAE) is used to learn a latent-space embedding of a raw point cloud data (see Fig. 1 (a)). The encoder and decoder functions of CAE are denoted as and , respectively, where and
are parameters of their corresponding approximating functions. CAE is trained through unsupervised learning using the following objective function.
where is a reconstruction loss, and is a regularization term with a coefficient . Furthermore, contains a dataset of point clouds from different workspaces. The regularization term allows the feature space to be contractive in the neighborhood of the training data which results in an invariant and robust feature learning .
Iv-A1 Model Architecture
Since the decoding function is an inverse of encoding function , we present the architectural details of encoding unit only.
The encoding function consists of three fully-connected linear hidden layers followed by an output linear layer. The output from each hidden layer is passed through a Parametric Rectified Linear Unit (PReLU).
For 2D workspaces, the input point cloud is of size where three hidden layers transform the inputs to 512, 256 and 128 units, respectively. The output layer takes 128 units and transforms them to latent space embedding of size 28 units. The decoding function takes the latent space embedding Z to reconstruct the raw point cloud data.
For 3D workspaces, the hidden layers 1, 2 and 3 transform the input point cloud to 786, 512, and 256 hidden units, respectively. Finally, the output layer transforms the 256 units from preceding hidden layer to a latent space of size 60 units.
Iv-B Deep Sampler
Deep Sampler is a stochastic feedforward deep neural network with parameters . It takes obstacles encoding , robot state at step , and goal state to produce a next state that would take a robot closer to the goal region (see Fig. 1(b)) i.e.,
We use RRT*  to produce near-optimal paths to train DeepSMP. The training paths are in the form of a tuple i.e., , such that the path formed by connecting all following states in is a feasible solution. The training objective is to minimize mean-squared-error (MSE) between the predicted states and the actual states given by RRT*, i.e.,
where corresponds to the total number of paths times their path lengths.
Iv-B1 Model Architecture
Deep Sampler is a twelve-layer deep neural network where each hidden layer is a sandwich of a linear layer, PReLU  and Dropout ()  with an exception of last hidden layer which does not contain Dropout (). The twelveth layer is an output layer which takes hidden units from preceding layer and transforms them to the desired output size which is equal to the dimension of robot configurations. The configurations for the 2D point-mass robot, 3D point-mass robot, rigid-body and 6 DOF robot have dimensions 2, 3, 3 and 6 respectively. For all presented problems, except planning of 6 DOF robot, the input to Deep Sampler is given by concatenating the obstacles’ representation , robot’s current state and goal state . For 6 DOF, we assume a single environment, therefore, the input to Deep Sampler comprises of current state and goal state only.
Iv-C Online Execution of DeepSMP
During the online phase, we use our trained obstacle encoder and DeepSampler to generate random samples for a given SMP. Fig. 1 shows the flow of information between encoder and DeepSampler. Algorithm 1 outlines DeepSMP which combines our informed neural sampler with any classical SMP such RRT*.
Algorithm 1 starts by initializing a given SMP (Line 1). The obstacles encoder provides an encoding of a raw point cloud data from (Line 3). DeepSMP algorithm runs for iterations (Line 4). DeepSampler incrementally generates samples between given start and goal configurations until (Line 5-6), where . Upon reaching a given goal configuration, DeepSampler is executed again to produce samples from a given start configuration to the goal configuration by re-initializing random sample to (Lines 10-11). After iterations, DeepSMP switches to random sampling (Line 7-8) to ensure completeness guarantees of an underlying SMP. Note that is a feasible path solution returned by SMP. The path is continually optimized for a given cost function for a given number of iteration . Finally, after iterations, a feasible, optimized path solution , if one exists, is returned as a solution to a given planning problem (Lines 12-13).
Iv-D Data Collection
The data collection consists of creating a random set of workspaces, sampling collision-free start and goal configurations in those workspaces, and generating paths using a classical motion planner for every start and goal pair. The following sections describe the procedure to create workspaces, start and goal pairs, and near-optimal paths.
Many different 2D and 3D workspaces were generated by randomly placing various quadrilateral blocks without repetition in the operating region of and , respectively. Each random placement of the obstacle blocks led to a different workspace.
Iv-D2 Start and goal configuration
For each generated workspace, a number of start and goal configurations were sampled randomly from its obstacle-free space.
Iv-D3 Near-optimal paths
Finally, for each generated start and goal pair within all workspaces, a feasible, near-optimal path was generated using the RRT* motion planner.
Complete dataset comprised 110 different workspaces for the presented scenarios in the results section i.e., simple 2D (s2D), complex 2D (c2D), complex 3D (c3D), and rigid-body (rigid). The training dataset contained 100 workspaces with 4000 training paths in every workspace. There were two types of test datasets. The first test dataset comprised already seen 100 workspaces with 200 unseen start and goal configurations in each of the workspaces. The second test dataset comprised entirely unseen 10 workspaces where each contained 2000 unseen start and goal configurations. For rigid-body case, the range of angular configuration was scaled to the range of positional configurations, i.e., to , for training and testing. In case of 6 DOF robot, we consider only a single environment thus no environment encoding is included, and only start and goal configurations are sampled to collect example trajectories (50,000) from collision-free space to train our feedforward neural network (DeepSampler). The test scenario for 6 DOF robot is to generate paths for unseen start and goal pairs.
DeepSMP neural models were trained in mini-batches using Adagrad  optimizer with a learning rate of . CAE was trained on raw point cloud data from different workspaces which were generated randomly as described earlier. The regularization coefficient was set to . For DeepSampler, Dropout probability was kept constant to 0.5 for both training and testing. The number is set to the number of nodes in the longest path available in the training data. For RRT*, gamma of ball-radius was set to 1.6 whereas tree extension step sizes for point-mass and rigid-body were kept at 0.01 and 0.9, respectively. Finally for the 6-DOF robot, we use OMPL’s RRT* and ROS with their default parameter settings for path generation.
This section presents the results of DeepSMP for the motion planning of a point-mass robot, rigid-body, and Universal 6 DOF robot (UR6) in both 2D and 3D environments. All experiments were carried out on a computer with 3.40GHz
8 Intel Core i7 processor with a 16 GB RAM and GeForce GTX 1080 GPU. DeepSMP, implemented in PyTorch, was compared against Informed-RRT* and BIT* implemented in Python. In the following results, the datasets- and - comprised 100 workspaces seen by DeepSMP during training and 10 workspaces not seen by DeepSMP during training, respectively. Both test datasets - and - contained 200 and 2000 unseen start and goal configurations, respectively, for every workspace. Note that every combination of either seen or unseen environment with unseen start and goal pair constitutes a new planning problem, i.e., not seen by DeepSMP during training. For each planning problem, we ran 20 trials of all presented SMPs to calculate the mean computational time.
Figs. 2-5 show different example scenarios named as simple 2D (s2D), complex 2D (c2D), complex 3D (c3D) and rigid-body (rigid) where DeepSMP with underling RRT* method is planning motions. The mean computational time (in seconds) and iterations took by DeepSMP for each scenario are denoted as and , respectively.
Table I presents the mean computational time comparison of DeepSMP with an underlying RRT* SMP against Informed-RRT* and BIT* for computing near-optimal paths in different environments s2D, c2D, c3D and rigid. Note that, unbiased RRT* method is not included in the comparison as the computation time of RRT*, for computing near-optimal paths, is much higher than all presented algorithms. We report the mean (), maximum (), and minimum () time taken by an algorithm in every environment. It can be seen that in all test cases, the mean computation time of DeepSMP:RRT* remained consistently around 2 seconds. However, the mean computation time of Informed-RRT* and BIT* increases significantly as the dimensionality of the planning problem increases slightly. Furthermore, the rightmost column presents the ratio of mean computational time of BIT* to DeepSMP, and it is observed that on average, our method is at least 7 times faster than BIT*, the current state-of-art motion planner.
From experiments presented so far, it is evident that BIT* outperforms Informed-RRT*, therefore, in the following experiments only DeepSMP and BIT* are compared. Fig. 6 compares the mean computation time of DeepSMP: RRT* and BIT* in two test cases, i.e., - and -. It can be observed that the mean computation time of DeepSMP stays around 2 seconds irrespective of the given problem’s dimensionality. Furthermore, the mean computational time of BIT* not only fluctuates but also increases significantly as the dimensionality of the planning problem increases slightly.
Finally, Fig. 7 shows DeepSMP planning motions for a Universal 6-DOF robot. In Fig. 7 (a), the robotic manipulator is at the start configuration whereas its target configuration is symbolized as a shadowed region. Fig. 7 (b) shows the traces of a path planned by DeepSMP for the given start and goal pair. In this problem, the mean computational times taken by DeepSMP and BIT* are 1.7 and 48.8 seconds, respectively, which makes DeepSMP around 28 times faster than BIT*.
|Simple 2D (s2D)||Seen||0.90||1.09||0.78||9.61||11.90||3.21||4.62||10.68||1.79||5.13|
|Complex 2D (c2D)||Seen||1.62||2.19||1.09||10.81||14.30||7.11||6.56||12.02||3.22||4.05|
|Complex 3D (c3D)||Seen||1.16||1.72||0.72||18.15||74.50||16.69||16.92||49.03||6.19||14.68|
Vi-a Stochasticity through Dropout
Our stochastic feedforward DeepSampler uses Dropout  in every layer except the last two layers during both offline and online execution. Dropout is applied layer-wise to a neural network, and it drops each unit in the hidden layer with a probability . In our models, the dropped out units are indicated as dotted circles in Fig. 1. Thus, the resulting neural network is a sliced version of the original deep model, and in every iteration during online execution, a different model emerges through randomly dropping some hidden units. These perturbations in DeepSampler through Dropout enables DeepSMP to generate different samples in the region likely to contain path solutions.
Vi-B Bidirectional Sampling
Since our method incrementally generates samples, it can be easily extended to produce samples for bidirectional SMPs such as IB-RRT* 
. To do so, treat both start and goal configuration as random variablesand , respectively, and swap their roles by the end of every iteration in Algorithm 1. This way, two trees in bidirectional SMPs can be made to march towards each other to rapidly compute end-to-end collision-free paths.
SMPs ensure probabilistic completeness. Let denotes the tree vertices of SMP after iterations. Since all SMPs begin to build a tree from initial robot state i.e., , and randomly explore the entire configuration space by forming a connected tree as approaches to infinity, they guarantee probabilistic completeness i.e.,
DeepSMP also starts generating a connected tree from and after exploring a region that most likely contains a path solution for iteration, it switches to uniform random sampling (see Algorithm 1). Therefore, if , DeepSMP also ensures probabilistic completeness i.e., as the number of iterations approach to infinity, the probability of DeepSMP finding a path solution, if one exists, approaches to one:
Vi-D Asymptotic Optimality
RRT* and its variants are known to ensure asymptotic optimality i.e., as the number of iterations approaches to infinity/large-number, the probability of finding a minimum cost path solution reaches to one. This property comes from incrementally rewiring the RRT graph connections such that the shortest path is asymptotically guaranteed in RRT*. It is proposed that if the underlying SMP of DeepSMP is RRT* or any optimal variant of RRTs, DeepSMP is guaranteed to be asymptotic optimal. This follows from the fact that DeepSMP samples a selective region for fixed number of iterations and switches to uniform randoms sampling afterwards. Thus if the number of iterations goes infinity, through incremental rewiring of DeepSMP graph, the asymptotic optimality is also guaranteed.
Vi-E Computational Complexity
A forward pass through a deep neural network is known to exhibit complexity. It can be seen in Algorithm 1 that adaptive samples are generated incrementally by forward passing through our stochastic DeepSampler. Hence, the proposed neural sampling method does not add any extra computational overhead to any underlying SMP for path generation. Thus, the computational complexity of DeepSMP method will essentially be the same as underlying SMP in Algorithm 1. For instance, as in our case, RRT* is an underlying SMP method, therefore, in presented experiments, the computational complexity of DeepSMP is , where is the number of nodes in the tree.
Vii Conclusions and Future work
In this paper, we present a deep neural network based sampling method called DeepSMP which generates samples for Sampling-based Motion Planning algorithms to compute optimal paths rapidly and efficiently. The proposed method 1) adaptively samples a selective region of a configuration space that most likely contains an optimal path solution, 2) combined with SMP methods consistently demonstrate mean execution time of about 2 second in all presented experiments, and 3) generalizes to new unseen environments.
In our future work, we plan to propose an incremental online learning method that begins with an SMP method, and trains DeepSMP simultaneously to gradually switch from uniform sampling to adaptive sampling. To speed up the incremental online learning process, we plan to propose a method that prioritizes experiences to learn from selectively fewer training examples.
-  S. M. LaValle, “Rapidly-exploring random trees: A new tool for path planning,” 1998.
-  S. Karaman and E. Frazzoli, “Sampling-based algorithms for optimal motion planning,” The international journal of robotics research, vol. 30, no. 7, pp. 846–894, 2011.
-  A. H. Qureshi and Y. Ayaz, “Potential functions based sampling heuristic for optimal path planning,” Autonomous Robots, vol. 40, no. 6, pp. 1079–1093, 2016.
-  A. H. Qureshi, M. J. Bency, and M. C. Yip, “Motion planning networks,” arXiv preprint arXiv:1806.05767, 2018.
S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, “Contractive auto-encoders: Explicit invariance during feature extraction,” in
Proceedings of the 28th International Conference on International Conference on Machine Learning. Omnipress, 2011, pp. 833–840.
-  N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting.” Journal of machine learning research, vol. 15, no. 1, pp. 1929–1958, 2014.
-  J. D. Gammell, S. S. Srinivasa, and T. D. Barfoot, “Informed rrt*: Optimal sampling-based path planning focused via direct sampling of an admissible ellipsoidal heuristic,” in Intelligent Robots and Systems (IROS 2014), 2014 IEEE/RSJ International Conference on. IEEE, 2014, pp. 2997–3004.
-  J. D. Gammell, S. Srinivasa, and T. D. Barfoot, “Batch informed trees (bit*): Sampling-based optimal planning via the heuristically guided search of implicit random geometric graphs,” in Robotics and Automation (ICRA), 2015 IEEE International Conference on. IEEE, 2015, pp. 3067–3074.
-  M. Rickert, O. Brock, and A. Knoll, “Balancing exploration and exploitation in motion planning,” in Robotics and Automation, 2008. ICRA 2008. IEEE International Conference on. IEEE, 2008, pp. 2812–2817.
-  C. Urmson and R. Simmons, “Approaches for heuristically biasing rrt growth,” in Intelligent Robots and Systems, 2003.(IROS 2003). Proceedings. 2003 IEEE/RSJ International Conference on, vol. 2. IEEE, 2003, pp. 1178–1183.
-  D. Ferguson and A. Stentz, “Anytime rrts,” in Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on. IEEE, 2006, pp. 5369–5375.
-  K. Hauser, “Lazy collision checking in asymptotically-optimal motion planning,” in Robotics and Automation (ICRA), 2015 IEEE International Conference on. IEEE, 2015, pp. 2951–2957.
-  D. Berenson, P. Abbeel, and K. Goldberg, “A robot path planning framework that learns from experience,” in Robotics and Automation (ICRA), 2012 IEEE International Conference on. IEEE, 2012, pp. 3671–3678.
-  D. Coleman, I. A. Şucan, M. Moll, K. Okada, and N. Correll, “Experience-based planning with sparse roadmap spanners,” in Robotics and Automation (ICRA), 2015 IEEE International Conference on. IEEE, 2015, pp. 900–905.
-  M. Zucker, J. Kuffner, and J. A. Bagnell, “Adaptive workspace biasing for sampling-based planners,” in Robotics and Automation, 2008. ICRA 2008. IEEE International Conference on. IEEE, 2008, pp. 3757–3762.
-  L. Trottier, P. Giguère, and B. Chaib-draa, “Parametric exponential linear unit for deep convolutional neural networks,” arXiv preprint arXiv:1605.09332, 2016.
J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning and stochastic optimization,”Journal of Machine Learning Research, vol. 12, no. Jul, pp. 2121–2159, 2011.
-  A. H. Qureshi and Y. Ayaz, “Intelligent bidirectional rapidly-exploring random trees for optimal motion planning in complex cluttered environments,” Robotics and Autonomous Systems, vol. 68, pp. 1–11, 2015.