I Introduction
Robotic motion planning aims to compute a collision-free path for the given start and goal configurations [1]. As motion planning algorithms are necessary for solving a variety of complicated, high-dimensional problems ranging from autonomous driving [2] to space exploration [3], there arises a critical, unmet need for computationally tractable, real-time algorithms. The quest for computationally efficient motion planning has led to the development of various sampling-based motion planning (SMP) algorithms such as Rapidly-exploring Random Trees (RRT) [4], optimal Rapidly-exploring Random Trees (RRT*) [5], Potentially guided-RRT* (P-RRT*) [6], and their bidirectional variants [7, 8]. Despite these efforts to design fast, efficient planning algorithms, the current state-of-the-art struggles to offer methods that scale to the high-dimensional settings common in many real-world applications.
To address the above-mentioned challenges, we propose a Deep Neural Network (DNN) based iterative motion planning algorithm, called MPNet (Motion Planning Networks), that efficiently scales to high-dimensional problems. MPNet consists of two components: an encoder network and a planning network. The encoder network learns to encode a point cloud of the obstacles into a latent space. The planning network learns to predict the robot configuration at time step t+1 given the robot configuration at time t, the goal configuration, and the latent-space encoding of the obstacle space. Once trained, MPNet can be used in conjunction with our novel bidirectional iterative algorithm to generate feasible trajectories.
We evaluate MPNet on a large test dataset including multiple planning problems such as the planning of a point-mass robot, rigid-body, and 7 DOF Baxter robot manipulator in various 2D and 3D environments. As neural networks do not provide theoretical guarantees on their performance, we also propose a hybrid algorithm which combines MPNet with any existing classical planning algorithm, in our case RRT*. The hybrid planning technique demonstrates a 100% success rate consistently over all tested environments while retaining the computational gains. Our results indicate that MPNet generalizes very well, not only to unseen start and goal configurations within workspaces which were used in training, but also to new workspaces which the algorithm has never seen.
II Related Work
Research into developing neural network-based motion planners first gained traction in the early 1990s but faded away due to the computational complexity of training deep neural networks [10]. However, recent developments in Deep Learning (DL) have allowed researchers to apply various DL architectures to robotic control and planning.
An active area of research within robotic control and planning is Deep Reinforcement Learning (DRL). For instance, [11] shows how to train a robot to learn visuomotor policies to perform various tasks such as screwing a bottle cap, or inserting a peg. Although DRL is a promising framework, it relies extensively on exploration through interaction with the environment, making it difficult to train for many real-world robotic applications. A recent work, Value Iteration Networks (VIN) [12], emulates value iteration by leveraging recurrent convolutional neural networks and max-pooling. However, in addition to limitations inherited from the underlying DRL framework, VIN has only been evaluated on simple toy problems, and it is not clear how VIN could extend beyond such environments.
Imitation learning is another emerging field in which the models are trained from expert demonstrations. Many interesting problems have been addressed through imitation learning [13, 14, 15]. A recent method [16] uses deep neural networks trained via imitation to adaptively sample the configuration space for SMP methods. Our proposed method also learns through imitation but unlike [16], it provides a complete feasible motion plan for a robot to follow.
Another recent and relevant method is the Lightning Framework [17], which is composed of two modules. The first module performs path planning with any traditional motion planner. The second module maintains a lookup table that caches old paths generated by the first module. For new planning problems, the Lightning Framework retrieves the closest path from the lookup table and repairs it using a traditional motion planner. This approach demonstrates superior performance compared to conventional planning methods. However, lookup tables are not only memory inefficient but also incapable of generalizing to new environments.
III Problem Definition
This section describes the notations used in this paper and formally defines the motion planning problem addressed by the proposed method.
Let τ be an ordered list of length N ∈ N; a sequence {τ(i)}_{i ∈ [0, N−1]} is a mapping from i to the i-th element of τ. Moreover, for the algorithms described in this paper, τ(end) and |τ| give the last element and the number of elements in an ordered list τ, respectively. Let X ⊂ R^d be a given state space, where d ∈ N is the dimensionality of the state space. The workspace dimensionality is indicated by d_w ∈ N. The obstacle and obstacle-free state spaces are defined as X_obs ⊂ X and X_free = X \ X_obs, respectively. Let the initial state be x_init ∈ X_free, and the goal region be X_goal ⊂ X_free. Let an ordered list τ be a path having positive scalar length. A solution path τ to the motion planning problem is feasible if it connects x_init and X_goal, i.e. τ(0) = x_init and τ(end) ∈ X_goal, and lies entirely in the obstacle-free space X_free. The proposed work addresses the feasibility problem of motion planning.
IV MPNet: A Neural Motion Planner
This section introduces our proposed model, MPNet (see Fig. 2); supplementary material, including implementation parameters and project videos, is available at https://sites.google.com/view/mpnet/home. MPNet is a neural network based motion planner comprised of two phases: (A) offline training of the neural models, and (B) online path generation.
IV-A Offline Training
Our proposed method uses two neural models to solve the motion planning problem. The first model is an encoder network which embeds the obstacles point cloud, corresponding to a point cloud representation of X_obs, into a latent space (see Fig. 2(a)). The second model is a planning network (Pnet) which learns to do motion planning for the given obstacle embedding, and start and goal configurations of the robot (see Fig. 2(b)).
IV-A1 Encoder Network
The encoder network (Enet) embeds the obstacles point cloud into a feature space Z ∈ R^m with dimensionality m. Enet can be trained either using an encoder-decoder architecture with a reconstruction loss or in an end-to-end fashion with the Pnet (described below). For encoder-decoder training, we found that the contractive autoencoder (CAE) [18] learns the robust and invariant feature space required for planning and generalization to unseen workspaces. The reconstruction loss of the CAE is defined as:
L_AE(θ^e, θ^d) = (1/N_obs) Σ_{x ∈ D_obs} ‖x − x̂‖² + λ Σ_{ij} (θ^e_{ij})²   (1)

where θ^e are the parameters of the encoder, θ^d are the parameters of the decoder, λ is a penalizing coefficient, D_obs is a dataset of point clouds x ∈ X_obs from N_obs different workspaces, and x̂ is the point cloud reconstructed by the decoder.
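The loss in Eq. (1) can be sketched in a few lines of NumPy; the function name `cae_loss` and the penalty weight `lam` are illustrative choices, not values from the paper:

```python
import numpy as np

def cae_loss(x, x_hat, encoder_weights, lam=1e-3):
    """Sketch of Eq. (1): mean squared reconstruction error over the
    point-cloud batch plus a penalty on the encoder parameters."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))       # (1/N_obs) sum ||x - x_hat||^2
    penalty = sum(np.sum(w ** 2) for w in encoder_weights)  # sum_ij (theta^e_ij)^2
    return recon + lam * penalty
```

In practice the same quantity would be computed on PyTorch tensors so that gradients flow back into the encoder.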
IV-A2 Planning Network
We use a feed-forward deep neural network, parameterized by θ, to perform planning. Given the obstacles encoding Z, the current state x_t, and the goal state x_T, Pnet predicts the next state x̂_{t+1} which would lead the robot closer to the goal region, i.e., x̂_{t+1} = Pnet((x_t, x_T, Z); θ).
To train Pnet, any planner or human expert can provide feasible, near-optimal paths as expert demonstrations. We assume the paths given by the expert demonstrator are in the form of a tuple, τ* = {x_0, x_1, …, x_T}, of feasible states that connect the start and goal configurations so that the connected path lies entirely in X_free. The training objective for the Pnet is to minimize the mean-squared-error (MSE) loss between the predicted states x̂_{t+1} and the actual states x_{t+1} given by the expert data, formalized as:
L_Pnet(θ) = (1/N_p) Σ_{j=1}^{N̂} Σ_{i=0}^{T_j−1} ‖x̂_{j,i+1} − x_{j,i+1}‖²   (2)

where N_p is the averaging term corresponding to the total number of paths N̂ in the training dataset times the path lengths.
IV-B Online Path Planning
The online phase exploits the neural models from the offline phase to do motion planning in cluttered and complex environments. The overall flow of information between Enet and Pnet is shown in Fig. 2(c). To generate end-to-end feasible paths connecting the start and goal states, we propose a novel incremental bidirectional path generation heuristic. Algorithm 1 presents the overall path generation procedure, and its constituent functions are described below.
IV-B1 Enet
The encoder network Enet, trained during the offline phase, is used to encode the obstacles point cloud X_obs into a latent space Z.
IV-B2 Pnet
Pnet is a feed-forward neural network from the offline phase which takes the encoding Z, the current state x_t, and the goal state x_T, and predicts the next state x̂_{t+1} of the robot. To inculcate stochasticity into the Pnet, some of the hidden units in each of its hidden layers are dropped out with a probability p ∈ [0, 1]. The merits of adding stochasticity during online path generation are presented in the discussion section.

IV-B3 Lazy States Contraction (LSC)
Given a path τ, the LSC algorithm connects directly connectable non-consecutive states, i.e., x_i and x_j with j > i + 1, and removes the intermediate/lazy states.
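One greedy way to realize LSC can be sketched as below; this is an illustrative variant, not necessarily the paper's exact procedure, and `steer_to` is an assumed collision-checking callback:

```python
def lazy_states_contraction(path, steer_to):
    """Remove intermediate states whenever two non-consecutive states
    are directly connectable by a collision-free straight segment."""
    path = list(path)
    i = 0
    while i < len(path) - 2:
        j = len(path) - 1
        # find the farthest state reachable from path[i] in a straight line
        while j > i + 1 and not steer_to(path[i], path[j]):
            j -= 1
        if j > i + 1:
            del path[i + 1 : j]  # drop the lazy states in between
        i += 1
    return path
```

With a permissive checker the whole interior of the path collapses; with a restrictive one the path is left untouched.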
IV-B4 Steering
The steerTo function takes two states as input and checks whether a straight trajectory connecting the two given states lies entirely in collision-free space. The steering is done from x_1 to x_2 in small, discrete steps and can be summarized as x_1 + δ(x_2 − x_1) for δ ∈ [0, 1]. The discrete step size could be a fixed number or can be adapted for different parts of the algorithm.
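A discrete-step steering check of this kind might look as follows; the step size and the `collision_free` state-validity callback are assumptions for illustration:

```python
import math

def steer_to(x0, x1, collision_free, step=0.01):
    """Walk the straight segment from x0 to x1 in small discrete steps,
    returning True only if every sampled state is collision-free."""
    n = max(1, math.ceil(math.dist(x0, x1) / step))
    for k in range(n + 1):
        t = k / n
        # interpolated state x0 + t * (x1 - x0)
        x = tuple(a + t * (b - a) for a, b in zip(x0, x1))
        if not collision_free(x):
            return False
    return True
```
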
IV-B5 isFeasible
Given a path τ, this procedure checks whether the end-to-end path, formed by connecting the consecutive states in τ, lies entirely in X_free.
IV-B6 Neural Planner
This is an incremental bidirectional path generation heuristic (see Algorithm 2 for the outline). It takes the obstacles' representation Z, as well as the start and goal states, as input, and outputs a path connecting the two given states. The ordered lists τ^a and τ^b correspond to the paths generated from the start and goal states, respectively. The algorithm starts from the start state: it generates a new state x_new, using Pnet, from the start towards the goal (Line 5), and checks if the path from the start is connectable to the path from the goal (Line 7). If the paths are connectable, an end-to-end path τ is returned by concatenating τ^a and τ^b. However, if the paths are not connectable, the roles of τ^a and τ^b are swapped (Line 11) and the whole procedure is repeated. The swap function enables the bidirectional generation of paths, i.e., if at any iteration i, path τ^a is extended, then in the next iteration, path τ^b will be extended. In this way, the two trajectories τ^a and τ^b march towards each other, which makes this path generation heuristic greedy and fast.
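The bidirectional heuristic can be sketched as follows; `pnet` stands in for the trained planning network and `steer_to` for the collision checker, both assumed callbacks rather than the paper's actual implementations:

```python
def neural_planner(x_start, x_goal, pnet, steer_to, max_steps=80):
    """Sketch of the bidirectional path generation heuristic (Algorithm 2)."""
    tau_a, tau_b = [x_start], [x_goal]
    for _ in range(max_steps):
        x_new = pnet(tau_a[-1], tau_b[-1])  # extend tau_a toward tau_b's frontier
        tau_a.append(x_new)
        if steer_to(tau_a[-1], tau_b[-1]):  # frontiers directly connectable?
            return tau_a + tau_b[::-1]      # concatenate into an end-to-end path
        tau_a, tau_b = tau_b, tau_a         # swap roles: bidirectional growth
    return None                             # no connection within the step budget
```

Because the roles swap every iteration, the two partial paths alternately step toward each other's most recent state.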
IV-B7 Replanning
This procedure is outlined in Algorithm 3. It iterates over all the consecutive states x_i and x_{i+1} in a given path τ = {x_0, x_1, …, x_T}, where i ∈ [0, T−1], and checks if they are connectable or not. If any consecutive states are found not connectable, a new path is generated between those states using one of the following replanning methods (Line 5).
a) Neural Replanning: Given the start and goal states together with the obstacle-space encoding Z, this method recursively finds a new path between the two given states. To do so, it starts by finding a coarse path between the given states and then, if required, replans on a finer level by calling itself over the non-connectable consecutive states of the new path. This recursive neural replanning is performed for a fixed number of steps to keep the algorithm within computational bounds.
b) Hybrid Replanning: This heuristic combines neural replanning with classical motion planning methods. It performs neural replanning for a fixed number of steps. The resulting new path is tested for feasibility. If the path is not feasible, the non-connectable states in the new path are connected using a classical motion planner.
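The replanning loop can be sketched as below; `replanner` abstracts over either variant (neural or classical, e.g., RRT*) and is an assumed callback returning a bridging path or `None`:

```python
def replan(path, steer_to, replanner):
    """Sketch of Algorithm 3: walk consecutive state pairs; where a pair
    is not connectable, ask `replanner` for a bridging sub-path."""
    new_path = [path[0]]
    for x_a, x_b in zip(path, path[1:]):
        if steer_to(x_a, x_b):
            new_path.append(x_b)
        else:
            bridge = replanner(x_a, x_b)
            if bridge is None:
                return None              # replanning failed for this segment
            new_path.extend(bridge[1:])  # skip the duplicate of x_a
    return new_path
```
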
V Implementation Details
This section gives the implementation details of MPNet; for additional details refer to the supplementary material. The proposed neural models were implemented in PyTorch [19]. For environments other than Baxter, the benchmark methods, Informed-RRT* and BIT*, were implemented in Python, and their times were compared against the CPU-time of MPNet. The Baxter environments were implemented with MoveIt! [20] and ROS. In these environments, we use a C++ OMPL [21] implementation of BIT* to compare against a C++ implementation of MPNet. The system used for training and testing has a 3.40GHz × 8 Intel Core i7 processor with 32 GB RAM and a GeForce GTX 1080 GPU. The remainder of this section explains the different modules that make up MPNet.

V-A Data Collection
We generate 110 different workspaces for each presented case, i.e., simple 2D (s2D), rigid-body (rigid), complex 2D (c2D) and 3D (c3D). In each of the workspaces, 5000 collision-free, near-optimal paths were generated using RRT*. The training dataset comprises 100 workspaces with 4000 paths in each workspace. For testing, two types of test datasets were created to evaluate the proposed and benchmark methods. The first test dataset, seen, comprises the already seen 100 workspaces with 200 unseen start and goal configurations in each workspace. The second test dataset, unseen, comprises 10 completely unseen workspaces, each containing 2000 unseen start and goal configurations. In the Baxter experiments, we created a dataset comprising ten challenging simulated environments, and we show the execution on the real robot. For each environment, we collected 900 paths for training and 100 paths for testing. The obstacle point clouds were obtained using a Kinect depth camera with the PCL [22] and pcl_ros (http://wiki.ros.org/pcl_ros) packages.
V-B Model Architectures
V-B1 Encoder Network
For all environments except Baxter, we use encoder-decoder training, whereas for Baxter, we train the encoder and planning network end-to-end. Since the decoder is usually the inverse of the encoder, we only describe the encoder's structure. The encoding function Enet comprises three linear layers and an output layer, where each linear layer is followed by a Parametric Rectified Linear Unit (PReLU) [23]. The input to the encoder is a vector of point clouds of size N_obs × d_w, where N_obs is the number of data points and d_w is the dimension of the workspace.

V-B2 Planning Network
Pnet is a 9-layer and 12-layer DNN for Baxter and the other environments, respectively. We use the Parametric Rectified Linear Unit (PReLU) [23] for non-linearity. To add stochasticity, we use Dropout [24] with probability p in all hidden layers except the last one. In the point-mass and rigid-body cases, we fix the pretrained encoder parameters, since they were trained using the encoder-decoder method, and use them to compute the environment encoding, whereas for the Baxter environments, we train Enet and Pnet end-to-end.
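A minimal PyTorch sketch of such a planning network is shown below; the layer widths, latent size, and dropout rate are illustrative placeholders, not the paper's values (those are in its supplement):

```python
import torch
import torch.nn as nn

class PNet(nn.Module):
    """Sketch of the planning network: linear layers with PReLU non-linearity
    and Dropout on every hidden layer except the last."""
    def __init__(self, state_dim=2, latent_dim=28, p=0.5):
        super().__init__()
        in_dim = latent_dim + 2 * state_dim  # input is [Z, x_t, x_T]
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.PReLU(), nn.Dropout(p),
            nn.Linear(256, 128), nn.PReLU(), nn.Dropout(p),
            nn.Linear(128, state_dim),       # output layer: predicted x_{t+1}
        )

    def forward(self, z, x_t, x_goal):
        return self.net(torch.cat([z, x_t, x_goal], dim=-1))
```

Training then minimizes the MSE of Eq. (2) between the predicted and expert next states with a standard optimizer.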
TABLE I: Mean CPU-time comparison of MPNet (NR), MPNet (HR), Informed-RRT*, and BIT* on the seen and unseen test datasets for the simple 2D, complex 2D, complex 3D, and rigid-body environments.
VI Results
In this section, we compare the performance of MPNet with Neural Replanning (MPNet: NR) and Hybrid Replanning (MPNet: HR) against state-of-the-art motion planning methods, i.e., Informed-RRT* and BIT*, for the motion planning of 2D/3D point-mass robots, a rigid-body, and the Baxter 7 DOF manipulator in 2D and 3D environments.
Fig. 3 shows different example scenarios where MPNet and the expert planner, in this case RRT*, provided successful paths. The red and blue colored trajectories indicate the paths generated by MPNet and RRT*, respectively. The goal region is indicated as a brown colored disk. The mean computation times for MPNet and RRT* are denoted as t_MPNet and t_RRT*, respectively. We see that MPNet is able to compute near-optimal paths for both the point-mass and rigid-body robots in considerably less time than RRT*.
Table I presents the CPU-time comparison of MPNet: NR and MPNet: HR against Informed-RRT* [25] and BIT* [9] over the two test datasets, i.e., seen and unseen. We report the mean times with standard deviation of all algorithms for finding the initial paths in a given problem. For initial paths, it is observed that on average the path lengths of the benchmark methods were higher than the path lengths of MPNet. It can be seen that in all test cases, the mean computation time of MPNet with neural and hybrid replanning remained around 1 second. The mean computation time of Informed-RRT* and BIT* increases significantly as the dimensionality of the planning problem increases. Note that, on average, MPNet is about 40 and 20 times faster than Informed-RRT* and BIT*, respectively, in all test cases, and consistently demonstrates low computation time irrespective of the dimensionality of the planning problem. In these experiments, the mean accuracy of MPNet: HR and MPNet: NR was 100% and 97%, with a standard deviation of about 0.4% over five different trials.
From the experiments presented so far, it is evident that BIT* outperforms Informed-RRT*; therefore, in the following experiments only MPNet and BIT* are compared. Fig. 4 compares the mean computation time of MPNet: NR and BIT* on our two test datasets. It can be seen that the mean computation time of MPNet stays around 1 second irrespective of the planning problem's dimensionality. Furthermore, the mean computation time of BIT* not only fluctuates but also increases significantly in the rigid-body planning problem.
Finally, Fig. 1 shows a single 7 DOF arm of the Baxter robot executing a motion planned by MPNet for a given start and goal configuration. In Fig. 1, the robotic manipulator is at the start configuration, and the shadowed region indicates the manipulator at the target configuration. On the Baxter’s test dataset, MPNet took about 1 second on average with 85% success rate. BIT* took about 9 seconds on average with a success rate of 56% to find paths within a 40% range of the path lengths found by MPNet, and was found to take up to several minutes to find paths within the 10% range of average MPNet path lengths.
VII Discussion
VII-A Stochasticity through Dropout
Our method uses Dropout [24] during both online and offline execution. Dropout is applied layer-wise to the neural network, and it drops each unit in a hidden layer with a probability p ∈ [0, 1]. The resulting neural network is a thinned network and is essentially different from the actual neural model [24].
Note that, in the neural replanning phase, MPNet iterates over the non-connectable consecutive states of the coarse path to do motion planning on a finer level and thus produces a new path. The replanning procedure is called recursively on each of its own newly generated paths until a feasible solution is found or a loop limit is reached. Dropout adds stochasticity to the Pnet, which implies that on each replanning step, the Pnet generates paths different from those of previous replanning steps. This phenomenon is evident from Fig. 5, where the Pnet generated different paths for fixed start and goal configurations. These perturbations in generated paths for a fixed start and goal help in recovery from failure. Thus, adding Dropout increases the overall performance of MPNet. Furthermore, due to the stochasticity from Dropout, our method can also be used to generate adaptive samples for sampling-based motion planners [26].
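The mechanism is easy to demonstrate in PyTorch: keeping a network in train mode leaves `nn.Dropout` active at prediction time, so repeated forward passes on the same input yield different outputs. The tiny stand-in network below is illustrative only:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Tiny stand-in network; MPNet keeps dropout active during online planning
net = nn.Sequential(nn.Linear(4, 64), nn.PReLU(), nn.Dropout(p=0.5),
                    nn.Linear(64, 2))
net.train()  # train mode keeps Dropout enabled even at inference time
x = torch.randn(1, 4)
with torch.no_grad():
    y1, y2 = net(x), net(x)
# identical inputs, different predictions -> stochastic replanning paths
```

Calling `net.eval()` instead would disable Dropout and make the predictions deterministic.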
VII-B Completeness
In the proposed method, a coarse path is computed by a neural network. If the coarse path is found not to be fully connectable, a replanning heuristic is executed to repair the non-connectable path segments and provide an end-to-end collision-free path. The completeness guarantee of the proposed method therefore depends on the underlying replanning heuristic: replanning with a classical motion planner preserves that planner's completeness guarantees. Since we use RRT*, our proposed method inherits the probabilistic completeness of RRT and RRT* [5] while retaining the computational gains.
VII-C Computational Complexity
This section formally highlights the computational complexity of the proposed method. Neural networks are known to have an online execution complexity of O(1). Therefore, the execution of lines 1-2 of Algorithm 1 has a complexity no greater than O(1). The lazy states contraction (LSC) heuristic is a simple path smoothing technique which can be executed in a fixed number of iterations, as a feasible trajectory must have a finite length. Also, note that the LSC is not an essential component of the proposed method; its inclusion helps to generate near-optimal paths. The computational complexity of the replanning heuristic depends on the motion planner used for replanning. We proposed the neural replanning and hybrid replanning methods. Since the neural replanner is executed for a fixed number of steps, its complexity is O(1). For the classical motion planner, we use RRT*, which has O(n log n) complexity, where n is the number of samples in the tree [5]. Hence, for hybrid replanning, we can conclude that the proposed method has a worst-case complexity of O(n log n) and a best-case complexity of O(1). Note that MPNet: NR is able to compute collision-free paths for more than 97% of the cases presented in Table I. Therefore, it can be said that MPNet operates in O(1) most of the time, except for the nearly 3% of cases where RRT* needs to be executed for a small segment of the overall path given by MPNet: NR. This execution of RRT* on small segments of a global path reduces a complicated problem to a simple planning problem, which makes the RRT* execution computationally acceptable and, in practice, much cheaper than O(n log n).
VIII Conclusions and Future Work
In this paper, we presented a fast and efficient neural motion planner called MPNet. MPNet consists of an encoder network that encodes the point cloud of a robot's surroundings into a latent space, and a planning network that takes the environment encoding and the start and goal robotic configurations to output a collision-free feasible path connecting the given configurations. The proposed method (1) plans motions irrespective of the obstacles' geometry, (2) demonstrates a mean execution time of about 1 second in all presented experiments, (3) generalizes to new unseen obstacle locations, and (4) has completeness guarantees.
In our future work, we plan to extend MPNet to build learning-based actor-critic motion planning methods by combining it with proxy collision checkers such as the Fastron algorithm [27, 28]. Another interesting extension would be to address the challenge of kinodynamic motion planning in dynamically changing environments.
References
 [1] S. M. LaValle, Planning algorithms. Cambridge university press, 2006.
 [2] T. Lozano-Perez, Autonomous robot vehicles. Springer Science & Business Media, 2012.
 [3] R. Volpe, “Rover functional autonomy development for the mars mobile science laboratory,” in Proceedings of the 2003 IEEE Aerospace Conference, vol. 2, 2003, pp. 643–652.
 [4] S. M. LaValle, “Rapidly-exploring random trees: A new tool for path planning,” 1998.
 [5] S. Karaman and E. Frazzoli, “Sampling-based algorithms for optimal motion planning,” The International Journal of Robotics Research, vol. 30, no. 7, pp. 846–894, 2011.
 [6] A. H. Qureshi and Y. Ayaz, “Potential functions based sampling heuristic for optimal path planning,” Autonomous Robots, vol. 40, no. 6, pp. 1079–1093, 2016.
 [7] ——, “Intelligent bidirectional rapidlyexploring random trees for optimal motion planning in complex cluttered environments,” Robotics and Autonomous Systems, vol. 68, pp. 1–11, 2015.
 [8] Z. Tahir, A. H. Qureshi, Y. Ayaz, and R. Nawaz, “Potentially guided bidirectionalized rrt* for fast optimal path planning in cluttered environments,” Robotics and Autonomous Systems, vol. 108, pp. 13–27, 2018.
 [9] J. D. Gammell, S. S. Srinivasa, and T. D. Barfoot, “Batch informed trees (bit*): Samplingbased optimal planning via the heuristically guided search of implicit random geometric graphs,” in Robotics and Automation (ICRA), 2015 IEEE International Conference on. IEEE, 2015, pp. 3067–3074.
 [10] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural networks, vol. 61, pp. 85–117, 2015.

 [11] S. Levine, C. Finn, T. Darrell, and P. Abbeel, “End-to-end training of deep visuomotor policies,” Journal of Machine Learning Research, vol. 17, no. 39, pp. 1–40, 2016.
 [12] A. Tamar, Y. Wu, G. Thomas, S. Levine, and P. Abbeel, “Value iteration networks,” in Advances in Neural Information Processing Systems, 2016, pp. 2154–2162.
 [13] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang et al., “End to end learning for self-driving cars,” arXiv preprint arXiv:1604.07316, 2016.
 [14] S. Calinon, F. D’halluin, E. L. Sauser, D. G. Caldwell, and A. G. Billard, “Learning and reproduction of gestures by imitation,” IEEE Robotics & Automation Magazine, vol. 17, no. 2, pp. 44–54, 2010.
 [15] R. Rahmatizadeh, P. Abolghasemi, A. Behal, and L. Bölöni, “Learning real manipulation tasks from virtual demonstrations using lstm,” arXiv preprint, 2016.
 [16] B. Ichter, J. Harrison, and M. Pavone, “Learning sampling distributions for robot motion planning,” in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 7087–7094.
 [17] D. Berenson, P. Abbeel, and K. Goldberg, “A robot path planning framework that learns from experience,” in Robotics and Automation (ICRA), 2012 IEEE International Conference on. IEEE, 2012, pp. 3671–3678.

 [18] S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, “Contractive auto-encoders: Explicit invariance during feature extraction,” in Proceedings of the 28th International Conference on Machine Learning. Omnipress, 2011, pp. 833–840.
 [19] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” 2017.
 [20] I. A. Şucan and S. Chitta, “MoveIt!” [Online]. Available: http://moveit.ros.org.
 [21] I. A. Şucan, M. Moll, and L. E. Kavraki, “The Open Motion Planning Library,” IEEE Robotics & Automation Magazine, vol. 19, no. 4, pp. 72–82, December 2012, http://ompl.kavrakilab.org.
 [22] R. B. Rusu and S. Cousins, “3D is here: Point Cloud Library (PCL),” in IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, May 9–13, 2011.
 [23] L. Trottier, P. Giguère, and B. Chaibdraa, “Parametric exponential linear unit for deep convolutional neural networks,” arXiv preprint arXiv:1605.09332, 2016.
 [24] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting.” Journal of machine learning research, vol. 15, no. 1, pp. 1929–1958, 2014.
 [25] J. D. Gammell, S. S. Srinivasa, and T. D. Barfoot, “Informed rrt*: Optimal samplingbased path planning focused via direct sampling of an admissible ellipsoidal heuristic,” in Intelligent Robots and Systems (IROS 2014), 2014 IEEE/RSJ International Conference on. IEEE, 2014, pp. 2997–3004.
 [26] A. H. Qureshi and M. C. Yip, “Deeply informed neural sampling for robot motion planning,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 6582–6588.
 [27] N. Das and M. Yip, “Learningbased proxy collision detection for robot motion planning applications,” arXiv preprint arXiv:1902.08164, 2019.

 [28] N. Das, N. Gupta, and M. Yip, “Fastron: An online learning-based model and active learning strategy for proxy collision detection,” in Conference on Robot Learning, 2017, pp. 496–504.
 [29] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska et al., “Overcoming catastrophic forgetting in neural networks,” Proceedings of the National Academy of Sciences, vol. 114, no. 13, pp. 3521–3526, 2017.

 [30] J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning and stochastic optimization,” Journal of Machine Learning Research, vol. 12, no. Jul, pp. 2121–2159, 2011.