## I Introduction

The motion planning problem is to find a path connecting the given start and goal states while satisfying all the desired constraints on the robot motion. Geometric motion planning is a simple instance that considers only collision-avoidance constraints with states representing the robot’s kinematics variables (e.g., position, joint-angles) [lavalle2006planning]. However, in KMP, the state comprises position, velocity, and sometimes acceleration, and the robot motions are required to satisfy both kinematic (e.g., collision-avoidance) and dynamics (e.g., velocity and acceleration) constraints, which makes the problem PSPACE-hard and computationally demanding [lavalle2006planning].

The KMP has a broad range of applications from torque-constrained robot manipulation to speed racing [arab2016motion] and performing acrobatic motions [kaufmann2020deep]. The existing state-of-the-art methods solve KMP through Sampling-based Motion Planners (SMPs) [lavalle2006planning] by constructing a tree that originates from a start state and expands by searching the entire state-space to reach the given goal state. The edges of the tree, connecting any two intermediate states, are formed by a local steering function. The local steering often requires solving a boundary-value problem (BVP) through trajectory optimization, which is known to be NP-hard [kacewicz2002complexity] and fails when the boundary constraints are not satisfied. A recent development led to another sampling-based KMP approach [li2016asymptotically], called Stable Sparse-RRT (SST), that circumvents solving BVPs by using a random shooting method for steering and achieves asymptotic optimality. However, it still takes long computation times from tens of seconds to minutes as environments become more complicated due to uniform exploration of state and control spaces.

In this paper, we propose an imitation learning-based approach named Model-Predictive Motion Planning Networks (MPC-MPNet)^{1}^{1}1Supplementary videos and code release details are available at https://sites.google.com/view/mpc-mpnet

. It has a deep neural networks-based generator and discriminator, which, once trained using expert data, outputs feasible paths that satisfy the given kinodynamic constraints. Our approach extends the Motion Planning Networks (MPNet)

[qureshi2019motion] and leverages collision-aware Model Predictive Control (MPC) methods in the planning loop for parallelizable local steering. Note that the original MPNet framework [qureshi2019motion] considered only geometric planning problems and relied on bidirectional path expansions and re-planning steps for finding path solutions to given problems. In KMP, the bidirectional path extensions and re-planning steps to repair in-collision path segments often lead to discontinuity and infeasible paths. Therefore, in MPC-MPNet, we propose novel planning algorithms that perform only forward path expansion and avoid re-planning by building a set of informed possible paths. We evaluate our approach in challenging under-actuated robotics problems in complex, cluttered environments. Our results show that MPC-MPNet outperforms state-of-the-art planning algorithms in computational speed and generalizes to unseen planning problems with high success rates.## Ii Related Work

This section presents a brief overview of related work done in the past for solving KMP problems. KMP algorithms depend on the local steering function to satisfy the motion constraints between any two given states. The local steering methods are usually implemented either as (i) random shooting methods which uniformly samples control sequences [donald1993kinodynamic][lavalle2001randomized][li2016asymptotically], (ii) predefined motion primitives [sakcak2019sampling], or (iii) local trajectory optimization solvers [xie2015toward][perez2012lqr][goretkin2013optimal][tedrake2010lqr].

Sampling random control sequences for shooting method has the advantage of worst-case theoretical guarantees. For instance, the SST approach [li2016asymptotically] leverages random exploration of control and state spaces to provide the worst-case asymptotic completeness and optimality guarantees. However, these methods struggle in higher-dimensional and cluttered planning spaces since uniform exploration takes large computation times to determine a feasible path solution. In contrast, the motion primitives methods generate a local database of optimal controllers in their offline stage [sakcak2019sampling] and use them for steering under KMP algorithms. However, as the motion primitive set is finite, it usually does not capture the entire control space and lacks completeness guarantees.

The trajectory optimization methods formulate local steering as an optimization problem and iteratively solve them for computing action sequences connecting the given states. For instance, [webb2013kinodynamic][goretkin2013optimal] locally linearize the system and obtain Linear Quadratic Regulators’ (LQR) parameters using optimization for steering. Tedrake et. el [tedrake2010lqr] extend LQR methods to construct a tree connecting states within stability regions defined by LQR funnels. However, these methods add a substantial computational burden when constructing trees and do not apply to online planning problems. In a similar vein, Xie et. el [xie2015toward] formulated local steering as BVP and used optimization to find their solutions. These BVP solvers operate in conjunction with traditional SMPs [lavalle2006planning] for finding solutions to KMP problems. However, their approach often collapse when the boundary constraints are not satisfied.

Another class of methods in KMP learn local controllers for steering through reinforcement and imitation learning. For instance, [chiang2019rl]

uses reinforcement learning (RL) to acquire local policy for steering within SMP methods. Similarly,

[ota2020efficient] [ota2020deep] constructs high-level landmarks, which are later connected by local RL policies. However, these methods inherit RL limitations such as requiring exhaustive interactions with their environments for learning. In the imitation learning paradigm, [wolfslag2018rrt] and [allen2019real] use expert demonstrations to learn local policies for connecting given states, but they still rely on classical planners to generate those state sequences.Recent developments overcome the limitations of classical motion planning by incorporating learned planning distributions into their algorithms [qureshi2019motion][qureshi2019motionb][ichter2018learning][ichter2019robot]. These learning-based planners quickly generate feasible motion sequences online for finding path solutions but mostly consider geometric planning problems. Among them, MPNet [qureshi2019motion] [qureshi2019motionb] generates end-to-end paths by exploiting learned distributions in a bidirectional manner throughout its planning and replanning process. MPNet has also been extended to consider kinematic [qureshi2020compnet] and non-holonomic [johnson2020dynamically] constraints. However, these extensions still rely on bidirectional planning, which often leads to discontinuities and infeasible paths in KMP problems.

## Iii Problem Definition

Let denote a configuration space (C-space) of a mechanical system, where collision and collision-free regions are represented as and , respectively. Let denote the state space in which a state, , contains a configuration and the time derivative . Like C-space, the state space also contains the collision and collision-free state spaces. The system’s dynamics model, represented by an implicit set of equations, can be formulated as , where denotes the control input to a system from a feasible control set . In general, the objective of KMP for the given initial state and goal region is to find a collision-free trajectory , comprising a sequence of states and controls with their corresponding durations , such that , .

## Iv Model Predictive Motion Planning Networks

In this section, we present MPC-MPNet, an end-to-end learning-based KMP algorithm. MPC-MPNet iteratively generates waypoints and local steering trajectories to construct collision-free paths connecting the start and goal states. It includes a neural generator, discriminator, and two novel planning algorithms named MPC-MPNetPath and MPC-MPNetTree. The main components of our approach are described as follows.

Methods/networks Methods/mpc

Methods/parallelazion

Methods/planners

## V Implementation Details

Methods | Planning Tasks | |||

Acrobot | Cart-Pole | Quadrotor | Car-like | |

MP-Path | ||||

MP-Tree | ||||

SST | ||||

The total mean computation times with standard deviations in seen and unseen (reported inside parenthesis) test environments are presented for MPC-MPNetPath (MP-Path), MPC-MPNetTree (MP-Tree), and SST in various kinodynamic planning problems.

This section describes the necessary implementation details of our frameworks with their training and testing environments. We implement our neural modules using Pytorch and export them to C++ with Torchscript. Our parallelized MPC algorithm follows standard GPU programming. For the training and testing environments, we consider the following kinodynamically constrained, cluttered environments with schematics shown in Fig.

3.#### V-1 Acrobot

We use the acrobot dynamics as specified in [spong1998underactuated]. The state space is defined as . The control space is defined as . We generate four rectangular obstacles and restrict the center of the obstacles to lie inside the annulus that affects the acrobot movement.

#### V-2 Cartpole

The cartpole dynamics are used as specified in [papadopoulos2014analysis]. The state space is defined as: . and the control space is defined as . We randomly place seven rectangular obstacles in the environment to challenge and restrict the cartpole motion.

#### V-3 Car

This is a 2D first-order car system, where the state space is of 3DOF, including position and orientation. The control inputs are the position velocity and angular velocity. The state space is defined as and control space bound as . We randomly place five rectangular obstacles in the workspace, and limit the distance between obstacles to ensure narrow passages.

#### V-4 Quadrotor

We define the quadrotor dynamics as in [ai2013integrated]. The state space is defined as: , where and denote the position and orientation of the quadrotor, respectively, with their corresponding time derivatives, indicating velocity, represented as and . The control space is 4 dimensional with bound . We randomly place 10 obstacles in the workspace space and ensure the scene is cluttered.

In each of the cases mentioned above, we set up 10 environments by random placement of the obstacles, each with 1000-2000 randomly sampled start and goal state pairs. The of data is used for testing, and the remaining for the training. In our test dataset, we also include two unseen environments for each problem, created by random placement of obstacles. We ensure these environments to be different from the ten seen settings, and for each, we randomly sample 100-200 valid start and goal pairs for evaluation. Hence, in total, our test dataset contains 12 environments for the given setups. The demonstration trajectories for the training data are obtained using the SST algorithm. Furthermore, we obtain the point-cloud of the environments, by randomly sampling the obstacle space and processing them into voxel of size .

## Vi Results

We present a set of experiments to compare the computation time, path quality, and success rate of MPC-MPNetPath, MPC-MPNetTree, and SST planning algorithms. All experiments were performed on the same system with 32GB RAM, GeForce GTX 1080 GPU, and 3.40GHz8 Intel Core i7 processor.

### Vi-a Comparative Studies

This study compares the computation time, success rate, and path quality (measured by time-to-reach) of MPC-MPNetPath, MPC-MPNetTree, and SST algorithms in Cart-Pole, Acrobot, Car, and Quadrotor environments. All evaluation planning problems were unique and not seen during the training, presenting non-trivial and cluttered environments with underactuated systems.

Table I presents the mean computation times with standard deviations across different scenarios in both seen and unseen environments. Figs. LABEL:fig:boxplot-acrobot-cartpole-car

show the box-plots of all methods comparing their computation time and path quality inter-quartile ranges for solving all kinodynamic planning problems. Fig.

4 to Fig. 7 show example qualitative results from MPC-MPNet and SST. In all these problems, including unseen scenarios, MPC-MPNetTree and MPC-MPNetPath success rates were between and , respectively, comparable to the SST success rates in the given time limit.It can be seen that compared to SST, MPC-MPNet methods take significantly lower computation times to find similar quality path solutions with comparable success rates. Our experiments also show that for environments with high-dimensional state and control spaces such as in Quadrotor, SST’s computation times increase exponentially. In contrast, MPC-MPNet still retains its computational benefits from informative waypoint sampling and outperforms SST by a large margin.

Among MPC-MPNetTree and MPC-MPNetPath, we observed that the former method achieves higher success rates and finds better quality path solutions by exploiting GPU-accelerated, parallel computation. Nevertheless, we present CPU-based MPC-MPNetPath and GPU-based MPC-MPNetTree to highlight that both approaches perform better than traditional gold-standard planning methods. Moreover, our algorithms balance exploration and exploitation through randomly sampling current configurations within trees to provide better completeness and comparable success rates as conventional methods.

Setup | Planning Tasks | |||
---|---|---|---|---|

Acrobot | Cart-Pole | Quadrotor | Car-like | |

w/o D | ||||

w/ D |

### Vi-B Ablation Studies

To show the impact of the neural discriminator in the planning pipeline, we present an ablation study, comparing the planning time and path quality between (i) MPC-MPNetPath without neural discriminator, which predicts only one state at every iteration, and (ii) MPC-MPnetPath, which generates a batch of waypoints with the neural generator, and selects the best based on an estimated cost by the neural discriminator. All experiments are conducted in the same environmental setup as in the comparative studies.

From the results shown in Table II, we can observe that the neural discriminator contributes to reducing the mean and standard deviation of planning time. It also helps with generating trajectories with a better cost quality. In MPC-MPNet, stochasticity is introduced by dropout layers in the neural generator. This operation generates a variety of samples, some of which might require replanning for the connections. Our process eliminates those cases using the neural discriminator by selecting a state with the minimum time to reach the given target, also resulting in fewer planning iterations and better path quality.

## Vii Discussion

In this section, we highlight the worst-case properties of our proposed algorithms named MPC-MPNetPath and MPC-MPNetTree. The former operates MPC without GPU processing and uses discriminator to remove unnecessary states to save computational resources. The latter builds on GPU programming and parallelly computes various nodes for simultaneous local extensions with MPC. Our methods plan without relying on bidirectional tree generation or any replanning as was required in the original MPNet and its variants. However, similar to MPNet, our neural generator does explore the sub-space of given state-space that potentially contains a path solution, thanks to Dropout, that implants randomness into our neural generator adapted from expert demonstrations. Since our generated state-spaces are confined to a subspace, the resulting informed tree also grows in that region due to directed extensions with MPC. To enable our methods to explore the state-space outside the generator’s learned state-spaces, we propose a notion of stage-wise exploration.

Our stage-wise exploration strategy balances global exploitation-exploration in three phases. In the first phase, our proposed algorithms are operated for a fixed number of planning iterations . In the second phase, i.e., after iterations and until iterations, our algorithm replaces the MPC-based local controller with a random shooting method. This phase allows exploration in the action-spaces while still keeping state-spaces informed as given by the neural generator. Finally, after steps and beyond in the third phase, our approach performs full exploration by randomly sampling both state and action spaces, like the SST algorithm. These three phases allow our trees to expand from an inner region, potentially containing a path solution, to an outer region in the worst-case for finding a solution if one exists. In our experimentation, we validated that, similar to MPNet, incorporating staged-wise exploration ensures success rate while still retaining the computational benefits and performing better than classical planning approaches.

Since our methods perform stage-wise exploration of the state and action spaces and eventually explore them entirely over a large number of planning iterations, the resulting approach exhibits probabilistic-completeness

. It implies that MPC-MPNet finds a path, if one exists, with the probability approaching to one as the number of planning iterations reaches infinity. The formal proofs can be derived in the same way as reported in

[li2016asymptotically]. Furthermore, note that our MPC-MPNetTree method also uses the nearest neighbor search for extending trees. This method is adopted from SST, which selects the best neighbors in terms of their cost from the given start/root state and removes tree edges with relatively higher costs. Based on this nearest-neighbor selector and the random exploration, the SST method [li2016asymptotically] also guarantees asymptotic-optimality, i.e., over a large number of planning iterations, their planner will eventually find a minimum cost path solution, if one exists. Since MPC-MPNetTree also does exploration, though in stages, and uses SST’s like nearest node selector and pruner, it also exhibits asymptotic-optimality with proofs being similar to as presented in [li2016asymptotically].## Viii Conclusions

We propose Model Predictive Motion Planning Network or MPC-MPNet, a learning-based algorithm capable of finding solutions in seconds with near-optimal path quality in complex kindynamically constrained scenarios where other state-of-the-art methods take upto minutes for finding a similar quality path solutions. Besides, our experiments show that MPC-MPNet generalizes to unseen tasks and adapts to high-dimensional problems with high success rates. We also show that MPC-MPNet allows parallel computing and exhibits worst-case theoretical completeness and optimality guarantees, making it an ideal planner for practical problems ranging from robot navigation to full-body humanoid motion planning. In our future research direction, we would like to investigate our models’ generalization to new planning problem domains. We are also interested in extending our approach to planning under dynamic and manifold kinematic constraints for robot manipulation problems.