I Introduction
Path or motion planning is a highly important problems for autonomous vehicles and robots. Many need to be simultaneously considered for robot path planning and navigation. For example, specification of mission objectives, examining the dynamical feasibility of a robot, ensuring collision avoidance, and considering the internal constraints of a robot.
Optimizationbased methods for path planning can explicitly perform the abovementioned tasks. The two most wellknown optimal path planning methods for autonomous robots: gradient and samplingbased methods. The former frequently assume that objective and constraint functions in a given planning problem are differentiable; however, they can rapidly provide a locally optimal smooth trajectory. Obtaining a numerical solution typically relies on nonlinear programming solvers such as IPOPT [wachter2006implementation] and SNOPT [gill2005snopt]. In contrast, samplingbased methods do not require differentiability of functions; therefore, they are more constructive than the former methods for modeling obstacles without concern about their shapes in constrained optimization for collisionfree path planning. In addition, samplingbased methods naturally perform exploration, thereby avoiding a local optimum. However, derivativefree samplingbased methods generally produce coarse (e.g., zigzag) trajectories. For example, rapidlyexploring random treesbased methods generate coarse trajectories [kuffner2000rrt]. To mitigate these drawbacks of gradient and samplingbased methods while maintaining the advantages, a hybrid method combining them can be considered as proposed in [campos2017hybrid].
In this study, we propose a hybrid method of trajectory optimization by modularly incorporating sampling and gradientbased methods. Fig. 1 depicts the structure of the proposed collisionfree smooth path planning method. The hybrid method presented in this study generates a coarse trajectory and path corridors by using samplingbased optimization via variational inference (VI). Subsequently, a smooth trajectory is obtained by gradientbased optimization with a differential dynamic programming (DDP) scheme. It is assumed that a collision checker is available to indicate collision occurrence in a binary form, true or false.
VI refers to a class of optimizationbased approaches of finding posterior distribution approximations of unknowns, and it makes Bayesian inference computationally efficient and scalable
[zhang2018advances, murphy2022probabilistic]. Recently proposed model predictive path integral (MPPI) is a samplingbased planning method that uses a VI framework [williams2018information, wang2021variational]. Briefly, it samples random trajectories around a nominal trajectory and assigns weights to them in order of producing low costs. Subsequently, it updates the nominal trajectory with the weighted average. In this study, MPPI was used for generating a coarse trajectory for exploration while avoiding collision.For smoothing the coarse trajectory with a gradientbased optimization, we introduced the concept of the path corridors, which is a wellknown scheme reported in the literature [geraerts2007corridor, campos2017hybrid, cen2021optimization, schafer2021computation, li2021optimization]. Path corridors are collections of convex collisionfree regions guiding a robot toward an aimed position. In this study, unlike the investigations mentioned above, a simple samplingbased optimization method was used to construct corridors.
To produce a smooth trajectory, a differential dynamic programming (DDP) framework can be applied in gradientbased optimization. DDPbased approaches, including iterative linear quadratic regulator, for nonlinear optimal control problems, have recently become commonly used in many applications of planning and nonlinear model predictive control for autonomous systems [chai2020overview, aoyama2021constrained]. DDP is based on Bellman’s principle of optimality and the necessary condition for optimal control problem. In addition, all functions defined in the optimal control problem are assumed to be smooth or at least twice continuously differentiable.
Because the original DDP approaches do not consider any constraints of the system state and inputs, many studies have been conducted to deal with constraints in DDP efficiently. The augmented Lagrangian (AL) method and the KarushKuhnTucker (KKT) condition were used in [howell2019altro] and [xie2017differential], respectively. In [aoyama2021constrained], a method combining the AL method with the KKT condition was proposed. The interior point differential dynamic programming (IPDDP) algorithm [pavlov2021interior], employed in the present study, is based on the KKT condition. IPDDP, which is described in Section IIB, takes all Lagrangian and barrier terms into the socalled Qfunction and solves a minmax problem.
The main contributions of this study are summarized as follows:

We propose a hybrid path planning method that generates collisionfree smooth trajectories by combining samplingbased trajectory optimization (MPPI) and gradientbased smooth optimization (IPDDP).

We propose a method to construct collisionfree convex path corridors by samplingbased optimization using VI.

We present two numerical case studies for realtime path planning by which the effectiveness of the proposed method, MPPIIPDDP, was demonstrated in the present research.
The remainder of this paper is organized as follows. Section II reviews samplingbased optimization by VI and IPDDP. Section III describes the proposed path planning method, called MPPIIPDDP, which produces collisionfree smooth trajectories. In Section IV, twodimensional (2D) and threedimensional (3D) case studies are presented to show the effectiveness of the proposed method. Section V concludes the paper and suggests directions for future studies.
Ii Preliminaries
Iia Samplingbased Optimization by Variational Inference
An optimization problem can be reconstructed as an inference problem, which can be solved by the VI method [kappen2012optimal, levine2018reinforcement]
. To this end, in this study, a binary random variable
indicating optimality was introduced where specifically,is the probability of optimality. For brevity, we write it as
.We considered two different cases of VI for stochastic optimal control: VI for finitedimensional optimization, in which the decision variable is a parameter vector, and VI for trajectory optimization, in which generating the optimal trajectory of a control system is considered.
IiA1 VI for Finitedimensional Optimization
Let be a vector of decision variables. For VI corresponding to stochastic optimization or optimal control, the objective is to find the target distribution, ^{1}^{1}1
We exploited the terminologies of distributions (probability measures) and probability density functions.
defined asLet be the likelihood function and be the empirical approximation of that is computed from samples drawn from prior . Thus, can be represented as
where is the Dirac delta function and is the number of samples. Replacing , is approximated using the forward KullbackLeibler (KL) divergence as follows:
If a normal distribution is chosen for parameterizing the policy,
, then a closedform solution for the optimal policy, , is obtained, where(1) 
In this study, this VIbased stochastic optimization method was used for constructing collisionfree convex path corridors, as described in Section IIIB.
IiA2 VI for Trajectory Optimization
Let be a trajectory consisting of a sequence of controlled states and a sequence of control inputs over a finite timehorizon . The objective is to find the target distribution, , where represents stochastic dynamics as follows:
Let . Thus, can be rewritten as
s.t. 
The closedform solution for the above optimization is expressed as
Let be the empirical distribution of approximated with samples drawn from prior . Thus, can be represented as
Replacing , is approximated with the forward KL divergence.
If a normal distribution is chosen for , then a closed form solution of is obtained, where
(2) 
In this study, this VIbased trajectory optimization method was used in MPPI [williams2018information, wang2021variational] to generate a locally optimal trajectory, as presented in Section IIIA.
IiA3 Additional Notes
One of the most common choices for the likelihood function is , where is the cost function and is known as the inverse temperature. Using this likelihood function, weight , as discussed in Sections IIA1 and IIA2 can be interpreted as the likelihood ratio corresponding to the sampled candidate, or , respectively. Specifically, a low value of implies a high likelihood of optimality at an exponential rate.
Because this samplingbased optimization scheme is an iterative method, the distribution, , affects the prior, , at the next iteration; therefore, eventually reaches a locally optimal point. In this study, we considered normal distributions for the prior and posterior, and only propagated the mean, , and used a fixed covariance without empirical adaptation, as expressed (1) and (2).
In [liu2016stein], the Stein variational gradient descent (SVGD) method was proposed to directly approximate a target distribution by the reverse KL divergence, without using an empirical distribution . In addition, it can deal with complex multimodal distributions and achieve more exploration; consequently, a global optimum is more probable to be found. Although SVGD can be used as in [lambert2020stein], in this study, an empirical distribution and the forward KL divergence were employed for convenience.
IiB Interior Point Differential Dynamic Programming
IPDDP introduced in [pavlov2021interior] can be used to solve a standard discretetime optimal control problem (OCP) expressed as
(3)  
subject to  
where variables and are the system state and the control input vector at time step , respectively, and is the initial condition of the control system. Let the decision vector be denoted as , which is a concatenation of sequential control inputs over a time horizon . Realvalued functions and are the final and stage cost functions, respectively, and defines the controlled state transitions. Vectorvalued function defines the inequality constraints, where denotes the number of constraints. All functions defined in (3) are assumed to be twice continuously differentiable.
In dynamic programming, the OCP (3) can be converted into Bellman’s equation form at time with a given state as follows:
subject to  
where is a value function for the next state and are slack variables. At the final stage, the value function is defined as . For notational convenience, index is not shown in the remainder of this section. The relaxed Lagrangian with the logbarrier terms of is defined by the following function:
where is the barrier parameter and is the Lagrangian multiplier. The relaxed value function, , is defined by a saddle point of the function as follows:
IiB1 Backward Pass
As in the standard DDP scheme, is perturbed up to the quadratic terms at the current nominal points:
(4)  
where is an allones vector and is a diagonal matrix associated with vector . By setting , where , the step direction that satisfies the extremum condition corresponding to the firstorder optimality is determined using the following primaldual KKT system as follows:
(5) 
Solving the KKT system expressed in (5) for , , and , yields
where the coefficient matrices and the vectors are defined as
(6)  
and the intermediate parameters and vectors are
(7)  
Above, and are the primal and dual residuals, respectively. The KKT variables, and , can be rewritten as
Substituting the above expressions of and into the quadratic form, , in (4) and setting yield another representation for the perturbed quadratic form as follows:
where and .
Finally, the perturbed value function is obtained as follows:
where the coefficients are
This perturbed value function, , is recursively used for in the next backward step.
IiB2 Forward Pass
After calculating the perturbations in the backward pass, the nominal points are updated as follows: , , and , where is the step size. In IPDDP, is determined using the filter linesearch method [wachter2006implementation]. While reducing the step size, , starting from , the filter linesearch method accepts those updates that reduce either the cost or constraint violations. If no is found acceptable, the forward pass is terminated for failure.
IiB3 Convergence
The barrier parameter, , is monotonically decreased whenever the local convergence to the central path is achieved. The criterion for local convergence is , where . The global convergence agrees with the sufficiently small .
IiB4 Regularization
To guarantee that is invertible in (IIB1), regularization parameter is added: . increases when it is not invertible or failure occurs in the forward pass. If reaches some upper bound , IPDDP is terminated for failure.
Iii Collisionfree Smooth Trajectory Generation
This study considers the following OCP associated with trajectory optimization for path planning:
(8)  
subject to  
where is the position of a robot and is the set of positions occupied by obstacles. Different from (3), the joint constraints on states and controls are decoupled. The proposed algorithm for solving (8) has three steps: searching for a feasible coarse trajectory using MPPI, constructing path corridors, and smoothing the coarse trajectory by IPDDP.
Iiia Model Predictive Path Integral
First, a coarse trajectory using MPPI is generated. The cost function, , is defined as
(9) 
where the indicator function for collision avoidance is defined as
and the sequence of states is determined by the initial state, , dynamics , and controls .
To satisfy the control constraints in (8), the samples of controls, , are projected onto the constraint set, i.e. , where is a projection operator applied to the feasible set of controls .
Using the method described in Section IIA2, locally optimal controls and corresponding states are obtained. Let be the resulting position of a robot obtained by MPPI.
IiiB Path Corridors
To construct corridors around a path , the following optimization problem is considered:
(10)  
subject to  
where the indicator function for a radial collisionfree corridor is defined as
Parameters are the weights and is the maximum value of . Although the shape of the corridors can be arbitrary, we selected a Euclidean ball represented by two variables: center and radius . The problem as expressed in (10) is designed to enlarge the ball and ensure the center, , close to while containing inside the ball without intersection with obstacles (see Fig. 2). If there are no obstacles around , the optimal solutions are and .
The method described in Section IIA1 was used with to solve the optimization problem in (10) at each stage of the path planning to compute a sequence of collisionfree corridors, which are represented by and . As in MPPI, the constraints on in (10) can be met by projection.
IiiC Trajectory Smoothing
In the final step of the proposed trajectory optimization for path planning, the following OCP is considered for smoothing the coarse trajectory generated by MPPI:
(11)  
subject to  
where is, again, the position of a robot, are the center and radius of the path corridor computed using (10), and is a weight matrix penalizing the deviations from the center of a corridor. The constraint in the last row of (11) is included to ensure the robot remains inside the collisionfree corridors.
IiiD Algorithms
Algorithm 1 summarizes the proposed trajectory optimization method, MPPIIPDDP, for generating collisionfree smooth trajectories. The algorithm consists of three subroutines. First, MPPI uses a derivatefree VI to search a dynamically feasible but coarse trajectory. Second, Corridor also uses a derivatefree VI to construct collisionfree circular corridors around the coarse trajectory. Finally, IPDDP employs a recursive method to smooth the coarse trajectory within the corridors. As demonstrated in the supplementary video available at https://youtu.be/oUAt5sd9Bk, the proposed MPPIDDP is verified to be capable of online replanning.
Iv Case Studies
In this section, we present two simulation results of trajectory optimization conducted to demonstrate the effectiveness of the proposed MPPIIPDDP. The first case is of a wheeled mobile robot, and the second case considers a pointmass quadrotor.
Iva Wheeled Mobile Robot
For an example of 2D path planning, a scenario in which a differential wheeled robot arrives at a given target pose without collision was considered. The kinematic model of the robot is defined as
where are the positions on the x and yaxis respectively; is the angle of orientation; are the velocity and angular velocity, respectively; and is the time interval. Vectors and are the states and the controls, respectively. We set the initial states as and the sampling time interval as .
The constraints of the corresponding OCP for trajectory generation are defined as
where is the set of obstacles shown in Fig. 3 in gray. The cost functions of the corresponding OCP for trajectory generation are defined as
where is the target pose. The parameters for the MPPIIPDDP method are listed in Table I.
Parameter  Value  Parameter  Value 

20  35  
0.5  50  
1000  100  
5000  
3000  
IvB Quadrotor Without Attitude
Parameter  Value  Parameter  Value 

20  35  
0.5  30  
1000  100  
8000  
5000  
As an example of 3D path planning, a scenario in which a quadrotor arrives at a given target position without collision was considered. it was assumed that the quadrotor can be modeled as a point mass. The kinematics of the quadrotor is given by
where and are the position and the velocity, respectively, is the acceleration, is the gravitational acceleration, and is the vector of zaxis. and are the states and the controls, respectively. The initial state is set as , and .
The constraints of the corresponding OCP for trajectory generation are defined as
where the first two constraints represent that the acceleration of the quadrotor must be inside a cone and is the set of obstacles shown in Fig. 7 in gray. When projections are performed to satisfy the conic constraint, the following projection operator was applied for obtaining a secondorder cone:
for , where and are a vector of a compatible dimension and a scalar, respectively. The cost functions of the corresponding OCP for trajectory generation are defined as
where is the target position. The parameters for the MPPIIPDDP method are listed in Table II.
V Conclusion
In this study, we established a new optimizationbased hybrid local path planning method, MPPIIPDDP, to generate a collisionfree smooth optimal trajectory for path planning. Based on two case studies of ground and aerial robot path planning, we demonstrated the effectiveness of the proposed MPPIIPDDP, even in a 3D environment with a complex layout of obstacles, provided that an efficient collision checker is available. The proposed algorithm can be further improved. As previously mentioned, SVGD can be used for improving the exploration. Planning under uncertainty needs to be considered. Future studies will be conducted on realworld applications of the MPPIIPDDP algorithm incorporating a global planner.