MPPI-IPDDP: Hybrid Method of Collision-Free Smooth Trajectory Generation for Autonomous Robots

by   Min-Gyeom Kim, et al.
INHA University

This study presents a hybrid trajectory optimization method that generates a collision-free smooth trajectory for autonomous mobile robots. The hybrid method combines sampling-based model predictive path integral (MPPI) control and gradient-based interior-point differential dynamic programming (IPDDP) exploiting their advantages of exploration and smoothing. The proposed method, called MPPI-IPDDP, consists of three steps. The first step generates a coarse trajectory by MPPI control, the second step constructs a collision-free convex corridor, and the third step smooths the coarse trajectory by IPDDP using the collision-free convex corridor computed in the second step. For demonstration, the proposed algorithm was applied to trajectory optimization for differential-driving wheeled mobile robots and point-mass quadrotors. A supplementary video of the simulations can be found at


DL-IAPS and PJSO: A Path/Speed Decoupled Trajectory Optimization and its Application in Autonomous Driving

This paper presents a free space trajectory optimization algorithm of au...

Continuous-Time Trajectory Optimization for Decentralized Multi-Robot Navigation

Multi-robot systems have begun to permeate into a variety of different f...

Autonomous Navigation of AGVs in Unknown Cluttered Environments: log-MPPI Control Strategy

Sampling-based model predictive control (MPC) optimization methods, such...

GraphDistNet: A Graph-based Collision-distance Estimator for Gradient-based Trajectory

Trajectory optimization (TO) aims to find a sequence of valid states whi...

Trajectory Generation for Multiagent Point-To-Point Transitions via Distributed Model Predictive Control

This paper introduces a novel algorithm for multiagent offline trajector...

A Hybrid Method for Online Trajectory Planning of Mobile Robots in Cluttered Environments

This paper presents a method for online trajectory planning in known env...

Derivative Action Control: Smooth Model Predictive Path Integral Control without Smoothing

Here, we present a new approach to generate smooth control sequences in ...

I Introduction

Path or motion planning is a highly important problems for autonomous vehicles and robots. Many need to be simultaneously considered for robot path planning and navigation. For example, specification of mission objectives, examining the dynamical feasibility of a robot, ensuring collision avoidance, and considering the internal constraints of a robot.

Optimization-based methods for path planning can explicitly perform the above-mentioned tasks. The two most well-known optimal path planning methods for autonomous robots: gradient- and sampling-based methods. The former frequently assume that objective and constraint functions in a given planning problem are differentiable; however, they can rapidly provide a locally optimal smooth trajectory. Obtaining a numerical solution typically relies on nonlinear programming solvers such as IPOPT [wachter2006implementation] and SNOPT [gill2005snopt]. In contrast, sampling-based methods do not require differentiability of functions; therefore, they are more constructive than the former methods for modeling obstacles without concern about their shapes in constrained optimization for collision-free path planning. In addition, sampling-based methods naturally perform exploration, thereby avoiding a local optimum. However, derivative-free sampling-based methods generally produce coarse (e.g., zigzag) trajectories. For example, rapidly-exploring random trees-based methods generate coarse trajectories [kuffner2000rrt]. To mitigate these drawbacks of gradient- and sampling-based methods while maintaining the advantages, a hybrid method combining them can be considered as proposed in [campos2017hybrid].

In this study, we propose a hybrid method of trajectory optimization by modularly incorporating sampling- and gradient-based methods. Fig. 1 depicts the structure of the proposed collision-free smooth path planning method. The hybrid method presented in this study generates a coarse trajectory and path corridors by using sampling-based optimization via variational inference (VI). Subsequently, a smooth trajectory is obtained by gradient-based optimization with a differential dynamic programming (DDP) scheme. It is assumed that a collision checker is available to indicate collision occurrence in a binary form, true or false.

Figure 1: Proposed method of collision-free smooth path planning. Contents in red box are subjects in this study.

VI refers to a class of optimization-based approaches of finding posterior distribution approximations of unknowns, and it makes Bayesian inference computationally efficient and scalable 

[zhang2018advances, murphy2022probabilistic]. Recently proposed model predictive path integral (MPPI) is a sampling-based planning method that uses a VI framework [williams2018information, wang2021variational]. Briefly, it samples random trajectories around a nominal trajectory and assigns weights to them in order of producing low costs. Subsequently, it updates the nominal trajectory with the weighted average. In this study, MPPI was used for generating a coarse trajectory for exploration while avoiding collision.

For smoothing the coarse trajectory with a gradient-based optimization, we introduced the concept of the path corridors, which is a well-known scheme reported in the literature [geraerts2007corridor, campos2017hybrid, cen2021optimization, schafer2021computation, li2021optimization]. Path corridors are collections of convex collision-free regions guiding a robot toward an aimed position. In this study, unlike the investigations mentioned above, a simple sampling-based optimization method was used to construct corridors.

To produce a smooth trajectory, a differential dynamic programming (DDP) framework can be applied in gradient-based optimization. DDP-based approaches, including iterative linear quadratic regulator, for nonlinear optimal control problems, have recently become commonly used in many applications of planning and nonlinear model predictive control for autonomous systems [chai2020overview, aoyama2021constrained]. DDP is based on Bellman’s principle of optimality and the necessary condition for optimal control problem. In addition, all functions defined in the optimal control problem are assumed to be smooth or at least twice continuously differentiable.

Because the original DDP approaches do not consider any constraints of the system state and inputs, many studies have been conducted to deal with constraints in DDP efficiently. The augmented Lagrangian (AL) method and the Karush-Kuhn-Tucker (KKT) condition were used in [howell2019altro] and [xie2017differential], respectively. In [aoyama2021constrained], a method combining the AL method with the KKT condition was proposed. The interior point differential dynamic programming (IPDDP) algorithm [pavlov2021interior], employed in the present study, is based on the KKT condition. IPDDP, which is described in Section II-B, takes all Lagrangian and barrier terms into the so-called Q-function and solves a min-max problem.

The main contributions of this study are summarized as follows:

  • We propose a hybrid path planning method that generates collision-free smooth trajectories by combining sampling-based trajectory optimization (MPPI) and gradient-based smooth optimization (IPDDP).

  • We propose a method to construct collision-free convex path corridors by sampling-based optimization using VI.

  • We present two numerical case studies for real-time path planning by which the effectiveness of the proposed method, MPPI-IPDDP, was demonstrated in the present research.

The remainder of this paper is organized as follows. Section II reviews sampling-based optimization by VI and IPDDP. Section III describes the proposed path planning method, called MPPI-IPDDP, which produces collision-free smooth trajectories. In Section IV, two-dimensional (2D) and three-dimensional (3D) case studies are presented to show the effectiveness of the proposed method. Section V concludes the paper and suggests directions for future studies.

Ii Preliminaries

Ii-a Sampling-based Optimization by Variational Inference

An optimization problem can be reconstructed as an inference problem, which can be solved by the VI method [kappen2012optimal, levine2018reinforcement]

. To this end, in this study, a binary random variable

indicating optimality was introduced where specifically,

is the probability of optimality. For brevity, we write it as


We considered two different cases of VI for stochastic optimal control: VI for finite-dimensional optimization, in which the decision variable is a parameter vector, and VI for trajectory optimization, in which generating the optimal trajectory of a control system is considered.

Ii-A1 VI for Finite-dimensional Optimization

Let be a vector of decision variables. For VI corresponding to stochastic optimization or optimal control, the objective is to find the target distribution, 111

We exploited the terminologies of distributions (probability measures) and probability density functions.

defined as

Let be the likelihood function and be the empirical approximation of that is computed from samples drawn from prior . Thus, can be represented as

where is the Dirac delta function and is the number of samples. Replacing , is approximated using the forward Kullback-Leibler (KL) divergence as follows:

If a normal distribution is chosen for parameterizing the policy,

, then a closed-form solution for the optimal policy, , is obtained, where


In this study, this VI-based stochastic optimization method was used for constructing collision-free convex path corridors, as described in Section III-B.

Ii-A2 VI for Trajectory Optimization

Let be a trajectory consisting of a sequence of controlled states and a sequence of control inputs over a finite time-horizon . The objective is to find the target distribution, , where represents stochastic dynamics as follows:

Let . Thus, can be rewritten as


The closed-form solution for the above optimization is expressed as

Let be the empirical distribution of approximated with samples drawn from prior . Thus, can be represented as

Replacing , is approximated with the forward KL divergence.

If a normal distribution is chosen for , then a closed form solution of is obtained, where


In this study, this VI-based trajectory optimization method was used in MPPI [williams2018information, wang2021variational] to generate a locally optimal trajectory, as presented in Section III-A.

Ii-A3 Additional Notes

One of the most common choices for the likelihood function is , where is the cost function and is known as the inverse temperature. Using this likelihood function, weight , as discussed in Sections II-A1 and II-A2 can be interpreted as the likelihood ratio corresponding to the sampled candidate, or , respectively. Specifically, a low value of implies a high likelihood of optimality at an exponential rate.

Because this sampling-based optimization scheme is an iterative method, the distribution, , affects the prior, , at the next iteration; therefore, eventually reaches a locally optimal point. In this study, we considered normal distributions for the prior and posterior, and only propagated the mean, , and used a fixed covariance without empirical adaptation, as expressed (1) and (2).

In [liu2016stein], the Stein variational gradient descent (SVGD) method was proposed to directly approximate a target distribution by the reverse KL divergence, without using an empirical distribution . In addition, it can deal with complex multi-modal distributions and achieve more exploration; consequently, a global optimum is more probable to be found. Although SVGD can be used as in [lambert2020stein], in this study, an empirical distribution and the forward KL divergence were employed for convenience.

Ii-B Interior Point Differential Dynamic Programming

IPDDP introduced in [pavlov2021interior] can be used to solve a standard discrete-time optimal control problem (OCP) expressed as

subject to

where variables and are the system state and the control input vector at time step , respectively, and is the initial condition of the control system. Let the decision vector be denoted as , which is a concatenation of sequential control inputs over a time horizon . Real-valued functions and are the final and stage cost functions, respectively, and defines the controlled state transitions. Vector-valued function defines the inequality constraints, where denotes the number of constraints. All functions defined in (3) are assumed to be twice continuously differentiable.

In dynamic programming, the OCP (3) can be converted into Bellman’s equation form at time with a given state as follows:

subject to

where is a value function for the next state and are slack variables. At the final stage, the value function is defined as . For notational convenience, index is not shown in the remainder of this section. The relaxed Lagrangian with the log-barrier terms of is defined by the following -function:

where is the barrier parameter and is the Lagrangian multiplier. The relaxed value function, , is defined by a saddle point of the -function as follows:

Ii-B1 Backward Pass

As in the standard DDP scheme, is perturbed up to the quadratic terms at the current nominal points:


where is an all-ones vector and is a diagonal matrix associated with vector . By setting , where , the step direction that satisfies the extremum condition corresponding to the first-order optimality is determined using the following primal-dual KKT system as follows:


Solving the KKT system expressed in (5) for , , and , yields

where the coefficient matrices and the vectors are defined as


and the intermediate parameters and vectors are


Above, and are the primal and dual residuals, respectively. The KKT variables, and , can be rewritten as

Substituting the above expressions of and into the quadratic form, , in (4) and setting yield another representation for the perturbed quadratic form as follows:

where and .

Finally, the perturbed value function is obtained as follows:

where the coefficients are

This perturbed value function, , is recursively used for in the next backward step.

Ii-B2 Forward Pass

After calculating the perturbations in the backward pass, the nominal points are updated as follows: , , and , where is the step size. In IPDDP, is determined using the filter line-search method [wachter2006implementation]. While reducing the step size, , starting from , the filter line-search method accepts those updates that reduce either the cost or constraint violations. If no is found acceptable, the forward pass is terminated for failure.

Ii-B3 Convergence

The barrier parameter, , is monotonically decreased whenever the local convergence to the central path is achieved. The criterion for local convergence is , where . The global convergence agrees with the sufficiently small .

Ii-B4 Regularization

To guarantee that is invertible in (II-B1), regularization parameter is added: . increases when it is not invertible or failure occurs in the forward pass. If reaches some upper bound , IPDDP is terminated for failure.

Iii Collision-free Smooth Trajectory Generation

This study considers the following OCP associated with trajectory optimization for path planning:

subject to

where is the position of a robot and is the set of positions occupied by obstacles. Different from (3), the joint constraints on states and controls are decoupled. The proposed algorithm for solving (8) has three steps: searching for a feasible coarse trajectory using MPPI, constructing path corridors, and smoothing the coarse trajectory by IPDDP.

Iii-a Model Predictive Path Integral

First, a coarse trajectory using MPPI is generated. The cost function, , is defined as


where the indicator function for collision avoidance is defined as

and the sequence of states is determined by the initial state, , dynamics , and controls .

To satisfy the control constraints in (8), the samples of controls, , are projected onto the constraint set, i.e. , where is a projection operator applied to the feasible set of controls .

Using the method described in Section II-A2, locally optimal controls and corresponding states are obtained. Let be the resulting position of a robot obtained by MPPI.

Iii-B Path Corridors

To construct corridors around a path , the following optimization problem is considered:

subject to

where the indicator function for a radial collision-free corridor is defined as

Parameters are the weights and is the maximum value of . Although the shape of the corridors can be arbitrary, we selected a Euclidean ball represented by two variables: center and radius . The problem as expressed in (10) is designed to enlarge the ball and ensure the center, , close to while containing inside the ball without intersection with obstacles (see Fig. 2). If there are no obstacles around , the optimal solutions are and .

The method described in Section II-A1 was used with to solve the optimization problem in (10) at each stage of the path planning to compute a sequence of collision-free corridors, which are represented by and . As in MPPI, the constraints on in (10) can be met by projection.

(a) Maximally inflated path corridor
(b) Inflation process
Figure 2: Schematics for collision-free path corridors.

Iii-C Trajectory Smoothing

In the final step of the proposed trajectory optimization for path planning, the following OCP is considered for smoothing the coarse trajectory generated by MPPI:

subject to

where is, again, the position of a robot, are the center and radius of the path corridor computed using (10), and is a weight matrix penalizing the deviations from the center of a corridor. The constraint in the last row of (11) is included to ensure the robot remains inside the collision-free corridors.

IPDDP introduced in Section II-B was adopted to solve (11) and obtain a smooth trajectory. At the time, the coarse trajectory from the MPPI can be used for the initial guess, i.e., a warm start for local optimization; this can considerably accelerate the convergence rate of IPDDP.

Iii-D Algorithms

Algorithm 1 summarizes the proposed trajectory optimization method, MPPI-IPDDP, for generating collision-free smooth trajectories. The algorithm consists of three subroutines. First, MPPI uses a derivate-free VI to search a dynamically feasible but coarse trajectory. Second, Corridor also uses a derivate-free VI to construct collision-free circular corridors around the coarse trajectory. Finally, IPDDP employs a recursive method to smooth the coarse trajectory within the corridors. As demonstrated in the supplementary video available at, the proposed MPPI-DDP is verified to be capable of online replanning.

1:Input: initial state , collision checker
2:Output: locally optimal controls
3:Initialize: controls
4:while not terminated do
6:     for  do
8:     end for
10:      (11)
11:end while
Algorithm 1
1:for  do
3:      (9)
4:end for
6:for  do
9:end for
Algorithm 2
5:while not sufficiently inflated do
6:     for  do
8:           (10)
9:     end for
11:     for  do
14:     end for
18:end while
19:  (unpacking)
Algorithm 3
1:while not converged globally and not max iteration do
2:     evaluate all derivatives needed;
3:     try backwardpass; Section II-B1
4:     try forwardpass; Section II-B2
5:     if any failure occurs then Section II-B4
6:          increase regularization parameter ;
7:          if  then
8:               break; Solve failed
9:          end if
10:          continue;
11:     else
12:          decrease regularization parameter ;
13:          update nominal trajectory;
14:     end if
15:     if locally converged then Section II-B3
16:          decrease barrier parameter ;
17:          reinitialize filter;
18:     end if
19:end while
Algorithm 4

Iv Case Studies

In this section, we present two simulation results of trajectory optimization conducted to demonstrate the effectiveness of the proposed MPPI-IPDDP. The first case is of a wheeled mobile robot, and the second case considers a point-mass quadrotor.

Iv-a Wheeled Mobile Robot

Figure 3: Iterations of MPPI-IPDDP algorithm for generating collision-free path from to . Black dots mark positions of robot, red circles represent the path corridors, and gray color represents obstacles. Early part of iterations shows that constraint violation (black dots are not in corridors) because IPDDP is terminated by maximum iteration limit. However, finally, MPPI-IPDDP algorithm finds optimal collision-free trajectory.
(a) Coarse controls by MPPI
(b) Smooth controls by IPDDP
(a) Cost of trajectory reduces over iterations.
(b) Maximum value of primal residual approaches zero, indicating that constraints are satisfied.
Figure 4: Comparison of control inputs obtained by MPPI and IPDDP.
Figure 5: Cost reduction and convergence rate over MPPI-IPDDP iterations.
Figure 4: Comparison of control inputs obtained by MPPI and IPDDP.

For an example of 2D path planning, a scenario in which a differential wheeled robot arrives at a given target pose without collision was considered. The kinematic model of the robot is defined as

where are the positions on the x- and y-axis respectively; is the angle of orientation; are the velocity and angular velocity, respectively; and is the time interval. Vectors and are the states and the controls, respectively. We set the initial states as and the sampling time interval as .

The constraints of the corresponding OCP for trajectory generation are defined as

where is the set of obstacles shown in Fig. 3 in gray. The cost functions of the corresponding OCP for trajectory generation are defined as

where is the target pose. The parameters for the MPPI-IPDDP method are listed in Table I.

Parameter Value Parameter Value
20 35
0.5 50
1000 100
Table I: Parameters for trajectory optimization of wheeled mobile robot, as described in Section IV-A.

Fig. 3 shows the processing results of generating a smooth trajectory. In Fig. 5, the zigzag controls obtained by MPPI and the smoother ones by IPDDP are compared. Fig. 5 shows that the cost and constraint violations reduce with increasing MPPI-IPDDP iterations.

Iv-B Quadrotor Without Attitude

Parameter Value Parameter Value
20 35
0.5 30
1000 100
Table II: Parameters for trajectory optimization of quadrator, as described in Section IV-B.
Figure 6: Iterations for generating path from to by MPPI-IPDDP. Black dots are positions of quadrotor, red spheres are path corridors, and gray color represents obstacles. Optimal trajectory passes through small hole and reaches destination.

Figure 7: Side view of generated trajectory. Green arrows represent directions of acceleration.
(a) Coarse controls by MPPI
(b) Smooth controls by IPDDP
(a) Cost of trajectory reduces over iterations.
(b) Maximum value of primal residual approaches zero, indicating that constraints are satisfied.
Figure 8: Comparison of control inputs obtained by MPPI and IPDDP.
Figure 9: Cost reduction and convergence rate over MPPI-IPDDP iterations.
Figure 8: Comparison of control inputs obtained by MPPI and IPDDP.

As an example of 3D path planning, a scenario in which a quadrotor arrives at a given target position without collision was considered. it was assumed that the quadrotor can be modeled as a point mass. The kinematics of the quadrotor is given by

where and are the position and the velocity, respectively, is the acceleration, is the gravitational acceleration, and is the vector of z-axis. and are the states and the controls, respectively. The initial state is set as , and .

The constraints of the corresponding OCP for trajectory generation are defined as

where the first two constraints represent that the acceleration of the quadrotor must be inside a cone and is the set of obstacles shown in Fig. 7 in gray. When projections are performed to satisfy the conic constraint, the following projection operator was applied for obtaining a second-order cone:

for , where and are a vector of a compatible dimension and a scalar, respectively. The cost functions of the corresponding OCP for trajectory generation are defined as

where is the target position. The parameters for the MPPI-IPDDP method are listed in Table II.

Fig. 6 shows the processing results of generating a smooth trajectory. Fig. 9 compares the noisy controls obtained by MPPI and the smooth ones obtained by IPDDP. Fig. 9 shows that the cost and constraint violations reduce with increasing MPPI-IPDDP iterations.

V Conclusion

In this study, we established a new optimization-based hybrid local path planning method, MPPI-IPDDP, to generate a collision-free smooth optimal trajectory for path planning. Based on two case studies of ground and aerial robot path planning, we demonstrated the effectiveness of the proposed MPPI-IPDDP, even in a 3D environment with a complex layout of obstacles, provided that an efficient collision checker is available. The proposed algorithm can be further improved. As previously mentioned, SVGD can be used for improving the exploration. Planning under uncertainty needs to be considered. Future studies will be conducted on real-world applications of the MPPI-IPDDP algorithm incorporating a global planner.