Real-time optimal trajectory generation has long been a challenging but essential component in robotics. However, due to nonlinear dynamics, non-convex constraints and high dimensionality, it is difficult to solve these problems in real-time. A popular strategy is to represent trajectories with polynomial splines and to decompose state constraints into convex sets. This allows quadratic programming (QP) methods to be applied, which can solve to global optima efficiently and accurately. This has been applied to path planning for ground robots, autonomous cars [1, 2], humanoid robots  and UAVs [4, 5, 6]
. The disadvantage of this approach is that the time allocated for each segment of the spline has to be fixed in order for the optimization to remain convex, because time enters the optimization objective and constraints nonlinearly. In prior approaches, the allocated time is often chosen heuristically, because optimizing the time allocation scheme along with the path makes the problem non-convex and too slow for real-time use.
One effective strategy proposed in Mellinger et al.  is a space-time decomposition approach that optimizes the path with the time allocation scheme fixed, and then iteratively refines the time allocation scheme through gradient descent. Specifically, in each iteration, two levels of optimization problems are solved: the lower level optimizes the path with the time fixed, and the upper level finds a better time allocation scheme with gradient descent. One unresolved challenge in this framework is finding a good gradient direction for the upper-level optimization, due to the fact that the objective is the solution of a lower-level constrained optimization problem. Finite differences was used in this prior work, but this can be computationally expensive because if segments are used in the spline, then finite differences requires lower-level optimizations to be solved for each gradient evaluation.
The primary contribution in this paper is a bilevel optimization framework for optimal time allocation that estimates the gradient from the dual solution (Lagrange multipliers) of the QP problem. A graphical illustration of this technique is shown in Fig.1. The dual solution to the lower-level optimization problem can be used to calculate a gradient to the upper-level one under the assumption of a fixed active set. Since the dual solution is a by-product of the QP problem, we are able to estimate the gradient “for free”. Compared with finite differences which suffer from the choice of step size and low accuracy, our approach obtains the exact gradient. Experiments are conducted on UAV trajectory optimization problems in the presence of obstacles, and show that our gradient estimation approach is orders of magnitude faster than the current state-of-the-art  while also being more accurate. This leads to an 8 times improvement in suboptimality when the optimizer is given a 40 ms cutoff.
Ii Related Work
Despite intensive research, it is still challenging to find an optimal time allocation scheme of a spline trajectory in real-time. One strategy is to generate a time allocation scheme with heuristics ,  and stick with the scheme during the optimization stage. For example, Gao et al.  use the heuristic of selecting velocity according to the distance to the nearest obstacle, closer the trajectory is to the obstacle, lower the velocity. Though the heuristic is reasonably chosen, the velocity of the trajectory after the optimization stage does not necessarily follow the heuristic. Heuristics, though being cheap to compute, are often not optimal and will sometimes lead to spikes in jerk or snap near connection points, see Fig. (c)c for an example.
One way to improve a non-optimal time allocation scheme is to refine it iteratively using gradient descent [4, 7]. The refinement process computes the gradient of the objective w.r.t. the time allocation scheme, takes a step after choosing a suitable descent direction and step length. To the best of our knowledge, finite difference is the only way being used for gradient calculation in this scenario. However, estimating gradient with finite difference is time-consuming since the objective function is actually the objective of an optimization problem. For each gradient evaluation, this approach has to solve the same number of optimization problems as the number of segments in the spline, thus making it intractable for real-time performance. Moreover, choosing step sizes for finite difference is hard and the result may not necessarily be a good approximation to the actual gradient.
Another strategy to determine a time allocation scheme is to use sampling . This approach randomly samples the duration of each segment until the corresponding QP can be solved, and is applicable to problems with a rather small feasible set, for example the problem of humanoid locomotion where a small change in time can make the optimization problem infeasible. However it may not scale well to generating trajectories with a large number of segments. Also, the work described in  only seeks for a feasible time allocation and does not improve it.
Richter et al.  proposes a framework that is able to generate collision-free spline trajectories and optimize time allocation. They managed to formulate a QP with only equality constraints. However, such a formulation has the disadvantage: (a) Being collision-free is achieved by iteratively adding new points and solve another QP if there is a collision. This strategy has no bound on the number of iterations needed though their experiments show a few iterations are usually enough. (b) Dynamic feasibility is achieved by checking the max acceleration in every iteration and stop the iteration if the acceleration reaches the limit. Since computing the extrema of a high order polynomial is hard according to the Abel-Ruffini theorem, this strategy can be slow when we are using high order polynomials. Moreover, their framework solves the optimization problem by re-formulating the equality constrained QP into an unconstrained one, they will not be able to handle general inequality constraints which often exist in trajectory optimization. In our work, we use bilevel optimization as a way to solve a GBD, and exploit theoretical developments from these fields to derive analytical gradients for trajectory optimization.
More broadly, the idea of decomposition and divide-and-conquer has been explored in the Benders Decomposition (BD)  and Generalized Benders Decomposition (GBD), which are well-known techniques in operations research. In the GBD formulation, complicating variables are variables which would make the optimization problem considerably more tractable when temporarily fixed. The idea of GBD is to split the original optimization problem into two: a master problem and a subproblem. The master problem is generated by temporarily fixing the complicating variables of the original problem, and in BD these are usually integer variables. In the case of trajectory optimization, the complicating variables are times allocated to each segment. This approach is also closely related to bilevel optimization , which refers to a mathematical program where one optimization problem has another optimization problem as one of its constraints, that is, one optimization task is embedded within another. The outer optimization task is often referred to as the upper-level optimization problem and the inner optimization task is known to be the lower-level optimization problem. Bilevel optimization is well-known in production and marketing decision making, and we refer readers to [10, 11] for more comprehensive treatments of the topic.
There are analogues to our approach in the field of machine learning. OptNet
incorporates a QP solver as a layer into the neural network and is able to provide analytical gradients of the solution to the QP w.r.t. input parameters for back propagation. Gouldet al.  presents results on differentiating argmin optimization problems w.r.t. optimization variables in the context of bilevel optimization. Their results give exact gradients for the lower-level optimization problem. However, works described in ,  do not consider finding the gradient of the objective function. Moreover, methods proposed by Gould et al.  involves inverting a Hessian matrix which could be computationally expensive and often not even invertible, and they only describe the case of equality constraints and use a log-barrier function to tackle inequality constraints. The gradient of the objective function is even cheaper to compute, making our approach very suitable for speeding up trajectory optimization.
In this section, we describe how our framework can be applied to trajectory optimization. We start by introducing how trajectory optimization works, then give a mathematical formulation of the problem we are trying to solve and end with a description of our algorithm.
Iii-a Spline-Based Trajectory Optimization
Trajectory optimization asks to find a trajectory that minimizes some measure of performance while satisfying all the necessary constraints, e.g. being collision-free and dynamically feasible, and is formulated as
where encodes a trajectory, with the initial state prescribed. Often the final state is also given, and sometimes the final time is fixed. is the measure of performance. In order to generate safe and dynamically feasible trajectories, the following constraints must be satisfied:
Safety: Position constraints that make sure the position lies in a safe region.
Dynamic Feasibility: Bounds on minimum and maximum velocity and acceleration.
In abuse of notation we write state constraints as and . More precisely, we have , , and depending on higher order derivatives.
If we represent the trajectory as a polynomial spline, we can reframe the problem in terms of spatial (spline coefficients) and temporal variables (spline knot points). Specifically, define a spline with segments and segment durations . The timing of each knot point is given by with . The ’th segment is defined over the domain as
for each . We formulate our splines using 6th order polynomials ().
In order for the trajectory to be smooth, equality constraints should be enforced so:
States at the start and end of the trajectory should match the initial state and (optional) final state.
Continuities at connection points that ensure a smooth transition between each segment of the trajectory. If the trajectory needs to be continuous, equality constraints up to the ’th order should be applied at all knot points.
We found that in the UAV case, applying continuity constraints up to acceleration yields good results.
Often, the objective function is chosen to be the integral of the squared norm of some high-order derivative of the trajectory, such as minimum jerk. As shown in , , these types of objectives can be written as a quadratic function of the coefficients of the polynomials, as long as timing is fixed. Each constraint on an individual state at a fixed time in segment can also be converted to a constraint on the coefficients . Overall, safety and dynamic feasibility constraints can be enforced at a set of collocation points, leading to a nonlinear program. However, there is no guarantee of the whole trajectory satisfying all the constraints.
Another approach is to encode the trajectory as a Bézier spline. Bézier curves have the nice properties that:
The curve is totally contained in the convex hull of its control points.
The derivative of a Bézier curve is again a Bézier curve.
Thus if the trajectory is represented with Bézier curves and the bounds described above are convex or convex in each segment, then enforcing constraints on all the control points will guarantee that the whole trajectory will satisfy those bounds. (Note that this is only a sufficient condition and tends to be conservative, as the whole trajectory may satisfy all the constraints with some control points lying out of bounds.)
Iii-B Formulation of the Bilevel Optimization Problem
In order to efficiently solve the trajectory optimization problem, we explicitly split the spatial part and the temporal part of the trajectory. We gather all the polynomial coefficients in the flattened vector(spatial part), and define the time allocation scheme (temporal part).
We require the objective function to be quadratically separable in the form , in which is a quadratic objective matrix which is positive semidefinite. , and are allowed to be nonlinear in . Separability holds when the objective is expressed as the integral of the squared norm of some high-order derivative of the trajectory, such as minimum jerk. As shown in , , these types of objectives can be written as quadratic functions of as long as timing is fixed. is the part of the objective that is only dependent on , such as penalizing total time with .
If the objective is quadratically separable and the spatial constraints are convex, then a bilevel trajectory optimization problem can be formulated as follows:
The timing constraints and enforce constraints such as positive durations , and possibly fixed total time . We refer interested readers to [2, 3, 5, 6] for different realizations of this general formulation.
Note that although the objective functions remain the same in both the lower-level and upper-level optimization problem, is fixed in the lower-level optimization problem but becomes the variable we want to optimize in the upper-level optimization problem. The lower-level problem is also specified to be a quadratic program (QP), which can be solved efficiently and globally optimally in .
Our strategy is to use a constrained gradient descent on the function with minimizing the QP for a fixed timing . The descent method indeed has been used to solve bilevel optimization problems, and our framework is a variant of this method. Given an feasible , we find a direction and a step length that can make a sufficient decrease in while maintaining the feasibility of the new point . The general issue is the availability of gradients.
Iii-C Analytical gradient estimation
We now start to examine how the dual solution or Lagrange multipliers from the solution of the lower-level optimization problem can be utilized to estimate the gradient of . We assume that the lower-level and upper-level objective functions are the same. Let us consider a general nonlinear optimization problem formulated as:
The Lagrangian is defined as:
Where and are Lagrange multipliers associated with inequality and equality constraints, respectively.
Suppose that minimizes for a fixed value of . Let us consider the first-order optimality conditions, also known as the Karush-Kuhn-Tucker (KKT) conditions. The KKT conditions must be satisfied at the optimal point. Namely, the following set of equations have to be satisfied by the optimal solution tuple :
Where denotes the gradient with respect to , denotes an element-wise multiplication, inequalities and equalities should be interpreted element-wise.
The complementary slackness condition tells us that if the ’th element of is nonzero, then the ’th inequality constraint is active. That is, , where denotes the ’th row in the inequality . If , we say that the ’th constraint is inactive.
The active set is the set of all the indices corresponding to active constraints, which we denote as . The inactive set, which contains all the indices corresponding to inactive constraints, is denoted as . It is clear that and are complements of each other. Note that we do not require the active set to be unique.
Now let us split the inequality constraints into two parts: one that contains all the active constraints and the one that contains all the inactive constraints. For all the active inequality constraints, we can think of them as equality constraints since . We now construct a new equality constraint by stacking the active inequality constraints and equality constraints:
where , denotes the extraction of rows whose indices are in the active set . The vector of Lagrangian multipliers corresponding to this new set of equality constraints denoted as is:
At the optimal point, the tuple must satisfy the following due to KKT conditions:
We would like to examine how the optimal objective changes if a small perturbation is applied to the time allocation scheme while keeping the active set constant. Let us consider an infinitesimal shift applied to the time allocation scheme . We denote the shifted time allocation scheme as , the shift in optimal solution due to the perturbation as , the shift in optimal objective due to the perturbation as
If is differentiable and we apply first-order Taylor expansion on equation Eq. (5b), we get:
Let us assume is differentiable, approximating with a first-order Taylor expansion around results in:
Note that we can rewrite Eq. (5a) as:
Since is the shift of the optimal objective, we can conclude that
as desired. Even more succinctly, since at non-active constraints, we can write this as
Iii-D Two illustrative examples
We illustrate our analytical gradient estimation technique with two toy examples. The first example inspects the case with the objective function being:
The constraint being:
In this example, we assume that the constraint does not depend on . We pick and and inspect the shift of the optimal objective . Fig. 2 gives a graphical illustration. Since the constraint does not depend on , we have:
Then according to Eq. (12):
Iii-E Outer optimization
Our gradient descent algorithm is given in Algorithm 1, which takes an initial guess of the time allocation scheme and a maximum number of iterations as input. It then iteratively descends until some optimality conditions are satisfied or the maximum number of iterations is reached.
In Algorithm 1, Line 2 solves a QP problem with a time allocation scheme and then returns the objective value and the dual solution (Lagrange multipliers) . is now the baseline objective and the rest of the algorithm will improve upon it. Line 4 estimates the gradient of the objective w.r.t. time using the Lagrange multipliers . Line 5 finds a normalized descent direction from the gradient by projecting the gradient onto the null space of constraints on . Line 6, which is further illustrated in Algorithm 2, finds a suitable step length that gives sufficient decrease in the objective function. If such an is found, the line search algorithm also returns the objective, Lagrange multipliers and time allocation associated with the optimal denoted as , and , respectively. Then the objective, Lagrange multipliers and time allocation scheme are updated in Line 11. If the step length cannot be found during the iteration, the iteration stops and returns the last as in Line 8. The optimality conditions used in our implementation are:
Norm of the projected gradient is less than a threshold, or
The change of the objective function is less than a threshold.
Algorithm 2 shows our adaptive backtracking line search routine starting from in the direction of , step length is returned if found as in Line 12 along with the associated objective , Lagrange multipliers and time allocation scheme , otherwise “ not found” will be returned as in Line 14. In our implementation, the Armijo condition  is used for checking sufficient decrease in the objective. The initial step length for each line search is defined as a static variable in Line 1, it will be updated adaptively. The update strategy is similar to the update of trust region radius in a trust region algorithm: If the line search achieved sufficient decrease after the first iteration, the initial step length for the next line search will grow as in Line 9; If the line search found an that leads to sufficient decrease after more than one iterations, then that will set be as the for the next line search as in Line 11. Parameters that control the growing and shrinking of are and , respectively, they are defined as constant variables in Line 2. Note that and . Line 5 takes a new step and line 6 solves the QP with the updated time.
We demonstrate our method on trajectory planning for UAVs in the presence of obstacles. Our implementation of the algorithm is written in Python and C++: Python is used in constructing the QP problem, performing line search and gradient calculation, while all QP solvers run in C++. Pybind11  is used as the interface between Python and C++. All the experiments are carried out on a laptop with a 2.9 GHz Intel Core i5 processor. We make use of the environment generator and planning pipeline from Gao et al. to generate convex collision-free corridors for a UAV . This open source software is able to generate random obstacles in a 3D environment, visualize the obstacles and trajectories in Rviz, and generate safe corridors. We generated 100 tests by randomly sample feasible start points and goal points from the environment generator and record the flight corridor. Statistics of the number of segments from these tests are shown in Fig. 4. Because these involve 6th order splines in a 3D state space, the number of variables in each optimization problem is where is the number of segments.
We establish a minimum-jerk objective function, and fix the total amount of time on the trajectory. The initial timing is allocated from the heuristic described in . The fixed-duration constraint is used for fair comparison with the unrefined trajectory, but we note that our framework is able to handle general objective functions and constraints on timing. Fig. 5 shows an instance of one of these corridors, and how the trajectory can be improved by refining the time allocation scheme.
|Method||Time [ms]||Suboptimality||Constraint Violation|
|Sqopt+LM||171.43 / 99.25||0.025 / 1.8e-7||0.0 / 0.0|
|Sqopt+FD||323.03 / 226.66||1.179 / 0.144||0.0 / 0.0|
|Mosek+LM||330.46 / 288.35||0.016 / 0.0||0.0 / 0.0|
|Mosek+FD||1067.16 / 904.25||7.102 / 0.754||0.0 / 0.0|
|SNOPT||56.44 / 43.34||242.75 / 14.66||0.087 / 0.033|
We compare the results from different combinations of QP solvers and gradient estimation methods listed in Table I. Here we refer to the relative suboptimality, which is calculated as , where denotes the objective value achieved by an algorithm, and denotes the true optimum. In our experiment, we take as the minimum of all the objectives returned by different algorithms.
We test the following solvers: Sqopt , an active-set QP solver intended for large and sparse systems; Mosek , an interior-point QP solver, OSQP , an alternating direction method of multipliers (ADMM) approach, and qpOASES , an active-set QP solver designed for small and dense systems. Experiments show that Sqopt and Mosek yield adequate results, but neither of OSQP or qpOASES works well in our experiment, because qpOASES does not scale well to mid-scale or large problems with sparse structures, and OSQP cannot solve QP to a high accuracy and gives inaccurate Lagrange multipliers.
We compare our Lagrange multiplier method (LM) against finite differences (FD) using Sqopt and Mosek. LM improves overall running time by 2-3 times beyond FD, and moreover terminates with a much lower suboptimality. We suspect this is because of the improved accuracy of the LM gradient estimation. The performance bottleneck of the LM method is moved to the line search step, which takes about 90% of the total computation time while the time spent on gradient estimation is trivial. Sqopt is generally faster than Mosek, and this could be a consequence of active-set methods’ ability to perform warm start. We have tried Quasi-Newton methods but they performed poorly against steepest descent. We also observed that the computation time of our method scales roughly linearly with the number of segments.
We also compare against SNOPT , which is a general nonlinear solver for sparse, large-scale problems. Here we do not decompose the problem, and simply formulate joint spatial and temporal optimization problem as an NLP. We provide analytic gradients to SNOPT for solver robustness. We initialize SNOPT with the unrefined time allocation scheme and the spline coefficients resulting with the first QP solve. However, even starting from a feasible solution, SNOPT does poorly compared to our bilevel optimization framework. SNOPT tends to terminate prematurely without converging, and often moving to an infeasible point. We believe this is because the joint spatial and temporal NLP is ill-conditioned:
The QP objective function exhibits high-order dependence on timing.
Some spatial constraints are very sensitive to the high-order spline coefficients.
The results shown in Fig. 6 explore suboptimality as a function of time. For each cutoff time, we abort the optimization process at that time and record the suboptimality achieved at this time. If no iteration has been finished at the cutoff time, we use the unrefined objective instead. Note that due to the properties of the steepest descent method, the first few iterations will give significant decrease in the objective but the improvement decreases as more time is spent. In applications, we suggest that our optimizer can run in a separate thread and return the best result at a given timeout. If this algorithm were to run in 25Hz (40 ms cutoff time), which is a reasonable frequency for real-time applications, our method achieves a mean and median suboptimality of 14.73 and 0.87 respectively, compared to 117.33 and 6.91 achieved by .
We presented a novel way to estimate the gradient of the objective function w.r.t. time allocation scheme in a trajectory optimization problem and proposed an efficient bilevel optimization framework based on Generalized Benders Decomposition. Our results indicate that we can achieve a significant decrease in the objective of the trajectory while having real-time performance. This framework may have applications in path planning for ground vehicles, humanoid robots and UAVs. Underpowered robots will particularly benefit from this framework since a refined trajectory is smoother, less aggressive and easier to track than its unrefined counterpart.
Future work may include accelerating the gradient descent by exploiting the structure of the problem. For example, acceleration may be achieved through using Newton or Quasi-Newton methods. Also, we are considering the possibility of deploying our framework on a real UAV to further investigate its performance in real-time trajectory optimization.
This work is partially supported by NSF Grant #IIS-1253553.
-  M. Wang, Z. Wang, S. Paudel, and M. Schwager, “Safe distributed lane change maneuvers for multiple autonomous vehicles using buffered input cells,” in ICRA. IEEE, 2018, pp. 1–7.
-  H. Fan, F. Zhu, C. Liu, L. Zhang, L. Zhuang, D. Li, W. Zhu, J. Hu, H. Li, and Q. Kong, “Baidu apollo em motion planner,” CoRR, vol. abs/1807.08048, 2018.
-  P. Fernbach, S. Tonneau, and M. Taïx, “Croc: Convex resolution of centroidal dynamics trajectories to provide a feasibility criterion for the multi contact planning problem,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018.
-  D. Mellinger and V. Kumar, “Minimum snap trajectory generation and control for quadrotors.” in ICRA. IEEE, 2011, pp. 2520–2525.
-  F. Gao, W. Wu, Y. Lin, and S. Shen, “Online safe trajectory generation for quadrotors using fast marching method and bernstein basis polynomial,” in ICRA. IEEE, 2018, pp. 344–351.
-  S. Liu, M. Watterson, K. Mohta, K. Sun, S. Bhattacharya, C. J. Taylor, and V. Kumar, “Planning dynamically feasible trajectories for quadrotors using safe flight corridors in 3-d complex environments,” IEEE Robotics and Automation Letters, vol. 2, pp. 1688–1695, 2017.
-  C. Richter, A. Bry, and N. Roy, “Polynomial trajectory planning for aggressive quadrotor flight in dense indoor environments,” in ISRR, ser. Springer Tracts in Advanced Robotics, vol. 114. Springer, 2013, pp. 649–666.
-  J. F. Benders, “Partitioning procedures for solving mixed-variables programming problems,” Numer. Math., vol. 4, no. 1, pp. 238–252, Dec. 1962. [Online]. Available: http://dx.doi.org/10.1007/BF01386316
-  A. M. Geoffrion, “Generalized benders decomposition,” Journal of Optimization Theory and Applications, vol. 10, no. 4, pp. 237–260, Oct 1972. [Online]. Available: https://doi.org/10.1007/BF00934810
A. Sinha, P. Malo, and K. Deb, “A review on bilevel optimization: From
classical to evolutionary approaches and applications,”
IEEE Transactions on Evolutionary Computation, vol. 22, pp. 276–295, 2018.
-  B. Colson, P. Marcotte, and G. Savard, “An overview of bilevel optimization,” Annals of Operations Research, vol. 153, no. 1, pp. 235–256, Sep 2007. [Online]. Available: https://doi.org/10.1007/s10479-007-0176-2
-  B. Amos and J. Z. Kolter, “Optnet: Differentiable optimization as a layer in neural networks,” in ICML, 2017.
-  S. Gould, B. Fernando, A. Cherian, P. Anderson, R. S. Cruz, and E. Guo, “On differentiating parameterized argmin and argmax problems with application to bi-level optimization,” CoRR, vol. abs/1607.05447, 2016.
-  J. Nocedal and S. J. Wright, Numerical Optimization, 2nd ed. New York, NY, USA: Springer, 2006.
-  S. Boyd and L. Vandenberghe, Convex Optimization. New York, NY, USA: Cambridge University Press, 2004.
-  W. Jakob, J. Rhinelander, and D. Moldovan, “pybind11 – seamless operability between c++11 and python,” 2017, https://github.com/pybind/pybind11.
-  P. E. Gill, W. Murray, M. A. Saunders, and E. Wong, “User’s guide for SQOPT 7.7: Software for large-scale linear and quadratic programming,” Department of Mathematics, University of California, San Diego, La Jolla, CA, Center for Computational Mathematics Report CCoM 18-2, 2018.
-  MOSEK ApS, MOSEK Optimizer API for C, 8.1., 2018. [Online]. Available: https://docs.mosek.com/8.1/capi/index.html
-  B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, “OSQP: An operator splitting solver for quadratic programs,” ArXiv e-prints, Nov. 2017.
-  H. Ferreau, C. Kirches, A. Potschka, H. Bock, and M. Diehl, “qpOASES: A parametric active-set algorithm for quadratic programming,” Mathematical Programming Computation, vol. 6, no. 4, pp. 327–363, 2014.
-  P. E. Gill, W. Murray, M. A. Saunders, and E. Wong, “User’s guide for SNOPT 7.7: Software for large-scale nonlinear programming,” Department of Mathematics, University of California, San Diego, La Jolla, CA, Center for Computational Mathematics Report CCoM 18-1, 2018.