Trajectory optimization is a powerful tool for synthesizing trajectories for nonlinear dynamical systems. Indirect methods, in particular, are able to efficiently find optimal solutions to this class of problem and return dynamically feasible trajectories by performing rollouts and utilizing gradients of the system’s dynamics.
Classically, indirect methods like iterative LQR (iLQR) [jacobson1970differential] utilize explicit dynamics representations, which can be directly evaluated and differentiated in order to return gradients. In this work, we present a more general framework for optimization-based dynamics representations. This latter class enables partial elimination of trajectory-level constraints by absorbing them into the dynamics representation.
We formulate optimization-based dynamics as a constrained optimization problem and provide a path-following method for efficient evaluation of the dynamics at each time step. The implicit-function theorem is utilized to differentiate through this representation, and we exploit the properties of our path-following method to return useful, smooth gradients to an upper-level optimizer.
To demonstrate the capabilities of this representation, we utilize optimization-based dynamics and iLQR in a bi-level optimization framework to perform trajectory optimization. We provide a number of optimization-based dynamics models and examples in simulation including: an acrobot with joint limits, a cart-pole experiencing friction, gait generation for a Raibert hopper, a belly-flop soft landing for a rocket subject to thrust limits, and a planar-push manipulation task.
Specifically, our contributions are:
A novel framework for optimization-based dynamics that can be used with trajectory-optimization methods that require gradients
A path-following method that can efficiently evaluate constrained optimization problems and return smooth gradients
A collection of optimization-based dynamics models and examples from locomotion, aerospace, and manipulation domains that demonstrate the proposed bi-level trajectory-optimization framework
In the remainder of this paper, we first review related work on bi-level approaches to trajectory optimization in Section II. Next, we present background on the iLQR algorithm, the implicit-function theorem, and implicit integrators in Section III. Then, we present our optimization-based dynamics representation, as well as a path-following method for solving this problem class in Section IV. We provide a collection of optimization-based dynamics models that are utilized to perform trajectory optimization, and comparisons, in Section V. Finally, we conclude with discussion and directions for future work in Section VI.
II Related Work
Bi-level optimization [sinha2017review] is a framework where an upper-level optimizer utilizes the results, i.e., solution and potentially gradients, of a lower-level optimization problem. Approaches typically either implicitly solve the lower-level problem and compute gradients using the solution, or explicitly represent the optimality conditions of the lower-level problem as constraints in the upper-level problem. For example, the MuJoCo simulator [todorov2012mujoco] has been employed in an implicit bi-level approach for whole-body model-predictive control of a humanoid robot [koenemann2015whole]. The lower-level simulator solves a convex optimization problem in order to compute contact forces for rigid-body dynamics, and the results are utilized to perform rollouts and return finite-differenced gradients to the upper-level iLQR optimizer. In contrast, explicit approaches have used the optimality conditions of linear-complementarity-problem contact dynamics as constraints in a direct trajectory-optimization method [posa2014direct]. A related approach formulates contact dynamics as lower-level holonomic constraints [mastalli2020crocoddyl]. Implicit integrators, which are a specific instance of optimization-based dynamics and are widely used in collocation methods [von1993numerical], have been explored with differentiable dynamic programming for models that require cloth simulation [zimmermann2021dynamic]. Implicit dynamics have also been trained to represent non-smooth dynamics [pfrommer2020contactnets], although this work has not been utilized for trajectory optimization. Direct methods with implicit lower-level problems have been used in locomotion applications for tracking reference trajectories with model-predictive control [lecleach2021linear] and a semi-direct method utilizes a lower-level friction problem for planning through contact events [landry2019bilevel]. A related, sequential operator splitting framework is proposed in [sindhwani2017sequential]. 
In this work we focus on implicit lower-level problems and indirect methods, specifically iLQR, as the upper-level optimizer.
III Background

In this section, we provide background on the iLQR algorithm, the implicit-function theorem, and implicit integrators.
III-A Iterative LQR
iLQR is an algorithm for trajectory optimization that solves instances of the following problem:

$$\min_{u_{1:T-1}} \; c_T(x_T) + \sum_{t=1}^{T-1} c_t(x_t, u_t) \quad \text{subject to} \quad x_{t+1} = f(x_t, u_t), \quad x_1 \;\text{given}. \tag{1}$$

For a system with state $x \in \mathbf{R}^n$, control inputs $u \in \mathbf{R}^m$, time index $t$, initial state $x_1$, and discrete-time dynamics $f$, the algorithm aims to minimize an objective with stage-cost functions $c_t$ and terminal-cost function $c_T$ over a planning horizon $T$.
The algorithm utilizes gradients of the objective and dynamics, and exploits the problem's temporal structure. The state trajectory is defined implicitly by the initial state and the controls; only the control trajectory is parameterized with decision variables. Closed-loop rollouts enable this indirect method to work well in practice. The overall complexity is linear in the planning horizon and cubic in the control-input dimension [tassa2007receding].
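To make the recursion concrete, the following is a minimal sketch of the backward and forward passes on a linear-quadratic toy problem (a double integrator). The system, costs, and values are our illustrations, not from the paper, and a practical implementation would add a line search and regularization:

```python
import numpy as np

# Minimal iLQR sketch on a double integrator (illustrative, not the paper's code).
# For a linear-quadratic problem the recursion converges in one iteration.
dt, T = 0.1, 50
A = np.array([[1.0, dt], [0.0, 1.0]])  # x_{t+1} = A x_t + B u_t
B = np.array([[0.0], [dt]])
Q = 1e-2 * np.eye(2)   # stage state cost
R = 1e-3 * np.eye(1)   # stage control cost
Qf = 1e2 * np.eye(2)   # terminal cost

def rollout(x1, U, X_ref=None, gains=None):
    """Forward pass; optionally apply feedback gains around a reference."""
    X, U_new = [x1], []
    for t, u in enumerate(U):
        if gains is not None:
            k, K = gains[t]
            u = u + k + K @ (X[-1] - X_ref[t])
        U_new.append(u)
        X.append(A @ X[-1] + B @ u)
    return X, U_new

def backward_pass(X, U):
    """Riccati-style recursion on the quadratic value function."""
    Vx, Vxx = 2 * Qf @ X[-1], 2 * Qf
    gains = [None] * len(U)
    for t in reversed(range(len(U))):
        Qx = 2 * Q @ X[t] + A.T @ Vx
        Qu = 2 * R @ U[t] + B.T @ Vx
        Qxx = 2 * Q + A.T @ Vxx @ A
        Quu = 2 * R + B.T @ Vxx @ B
        Qux = B.T @ Vxx @ A
        k = -np.linalg.solve(Quu, Qu)   # feedforward
        K = -np.linalg.solve(Quu, Qux)  # feedback gain
        gains[t] = (k, K)
        Vx = Qx + K.T @ Quu @ k + K.T @ Qu + Qux.T @ k
        Vxx = Qxx + K.T @ Quu @ K + K.T @ Qux + Qux.T @ K
    return gains

x1 = np.array([1.0, 0.0])
U = [np.zeros(1) for _ in range(T - 1)]
X, _ = rollout(x1, U)
for _ in range(3):
    gains = backward_pass(X, U)
    X, U = rollout(x1, U, X_ref=X, gains=gains)
```

Note the closed-loop forward pass: the new controls are generated by feedback around the previous trajectory, which is what keeps the iterates dynamically feasible.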
While efficient, the algorithm does not natively handle additional general equality or inequality constraints. This limitation has led to many variations of the classic algorithm in order to accommodate constraints in some capacity at the solver level. Box constraints have been incorporated at each step in the backward pass in order to enforce control limits [tassa2014control]. An alternative approach embeds controls in barrier functions which smoothly approximate constraints [marti2020squash]. Augmented Lagrangian methods and constrained backward and forward passes have also been proposed for handling general constraints [howell2019altro, xie2017differential, lantoine2008hybrid].
III-B Implicit-Function Theorem
An implicit function, $r : \mathbf{R}^{n_z} \times \mathbf{R}^{n_\theta} \rightarrow \mathbf{R}^{n_z}$, is defined as

$$r(z; \theta) = 0, \tag{2}$$

for solutions $z \in \mathbf{R}^{n_z}$ and problem data $\theta \in \mathbf{R}^{n_\theta}$.

At an equilibrium point, $z^*$, the sensitivities of the solution with respect to the problem data, i.e., $\partial z / \partial \theta$, can be computed [dini1907lezioni]. First, we approximate (2) to first order:

$$\frac{\partial r}{\partial z} \delta z + \frac{\partial r}{\partial \theta} \delta \theta = 0, \tag{3}$$

and then the sensitivity of the solution is computed as:

$$\frac{\partial z}{\partial \theta} = -\left(\frac{\partial r}{\partial z}\right)^{-1} \frac{\partial r}{\partial \theta}. \tag{4}$$

In the case that $\partial r / \partial z$ is not full rank, an approximate solution to (4), e.g., least-squares, can be computed.
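As a sanity check (our illustration, not from the paper), the sensitivity formula (4) can be verified on a scalar root-finding problem, where the implicit solution is computed with Newton's method and the analytic sensitivity is compared against a finite difference:

```python
import numpy as np

# Illustrative check of the implicit-function theorem on r(z; theta) = z**3 + theta*z - 1.
def r(z, theta):
    return z**3 + theta * z - 1.0

def solve(theta, z=1.0):
    # Newton's method for r(z; theta) = 0 (dr/dz = 3 z**2 + theta > 0 here)
    for _ in range(50):
        z -= r(z, theta) / (3 * z**2 + theta)
    return z

theta = 2.0
z = solve(theta)
# Sensitivity via (4): dz/dtheta = -(dr/dz)^{-1} * dr/dtheta
dz_dtheta = -z / (3 * z**2 + theta)
# Compare against a finite difference through the solver
eps = 1e-6
fd = (solve(theta + eps) - solve(theta - eps)) / (2 * eps)
```

The same pattern extends to vector-valued residuals, where the inverse in (4) becomes a linear solve.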
III-C Implicit Integrators
Unlike explicit integrators, i.e., $x_{t+1} = f(x_t, u_t)$, implicit integrators are an implicit function of the next state [brudigam2020linear, manchester2016quaternion], which cannot be separated from the current state and control input:

$$f(x_{t+1}, x_t, u_t) = 0, \tag{5}$$
and are often used for numerical simulation of stiff systems due to improved stability and accuracy compared to explicit integrators, at the expense of increased computation [wanner1996solving].
For direct trajectory-optimization methods that parameterize states and controls and allow for dynamic infeasibility during iterates, such integrators are easily utilized since the state at the next time step is already available as a decision variable to the optimizer [von1993numerical]. However, for indirect methods, like iLQR, that enforce strict dynamic feasibility at each iteration via rollouts, evaluating (5) requires a numerical solver to find a next state that satisfies this implicit function for the problem data $(x_t, u_t)$.
In practice, the next state can be found efficiently and reliably using Newton's method. Typically, the current state is used to initialize the solver, and fewer than 10 iterations are required to solve the root-finding problem to machine precision. We describe dynamics with implicit integrators as the mapping $(x_t, u_t) \mapsto x_{t+1}$ implicitly defined by (5).
Having satisfied (5) to find the next state, we can compute the dynamics Jacobians using the implicit-function theorem. First, the dynamics are approximated to first order:

$$\frac{\partial f}{\partial x_{t+1}} \delta x_{t+1} + \frac{\partial f}{\partial x_t} \delta x_t + \frac{\partial f}{\partial u_t} \delta u_t = 0,$$

and then we solve for the Jacobians:

$$\frac{\partial x_{t+1}}{\partial x_t} = -\left(\frac{\partial f}{\partial x_{t+1}}\right)^{-1} \frac{\partial f}{\partial x_t}, \qquad \frac{\partial x_{t+1}}{\partial u_t} = -\left(\frac{\partial f}{\partial x_{t+1}}\right)^{-1} \frac{\partial f}{\partial u_t},$$

which are returned at each time step.
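A minimal sketch of this procedure, assuming a damped pendulum with an implicit midpoint integrator (our toy model, not one of the paper's systems); finite differences stand in for the analytic derivatives an efficient implementation would supply:

```python
import numpy as np

# Implicit midpoint step for a damped pendulum (illustrative sketch).
# State x = (angle, angular rate); continuous dynamics xdot = g(x, u).
h = 0.05  # time step

def g(x, u):
    return np.array([x[1], u[0] - 9.81 * np.sin(x[0]) - 0.1 * x[1]])

def residual(xn, x, u):
    # f(x_{t+1}, x_t, u_t) = x_{t+1} - x_t - h * g((x_t + x_{t+1}) / 2, u_t)
    return xn - x - h * g(0.5 * (x + xn), u)

def jac(fun, z, eps=1e-7):
    # central finite-difference Jacobian (autodiff would be used in practice)
    J = np.zeros((2, len(z)))
    for i in range(len(z)):
        dz = np.zeros(len(z)); dz[i] = eps
        J[:, i] = (fun(z + dz) - fun(z - dz)) / (2 * eps)
    return J

def step(x, u):
    xn = x.copy()
    for _ in range(20):  # Newton's method on the integrator residual
        F = residual(xn, x, u)
        dF = jac(lambda z: residual(z, x, u), xn)
        xn = xn - np.linalg.solve(dF, F)
    # Jacobians via the implicit-function theorem (4)
    dF_dxn = jac(lambda z: residual(z, x, u), xn)
    dF_dx = jac(lambda z: residual(xn, z, u), x)
    dF_du = jac(lambda z: residual(xn, x, z), u)
    A = -np.linalg.solve(dF_dxn, dF_dx)   # d x_{t+1} / d x_t
    B = -np.linalg.solve(dF_dxn, dF_du)   # d x_{t+1} / d u_t
    return xn, A, B

xn, A, B = step(np.array([0.1, 0.0]), np.array([0.0]))
```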
IV Optimization-Based Dynamics
In the previous section, we presented dynamics with implicit integrators that can be evaluated during rollouts and differentiated. In this section, we present a more general representation: optimization-based dynamics, which solve a constrained optimization problem in order to evaluate dynamics and use the implicit-function theorem to compute gradients by differentiating through the problem’s optimality conditions.
IV-A Problem Formulation
For dynamics we require: fast and reliable evaluation, gradients that are useful to an upper-level optimizer, and (ideally) tight constraint satisfaction. We consider problems of the form:

$$z^*(\theta) = \arg\min_{z} \; c(z; \theta) \quad \text{subject to} \quad e(z; \theta) = 0, \quad z \in \mathcal{K}, \tag{11}$$

with decision variable $z$, problem data $\theta$, objective $c$, equality constraints $e$, and cone $\mathcal{K}$. Typically, the problem data include the current state and control, $\theta = (x_t, u_t)$, and the next state belongs to the optimal solution set, $x_{t+1} \in z^*(\theta)$.
IV-B Path-Following Method
One of the primary challenges in optimizing (11) is selecting a solver that is well-suited to the requirements imposed by dynamics representations. Solvers for this class of problem are generally categorized as first-order projection [stellato2020osqp, o2016conic, garstka2019cosmo] or second-order path-following methods [domahidi2013ecos, vandenberghe2010cvxopt]. The first approach optimizes (11) by splitting the problem and alternating between inexpensive first-order updates and potentially non-smooth projections onto the cone. The second approach formulates and optimizes barrier subproblems using second-order methods along a central path, eventually converging to the cone's boundary in the limit [boyd2004convex]. Second-order semi-smooth methods also exist [ali2017semismooth].
The first-order projection-based methods, while fast, often can only achieve coarse constraint satisfaction and, importantly, the gradients returned are usually subgradients, which are less useful to an optimizer. In contrast, path-following methods exhibit fast convergence, can achieve tight constraint satisfaction [Nocedal2006], and can return smooth gradients using a relaxed central-path parameter.
Based on these characteristics, we utilize a path-following method [Nocedal2006] to optimize (11). The optimality conditions for a barrier subproblem are:

$$\frac{\partial c}{\partial z} + \left(\frac{\partial e}{\partial z}\right)^{T} y - w = 0, \tag{12}$$
$$e(z; \theta) = 0, \tag{13}$$
$$z \circ w = \rho \mathbf{e}, \tag{14}$$
$$z \in \mathcal{K}, \quad w \in \mathcal{K}^*, \tag{15}$$

with duals $y$ and $w$, cone product denoted with the $\circ$ operator, cone identity element $\mathbf{e}$, central-path parameter $\rho$, and dual cone $\mathcal{K}^*$ [domahidi2013ecos]. We consider the free cone: $\mathbf{R}^n$; the nonnegative orthant: $\mathbf{R}^n_{+}$; the second-order cone:

$$\mathcal{Q}^n = \left\{ (z_0, z_1) \in \mathbf{R} \times \mathbf{R}^{n-1} \mid \|z_1\|_2 \leq z_0 \right\},$$

and the Cartesian product of these cones. For nonnegative cones, the cone product is an element-wise product and $\mathbf{e} = (1, \dots, 1)$. For second-order cones, the cone product is:

$$u \circ v = (u^T v, \; u_0 v_1 + v_0 u_1),$$

and $\mathbf{e} = (1, 0, \dots, 0)$ [domahidi2013ecos]. The nonnegative and second-order cones are self-dual, i.e., $\mathcal{K}^* = \mathcal{K}$, while the dual of the free cone is the zero cone, i.e., $\{0\}$.
To find a stationary point of (12-15) for a fixed central-path parameter $\rho$, we consider the variables $(z, y, w)$ and a residual comprising (12-14). A search direction, $(\delta z, \delta y, \delta w)$, is computed using the residual and its Jacobian with respect to the decision variables at the current point. A backtracking line search is performed to ensure that a candidate step respects the cone constraints and reduces the norm of the residual. Once the residual norm is below a desired tolerance, we cache the current solution $(z, y, w)$, the central-path parameter is decreased, and the procedure is repeated until the central-path parameter is below a desired tolerance.
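This loop can be sketched on a toy instance, a projection onto the nonnegative orthant, where the residual follows the pattern of (12)-(15) with no equality constraints or second-order cones; all names and tolerances here are our illustrations:

```python
import numpy as np

# Path-following sketch for: minimize 0.5*||z - c||^2 subject to z >= 0,
# whose solution is max(c, 0). Residual: stationarity (z - c - w = 0) and
# relaxed complementarity (z * w = rho), with z, w in the nonnegative cone.
c = np.array([0.5, -1.0, 2.0])
n = len(c)
z, w = np.ones(n), np.ones(n)  # strictly cone-feasible initialization

def res(z, w, rho):
    return np.concatenate([z - c - w,     # stationarity, cf. (12)
                           z * w - rho])  # relaxed complementarity, cf. (14)

rho = 1.0
while rho > 1e-8:
    while np.linalg.norm(res(z, w, rho)) > 1e-9:
        # residual Jacobian with respect to (z, w)
        J = np.block([[np.eye(n), -np.eye(n)],
                      [np.diag(w), np.diag(z)]])
        d = np.linalg.solve(J, -res(z, w, rho))
        dz, dw = d[:n], d[n:]
        # backtracking line search: stay inside the cone, reduce the residual
        a = 1.0
        while (np.any(z + a * dz <= 0) or np.any(w + a * dw <= 0)
               or np.linalg.norm(res(z + a * dz, w + a * dw, rho))
                  >= np.linalg.norm(res(z, w, rho))):
            a *= 0.5
        z, w = z + a * dz, w + a * dw
    rho *= 0.1  # decrease the central-path parameter and re-solve
```

Differentiating the cached residual at an intermediate (relaxed) value of `rho`, rather than at the final tight solve, is what yields the smooth gradients discussed below.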
To differentiate (11), we apply the implicit-function theorem (4) to the residual at an intermediate solution $(z, y, w)$ along the central path. In practice, we find that performing implicit differentiation at a solution point with a relaxed central-path parameter returns smooth gradients that improve the convergence behavior of the upper-level optimizer. The complete algorithm is summarized in Algorithm 1.
V Examples

We formulate optimization-based dynamics models and use iLQR to perform bi-level trajectory optimization for a number of examples that highlight how these representations can be constructed, demonstrate that non-smooth and constrained dynamics can be optimized, provide a comparison of smooth gradients, and compare implicit and explicit bi-level approaches.
Throughout, we use implicit midpoint integrators and quadratic costs and, for convenience, employ an augmented Lagrangian method to enforce any remaining trajectory-level constraints (e.g., terminal constraints) not handled implicitly by the dynamics representation. Our implementation and all of the experiments are available at: https://github.com/thowell/motion_planning/examples/implicit_dynamics.
V-a Acrobot with Joint Limits
We model an acrobot [tedrake2014underactuated] with joint limits on the actuated elbow. These limits are enforced with a signed-distance constraint,

$$\phi(q) \geq 0, \tag{18}$$

where $q$ is the system's configuration and $\phi$ measures the distance from the elbow angle to its limits. Additional constraints,

$$\lambda \geq 0, \quad \phi(q) \circ \lambda = 0, \tag{19}$$

enforce the physical behaviors that impact impulses $\lambda$ can only be non-negative, i.e., repulsive not attractive, and that they are only applied to the system when the joint reaches a limit. Relaxing the complementarity constraint via a central-path parameter, introducing a slack variable for the signed-distance function, and combining this reformulation with the system's dynamics results in a problem of the form (12-15) that can be optimized with Algorithm 1.
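As a one-dimensional illustration of how the relaxation behaves (our toy model, not the paper's acrobot), suppose an impulse pushes the joint away from its limit so that the next-step gap is affine in the impulse; the relaxed complementarity condition then has a closed-form solution:

```python
import numpy as np

# Illustrative 1D joint-limit model: over a time step, an impulse lam
# pushes the joint away from its limit, so the next-step gap is
# phi = phi0 + a * lam with a > 0. The relaxed complementarity condition
# lam * phi = rho (with lam, phi > 0) then has a closed-form solution.
def limit_impulse(phi0, a=1.0, rho=1e-8):
    # positive root of a * lam**2 + phi0 * lam - rho = 0
    return (-phi0 + np.sqrt(phi0**2 + 4 * a * rho)) / (2 * a)
```

As `rho -> 0` this approaches `max(0, -phi0 / a)`: the impulse activates only when the unconstrained step would violate the limit (`phi0 < 0`), while a larger `rho` yields a smooth, differentiable approximation of that non-smooth switch.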
Unlike approaches that handle joint limits at the solver level, including the limit (18) and impact (19) constraints at the dynamics level enables the impact forces encountered at joint stops to be optimized and applied to the system.
The system has four states and one control input. We plan a swing-up trajectory over a fixed planning horizon and time step. The optimizer is initialized with random controls.
We compare unconstrained and joint-limited systems; the optimized motions are shown in Fig. 2. The unconstrained system violates the joint limits, while the joint-limited system reaches the limits without exceeding them and finds a motion that utilizes additional pumps.
V-B Cart-Pole with Coulomb Friction
A cart-pole [tedrake2014underactuated] is modeled that experiences Coulomb friction [moreau2011unilateral] on both its slider and pendulum arm. The friction force that maximally dissipates kinetic energy is the solution to the following optimization problem:

$$b^* = \arg\min_{b} \; v^{T} b \quad \text{subject to} \quad \|b\|_2 \leq c,$$

where $v$ is the joint velocity, $b$ is the friction force, and $c$ is the friction-cone limit [lobo1998applications]. The optimality conditions for the cone program's barrier subproblem are:

$$\begin{bmatrix} 0 \\ v \end{bmatrix} + y \begin{bmatrix} 1 \\ 0 \end{bmatrix} - w = 0, \qquad z_0 - c = 0, \qquad z \circ w = \rho \mathbf{e}, \qquad z, w \in \mathcal{Q},$$

with $z = (z_0, b)$ such that $z \in \mathcal{Q}$, and dual variables $y$ and $w$ for the equality and cone constraints, respectively.
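The smoothing effect of the relaxation can be seen in a scalar sketch (ours, not the paper's second-order-cone formulation): replacing the friction bound with log barriers yields a smooth friction law that approaches $-c\,\mathrm{sign}(v)$ as the central-path parameter vanishes:

```python
import numpy as np

# Scalar illustration: maximal dissipation
#   minimize_b  b * v  subject to  |b| <= c
# has the non-smooth solution b = -c * sign(v). With log barriers,
#   minimize_b  b * v - rho * log(c - b) - rho * log(c + b),
# the stationarity condition reduces to v * b**2 - 2 * rho * b - v * c**2 = 0,
# whose root inside (-c, c) gives a smooth friction law that sharpens as rho -> 0.
def friction(v, c=1.0, rho=1e-2):
    return (rho - np.sqrt(rho**2 + (v * c)**2)) / v if v != 0 else 0.0
```

For small velocities the relaxed force passes smoothly through zero instead of jumping between the cone limits, which is precisely the property that produces useful gradients for the upper-level optimizer.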
The system has four states and one control input. We make the modeling assumption that $c$, which is a function of the friction coefficient, is fixed for both joints. We plan a swing-up trajectory over a fixed horizon and time step. The optimizer is initialized with an impulse at the first time step and zeros for the remaining initial control inputs.
We compare the trajectories optimized for systems with increasing amounts of friction, i.e., increasing $c$, in Fig. 3. We observe that the system must perform more aggressive maneuvers in order to succeed at the swing-up task in the presence of increasing friction.
V-C Hopper Gait
We generate gaits for a Raibert hopper [raibert1989dynamically]. The system's dynamics are modeled as a nonlinear complementarity problem [lecleach2021linear]. The 2D system has four generalized coordinates: lateral and vertical body positions, body orientation, and leg length. There are two control inputs: a body moment and a leg force. The state comprises the system's current and previous configurations. The horizon and time step are fixed. The initial state is optimized, and a trajectory-level periodicity constraint is employed to ensure that the first and final states, with the exception of the lateral positions, are equivalent. Finally, we initialize the solver with controls that maintain the system in an upright configuration.
By tuning the cost functions we generate qualitatively different gaits. The optimizer finds single-hop and multi-hop trajectories that satisfy the periodicity constraint. We repeat the optimized template to form a gait; three are shown in Fig. 4.
V-D Rocket Landing
We plan a soft landing for a rocket with six degrees of freedom that must transition from a horizontal to a vertical orientation prior to landing while respecting thrust constraints. Unlike prior work that enforces such constraints at the optimizer level [blackmore2010minimum], we instead enforce these constraints at the dynamics level. This enables the optimizer to provide smooth controls to the dynamics, which then internally constrain the input to satisfy the system's limits at every iteration.
The cone projection problem, which finds the thrust vector that is closest to the optimizer's input thrust while satisfying constraints, is formulated as:

$$u^* = \arg\min_{u} \; \|u - \bar{u}\|_2^2 \quad \text{subject to} \quad \|u_{1:2}\|_2 \leq \alpha u_3, \quad u_{\min} \leq u_3 \leq u_{\max}, \tag{25}$$

where $u^*$ and $\bar{u}$ are the optimized and provided thrust vectors, respectively, $u_3$ is the component of thrust along the longitudinal axis of the rocket, $u_{\min}$ and $u_{\max}$ are limits on this value, and $\alpha$ is a scaling parameter.
The system has twelve states and three controls, which are first projected using (25) before being applied to the dynamics. The planning horizon and time step are fixed. The controls are initialized with small random values, the initial pose is offset from the target, and the rocket has an initial downward velocity. A constraint enforces the rocket's final pose: a zero-velocity, vertical-orientation configuration in a landing zone. The scaling parameter is set to one.
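A simple feasibility-restoring sketch of such a thrust limiter (our approximation, not the exact Euclidean projection in (25); the limits and scaling below are made-up values):

```python
import numpy as np

# Illustrative thrust-limiting sketch: clamp the longitudinal thrust into
# [u_min, u_max], then shrink the lateral components to satisfy the
# second-order-cone constraint ||u_lat|| <= alpha * u_long. This restores
# feasibility, though it is not the exact minimum-distance projection.
def limit_thrust(u_bar, u_min=1.0, u_max=5.0, alpha=0.5):
    u_long = np.clip(u_bar[2], u_min, u_max)  # longitudinal component
    u_lat = u_bar[:2].copy()
    max_lat = alpha * u_long
    norm = np.linalg.norm(u_lat)
    if norm > max_lat:
        u_lat *= max_lat / norm  # scale lateral thrust onto the cone boundary
    return np.array([u_lat[0], u_lat[1], u_long])

u = limit_thrust(np.array([2.0, 0.0, 0.2]))
```

Solving (25) with the path-following method instead yields both the exact projection and, via (4), smooth gradients of the projected thrust with respect to the optimizer's input.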
The optimizer finds a plan that successfully reorients the rocket prior to landing while respecting the thrust constraints, which would otherwise be violated at the first time steps without the thrust projection. The normalized optimized controls and position trajectory are provided in Fig. 5.
V-E Planar Push
For the canonical manipulation problem of planar pushing [hogan2016feedback], we optimize the positions and forces of a robotic manipulator's circular end-effector in order to slide a planar block into an oriented goal pose (Fig. 1). The system's end-effector is modeled as a fully actuated particle in 2D that can move the block (with 2D translation and orientation) via impulses and friction forces while the two systems are in contact. The block is modeled with point friction at each of its four corners, and a single signed-distance function is employed that enables the end-effector to impact and push on any of the block's sides.
The system's states, controls, and contact forces are optimized over a fixed planning horizon and time step. We initialize the end-effector with a control input that overcomes stiction to move the block. The block is initialized at the origin with the pusher in contact on its left side, and the block is constrained at the trajectory level to reach a goal pose.
Table I: iLQR iterations (mean, standard deviation over five trials) for decreasing central-path parameter values used to compute gradients: failure, 128 (9), 206 (10), 582 (133), 630 (68).
V-F Comparison of Smooth Gradients
We compare the effect of smooth gradients on the upper-level optimizer by varying the value of the central-path parameter used by the implicit-function theorem to compute the gradients of the dynamics. The results for the acrobot swing-up problem, provided in Table I, are computed over five trials of random initial control inputs for each central-path value. We find that iLQR requires more iterations when the central-path value is small and fewer iterations using smooth gradients, until the gradients are too smooth and the optimizer fails to find a swing-up trajectory.
Table II: comparison of the implicit and explicit bi-level approaches on the hopper (1-3) and planar-push (1-2) examples.
V-G Comparison with Explicit Bi-Level Optimization
We compare our implicit bi-level approach to an explicit one that is optimized using a direct method with Ipopt [wachter2006implementation]. For the hopper-gait and planar-push examples, we explicitly represent the contact-dynamics optimality conditions as constraints and directly optimize trajectories [manchester2020variational]. We use the same models, cost functions, and comparable optimizer parameters and tolerances. The results are provided in Table II. We find that our implicit approach requires fewer, but more expensive iterations, while the explicit approach requires more, but less expensive iterations. Additionally, the implicit approach typically finds a lower final cost while the explicit approach is able to better satisfy the goal and periodicity constraints.
The complexity of the classic iLQR algorithm is linear in the planning horizon and cubic in the control dimension [tassa2007receding], while a direct method for trajectory optimization that exploits the problem's temporal structure is linear in the horizon and cubic in the combined state-and-control dimension [wang2009fast]. The path-following method generally has cubic complexity in the number of primal and dual variables, although specialized solvers can reduce this cost [Nocedal2006]. The total complexity of optimization-based dynamics with iLQR is typically dominated by the cost of the path-following method, which is incurred at each time step in the planning horizon. For the explicit bi-level approach, we augment the control with the additional decision variables of the lower-level problem, increasing the control dimension at each time step.
VI Discussion

In this work, we have presented a bi-level optimization framework that utilizes lower-level optimization-based dynamics for trajectory optimization. This representation enables more expressive dynamics by incorporating constraint handling, internal states, and additional physical behavior at the dynamics level, abstracted away from the upper-level optimizer, enabling classically unconstrained solvers to optimize motions for more complex systems.
One of the primary drawbacks of this approach is the lack of a guarantee that the dynamics solver converges to a solution. In practice, we find that the dynamics solver occasionally fails to converge; in this event, it simply returns the current solution and its sensitivities. We find that occasional failures like this often do not impair the overall trajectory-optimization solve and that the optimizer can eventually find a dynamically feasible trajectory. Just as robust, large-scale solvers such as Ipopt [wachter2006implementation] fall back to their restoration mode when numerical difficulties occur, our basic path-following method would likely be improved by including such a fallback routine, perhaps specific to a particular system. Additionally, we find that standard techniques such as problem scaling, appropriate hyperparameter selection, and warm-starting greatly improve the reliability of this approach.
Another potential weakness of this method is overall complexity or wall-clock time for a generic implementation. First, iLQR is a serial algorithm, dominated by forward rollouts and backward Riccati recursions that cannot be parallelized. Similarly, the path-following solver is a serial method dominated by an LU decomposition and solve that is generally not amenable to parallelization either. Second, because an iterative solver is utilized to evaluate the dynamics, calls to the dynamics are inherently more expensive compared to an explicit dynamics representation. However, such overhead can potentially be mitigated with a problem-specific solver that can exploit specialized constraints or sparsity, unlike an off-the-shelf solver’s generic routines.
There are numerous avenues for future work exploring optimization-based dynamics for bi-level trajectory optimization. First, in this work we explore constrained optimization problems solved with a second-order method; another interesting problem class to consider is energy-based models [lecun2006tutorial], potentially optimized with first-order Langevin dynamics [schlick2010molecular]. Second, we find that returning gradients computed with a relaxed central-path parameter greatly improves the convergence behavior of the upper-level optimizer; an analysis of, or method for, returning gradients from the lower-level problem that best aid an upper-level optimizer would be useful. Finally, this bi-level framework could be extended to a tri-level setting where the highest-level optimizer autotunes an objective to generate model-predictive-control policies in an imitation-learning setting.