I Introduction
Trajectory optimization is a powerful tool for synthesizing trajectories for nonlinear dynamical systems. Indirect methods, in particular, are able to efficiently find optimal solutions to this class of problem and return dynamically feasible trajectories by performing rollouts and utilizing gradients of the system’s dynamics.
Classically, indirect methods like iterative LQR (iLQR) [jacobson1970differential] utilize explicit dynamics representations, which can be directly evaluated and differentiated in order to return gradients. In this work, we present a more general framework for optimization-based dynamics representations. This latter class enables partial elimination of trajectory-level constraints by absorbing them into the dynamics representation.
We formulate optimization-based dynamics as a constrained optimization problem and provide a path-following method for efficient evaluation of the dynamics at each time step. The implicit-function theorem is utilized to differentiate through this representation, and we exploit the properties of our path-following method to return useful, smooth gradients to an upper-level optimizer.
To demonstrate the capabilities of this representation, we utilize optimization-based dynamics and iLQR in a bilevel optimization framework to perform trajectory optimization. We provide a number of optimization-based dynamics models and examples in simulation, including: an acrobot with joint limits, a cart-pole experiencing friction, gait generation for a Raibert hopper, a belly-flop soft landing for a rocket subject to thrust limits, and a planar-push manipulation task.
Specifically, our contributions are:

A novel framework for optimization-based dynamics that can be used with trajectory-optimization methods that require gradients

A path-following method that can efficiently solve constrained optimization problems and return smooth gradients

A collection of optimization-based dynamics models and examples from the locomotion, aerospace, and manipulation domains that demonstrate the proposed bilevel trajectory-optimization framework
In the remainder of this paper, we first review related work on bilevel approaches to trajectory optimization in Section II. Next, we present background on the iLQR algorithm, the implicit-function theorem, and implicit integrators in Section III. Then, we present our optimization-based dynamics representation, as well as a path-following method for solving this problem class, in Section IV. We provide a collection of optimization-based dynamics models that are utilized to perform trajectory optimization, and comparisons, in Section V. Finally, we conclude with a discussion and directions for future work in Section VI.
II Related Work
Bilevel optimization [sinha2017review] is a framework where an upper-level optimizer utilizes the results, i.e., the solution and potentially gradients, of a lower-level optimization problem. Approaches typically either implicitly solve the lower-level problem and compute gradients using the solution, or explicitly represent the optimality conditions of the lower-level problem as constraints in the upper-level problem. For example, the MuJoCo simulator [todorov2012mujoco] has been employed in an implicit bilevel approach for whole-body model-predictive control of a humanoid robot [koenemann2015whole]. The lower-level simulator solves a convex optimization problem in order to compute contact forces for rigid-body dynamics, and the results are utilized to perform rollouts and return finite-differenced gradients to the upper-level iLQR optimizer. In contrast, explicit approaches have used the optimality conditions of linear-complementarity-problem contact dynamics as constraints in a direct trajectory-optimization method [posa2014direct]. A related approach formulates contact dynamics as lower-level holonomic constraints [mastalli2020crocoddyl]. Implicit integrators, which are a specific instance of optimization-based dynamics and are widely used in collocation methods [von1993numerical], have been explored with differentiable dynamic programming for models that require cloth simulation [zimmermann2021dynamic]. Implicit dynamics have also been trained to represent nonsmooth dynamics [pfrommer2020contactnets], although this work has not been utilized for trajectory optimization. Direct methods with implicit lower-level problems have been used in locomotion applications for tracking reference trajectories with model-predictive control [lecleach2021linear], and a semi-direct method utilizes a lower-level friction problem for planning through contact events [landry2019bilevel]. A related, sequential operator-splitting framework is proposed in [sindhwani2017sequential].
In this work we focus on implicit lowerlevel problems and indirect methods, specifically iLQR, as the upperlevel optimizer.
III Background
In this section, we provide background on the iLQR algorithm, the implicit-function theorem, and implicit integrators.
III-A Iterative LQR
iLQR is an algorithm for trajectory optimization that solves instances of the following problem:
$\underset{u_{1:T-1}}{\text{minimize}} \;\; g_T(x_T) + \sum_{t=1}^{T-1} g_t(x_t, u_t) \quad \text{subject to} \;\; x_{t+1} = f(x_t, u_t), \;\; t = 1, \dots, T-1. \quad (1)$
For a system with state $x_t \in \mathbf{R}^n$, control inputs $u_t \in \mathbf{R}^m$, time index $t$, initial state $x_1$, and discrete-time dynamics $f$, the algorithm aims to minimize an objective with stage-cost functions, $g_t$, and terminal-cost function, $g_T$, over a planning horizon $T$.
The algorithm utilizes gradients of the objective and dynamics, and exploits the problem's temporal structure. The state trajectory is defined implicitly by the initial state, and only the control trajectory is parameterized as decision variables. Closed-loop rollouts enable this indirect method to work well in practice. The overall complexity is linear in the planning horizon and cubic in the control-input dimension, i.e., $O(T m^3)$ [tassa2007receding].
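To make this structure concrete, the sketch below (our own illustrative Python, not the paper's implementation) performs the backward Riccati recursion for time-varying linearized dynamics and quadratic costs, the computation at the heart of iLQR's backward pass; the matrix names and the recursion variable `P` are our notation:

```python
import numpy as np

def lqr_backward_pass(A_list, B_list, Q, R, Qf):
    """Backward Riccati recursion for time-varying linearized dynamics
    x_{t+1} = A_t x_t + B_t u_t with quadratic stage cost (Q, R) and
    terminal cost Qf. Returns feedback gains K_t such that u_t = -K_t x_t."""
    P = Qf  # terminal cost-to-go Hessian
    gains = []
    for A, B in reversed(list(zip(A_list, B_list))):
        # Cubic-cost solve in the control dimension at each step.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update of the cost-to-go Hessian.
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]
```

Each backward step costs a cubic linear solve in the control dimension and the recursion visits each of the $T$ steps once, matching the stated complexity.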
While efficient, the algorithm does not natively handle additional general equality or inequality constraints. This limitation has led to many variations of the classic algorithm in order to accommodate constraints in some capacity at the solver level. Box constraints have been incorporated at each step in the backward pass in order to enforce control limits [tassa2014control]. An alternative approach embeds controls in barrier functions which smoothly approximate constraints [marti2020squash]. Augmented Lagrangian methods and constrained backward and forward passes have also been proposed for handling general constraints [howell2019altro, xie2017differential, lantoine2008hybrid].
III-B Implicit-Function Theorem
An implicit function, $r : \mathbf{R}^{n_z} \times \mathbf{R}^{n_\theta} \rightarrow \mathbf{R}^{n_z}$, is defined as

$r(z; \theta) = 0, \quad (2)$

for solutions $z \in \mathbf{R}^{n_z}$ and problem data $\theta \in \mathbf{R}^{n_\theta}$.
At an equilibrium point, $(z^*, \theta^*)$, the sensitivities of the solution with respect to the problem data, i.e., $\partial z / \partial \theta$, can be computed [dini1907lezioni]. First, we approximate (2) to first order:

$\frac{\partial r}{\partial z} \delta z + \frac{\partial r}{\partial \theta} \delta \theta = 0, \quad (3)$

and then the sensitivity of the solution is computed as:

$\frac{\partial z}{\partial \theta} = -\left(\frac{\partial r}{\partial z}\right)^{-1} \frac{\partial r}{\partial \theta}. \quad (4)$
In the case that $\partial r / \partial z$ is not full rank, an approximate solution to (4), e.g., least squares, can be computed.
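As a minimal numpy sketch of (4), the helper below solves for the sensitivity and falls back to least squares when the Jacobian is rank-deficient; the residual $r(z; \theta) = z^3 - \theta$ used to exercise it is a hypothetical example of ours, not from the paper:

```python
import numpy as np

def ift_sensitivity(dr_dz, dr_dtheta):
    """Sensitivity dz/dtheta = -(dr/dz)^{-1} dr/dtheta, Eq. (4).
    Falls back to a least-squares solution when dr/dz is rank deficient."""
    try:
        return -np.linalg.solve(dr_dz, dr_dtheta)
    except np.linalg.LinAlgError:
        return -np.linalg.lstsq(dr_dz, dr_dtheta, rcond=None)[0]

# Hypothetical residual r(z; theta) = z^3 - theta with root z = theta^(1/3).
theta = 8.0
z = theta ** (1.0 / 3.0)            # equilibrium point: r(z; theta) = 0
dr_dz = np.array([[3.0 * z ** 2]])  # dr/dz
dr_dtheta = np.array([[-1.0]])      # dr/dtheta
dz_dtheta = ift_sensitivity(dr_dz, dr_dtheta)
```

The result matches the analytic derivative $d(\theta^{1/3})/d\theta = 1/(3 z^2)$ at $z = 2$.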
III-C Implicit Integrators
Unlike explicit integrators, i.e., $x_{t+1} = f(x_t, u_t)$, implicit integrators are an implicit function of the next state [brudigam2020linear, manchester2016quaternion], which cannot be separated from the current state and control input:

$f(x_{t+1}, x_t, u_t) = 0, \quad (5)$
and are often used for numerical simulation of stiff systems due to improved stability and accuracy compared to explicit integrators, at the expense of increased computation [wanner1996solving].
For direct trajectory-optimization methods that parameterize states and controls and allow for dynamic infeasibility during iterates, such integrators are easily utilized since the state at the next time step is already available as a decision variable to the optimizer [von1993numerical]. However, for indirect methods, like iLQR, that enforce strict dynamic feasibility at each iteration via rollouts, evaluating (5) requires a numerical solver to find a solution that satisfies this implicit function for problem data $(x_t, u_t)$.
In practice, the next state can be found efficiently and reliably using Newton's method. Typically, the current state is used to initialize the solver and fewer than 10 iterations are required to solve the root-finding problem to machine precision. We describe dynamics with implicit integrators as:

$x_{t+1} \;\; \text{such that} \;\; f(x_{t+1}, x_t, u_t) = 0. \quad (6)$
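A minimal sketch of this Newton solve, assuming a hypothetical damped pendulum and an implicit midpoint integrator (the system, parameters `H` and `DAMP`, and all names are our own illustrative choices, not from the paper):

```python
import numpy as np

H, DAMP = 0.05, 0.1  # time step and damping for an illustrative pendulum

def g(x, u):
    """Continuous pendulum dynamics; x = (angle, angular velocity)."""
    q, v = x
    return np.array([v, u - np.sin(q) - DAMP * v])

def dg_dx(x):
    q, v = x
    return np.array([[0.0, 1.0], [-np.cos(q), -DAMP]])

def implicit_midpoint_step(x, u, tol=1e-12, max_iters=10):
    """Newton's method on the implicit midpoint residual, Eq. (5):
    f(x_next, x, u) = x_next - x - H * g((x + x_next)/2, u) = 0."""
    xn = x.copy()  # warm start with the current state
    for _ in range(max_iters):
        xm = 0.5 * (x + xn)
        r = xn - x - H * g(xm, u)
        if np.linalg.norm(r) < tol:
            break
        J = np.eye(2) - 0.5 * H * dg_dx(xm)  # df/dx_next
        xn = xn - np.linalg.solve(J, r)
    return xn

x_next = implicit_midpoint_step(np.array([0.1, 0.0]), 0.0)
```

Warm starting from the current state, Newton's method converges to machine precision in a few iterations for this mildly nonlinear system.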
Having satisfied (5) to find the next state, we can compute the dynamics Jacobians using the implicit-function theorem. First, the dynamics are approximated to first order:

$\frac{\partial f}{\partial x_{t+1}} \delta x_{t+1} + \frac{\partial f}{\partial x_t} \delta x_t + \frac{\partial f}{\partial u_t} \delta u_t = 0, \quad (7)$

and then we solve for $\delta x_{t+1}$:

$\delta x_{t+1} = -\left(\frac{\partial f}{\partial x_{t+1}}\right)^{-1} \left(\frac{\partial f}{\partial x_t} \delta x_t + \frac{\partial f}{\partial u_t} \delta u_t\right). \quad (8)$

The Jacobians:

$\frac{\partial x_{t+1}}{\partial x_t} = -\left(\frac{\partial f}{\partial x_{t+1}}\right)^{-1} \frac{\partial f}{\partial x_t}, \quad (9)$

$\frac{\partial x_{t+1}}{\partial u_t} = -\left(\frac{\partial f}{\partial x_{t+1}}\right)^{-1} \frac{\partial f}{\partial u_t}, \quad (10)$

are returned at each time step.
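Given the partial derivatives of the integrator residual, the Jacobians (9)-(10) amount to two linear solves; a small helper (our own sketch, with hypothetical names) might look like:

```python
import numpy as np

def implicit_jacobians(df_dxn, df_dx, df_du):
    """Dynamics Jacobians from Eqs. (9)-(10):
    A = dx_{t+1}/dx_t and B = dx_{t+1}/du_t for f(x_{t+1}, x_t, u_t) = 0."""
    A = -np.linalg.solve(df_dxn, df_dx)
    B = -np.linalg.solve(df_dxn, df_du)
    return A, B
```

As a sanity check, for an explicit integrator written implicitly, $f = x_{t+1} - (A x_t + B u_t)$, the formulas recover $A$ and $B$ exactly.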
IV Optimization-Based Dynamics
In the previous section, we presented dynamics with implicit integrators that can be evaluated during rollouts and differentiated. In this section, we present a more general representation: optimization-based dynamics, which solve a constrained optimization problem in order to evaluate the dynamics and use the implicit-function theorem to compute gradients by differentiating through the problem's optimality conditions.
IV-A Problem Formulation
For dynamics we require: fast and reliable evaluation, gradients that are useful to an upper-level optimizer, and (ideally) tight constraint satisfaction. We consider problems of the form:

$z^*(\theta) = \arg\min_{z} \; c(z; \theta) \quad \text{subject to} \;\; e(z; \theta) = 0, \;\; z \in \mathcal{K}, \quad (11)$

with decision variable $z$, problem data $\theta$, objective $c$, equality constraints $e$, and cone $\mathcal{K}$. Typically, the problem data include the current state and control, $\theta = (x_t, u_t)$, and the next state $x_{t+1}$ belongs to the optimal solution set $z^*(\theta)$.
IV-B Path-Following Method
One of the primary challenges in optimizing (11) is selecting a solver that is well-suited to the requirements imposed by dynamics representations. Solvers for this class of problem are generally categorized as first-order projection [stellato2020osqp, o2016conic, garstka2019cosmo] or second-order path-following methods [domahidi2013ecos, vandenberghe2010cvxopt]. The first approach optimizes (11) by splitting the problem and alternating between inexpensive first-order methods and potentially nonsmooth projections onto the cone. The second approach formulates and optimizes barrier subproblems, using second-order methods, along a central path, eventually converging to the cone's boundary in the limit [boyd2004convex]. Second-order semismooth methods also exist [ali2017semismooth].
The first-order projection-based methods, while fast, often can only achieve coarse constraint satisfaction and, importantly, the gradients returned are usually subgradients, which are less useful to an optimizer. In contrast, path-following methods exhibit fast convergence, can achieve tight constraint satisfaction [Nocedal2006], and can return smooth gradients using a relaxed central-path parameter.
Based on these characteristics, we utilize a path-following method [Nocedal2006] to optimize (11). The optimality conditions for a barrier subproblem are:

$\nabla_z c(z; \theta) + \left(\frac{\partial e}{\partial z}\right)^T y - w = 0, \quad (12)$

$e(z; \theta) = 0, \quad (13)$

$z \circ w - \mu \mathbf{e} = 0, \quad (14)$

$z \in \mathcal{K}, \;\; w \in \mathcal{K}^*, \quad (15)$

with duals $y$ and $w$, cone product denoted with the $\circ$ operator, central-path parameter $\mu$, and dual cone $\mathcal{K}^*$ [domahidi2013ecos]. We consider free: $\mathbf{R}^n$; nonnegative: $\mathbf{R}^n_+$; second-order:

$\mathcal{Q}^n = \{(a_1, a_{2:n}) \in \mathbf{R} \times \mathbf{R}^{n-1} \mid \|a_{2:n}\|_2 \leq a_1\}, \quad (16)$

and the Cartesian product of these cones. For nonnegative cones, the cone product is an elementwise product and the identity element is $\mathbf{e} = \mathbf{1}$, the ones vector. For second-order cones, the cone product is:

$a \circ b = (a^T b, \; a_1 b_{2:n} + b_1 a_{2:n}), \quad (17)$

and $\mathbf{e} = (1, 0, \dots, 0)$ [domahidi2013ecos]. The nonnegative and second-order cones are self-dual, i.e., $\mathcal{K}^* = \mathcal{K}$, while the dual of the free cone is the zero cone, i.e., $\{0\}$.
To find a stationary point of (12)-(15) for a fixed central-path parameter $\mu$, we consider the variables $(z, y, w)$ and a residual comprising (12)-(14). A search direction is computed using the residual and its Jacobian with respect to the decision variables at the current point. A backtracking line search is performed to ensure that a candidate step respects the cone constraints and reduces the norm of the residual. Once the residual norm is below a desired tolerance, we cache the current solution, the central-path parameter is decreased, and the procedure is repeated until the central-path parameter is below a desired tolerance.
To differentiate (11), we apply the implicit-function theorem (4) to the residual at an intermediate solution along the central path. In practice, we find that performing implicit differentiation at a solution point with a relaxed central-path parameter returns smooth gradients that improve the convergence behavior of the upper-level optimizer. The complete algorithm is summarized in Algorithm 1.
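As an illustration of these steps (not the paper's solver), the sketch below applies them to the simplest instance: a nonnegative-cone QP with no equality constraints, so the residual reduces to stationarity plus relaxed complementarity; the function name, tolerances, and decrease schedule are our own choices:

```python
import numpy as np

def path_following_qp(P, q, mu_min=1e-8):
    """Minimize 0.5 z'Pz + q'z subject to z >= 0 via a primal-dual
    path-following method: Newton steps on the residual (stationarity,
    Eq. (12), and relaxed complementarity z o w = mu, Eq. (14)),
    a cone-respecting backtracking line search, and a decreasing
    central-path parameter mu."""
    n = len(q)
    z, w = np.ones(n), np.ones(n)  # strictly cone-interior initialization
    mu = 1.0
    while mu > mu_min:
        for _ in range(50):  # solve the barrier subproblem for fixed mu
            r = np.concatenate([P @ z + q - w,  # stationarity
                                z * w - mu])    # relaxed complementarity
            if np.linalg.norm(r) < 0.1 * mu:
                break
            J = np.block([[P, -np.eye(n)],
                          [np.diag(w), np.diag(z)]])
            d = np.linalg.solve(J, -r)
            dz, dw = d[:n], d[n:]
            a = 1.0  # backtrack until the candidate stays inside the cone
            while np.any(z + a * dz <= 0.0) or np.any(w + a * dw <= 0.0):
                a *= 0.5
            z, w = z + a * dz, w + a * dw
        mu *= 0.1  # tighten the central path
    return z, w
```

Caching the solution at an intermediate, relaxed value of $\mu$, rather than at the final tolerance, is what yields the smooth gradients discussed above.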
V Examples
We formulate optimization-based dynamics models and use iLQR to perform bilevel trajectory optimization for a number of examples that highlight how these representations can be constructed, demonstrate that nonsmooth and constrained dynamics can be optimized, provide a comparison of smooth gradients, and compare implicit and explicit bilevel approaches.
Throughout, we use implicit midpoint integrators, quadratic costs, and, for convenience, employ an augmented Lagrangian method to enforce any remaining trajectory-level constraints (e.g., terminal constraints) not handled implicitly by the dynamics representation. Our implementation and all of the experiments are available here: https://github.com/thowell/motion_planning/examples/implicit_dynamics.
V-A Acrobot with Joint Limits
We model an acrobot [tedrake2014underactuated] with joint limits on the actuated elbow. These limits are enforced with a signed-distance constraint,

$\phi(q) \geq 0, \quad (18)$

where $q$ is the system's configuration and the constraint acts on the elbow angle. Additional constraints,

$\lambda \geq 0, \quad \lambda \circ \phi(q) = 0, \quad (19)$

enforce the physical behaviors that impact impulses $\lambda$ can only be nonnegative, i.e., repulsive not attractive, and that they are applied to the system only when the joint reaches a limit. Relaxing the complementarity constraint via a central-path parameter, introducing a slack variable for the signed-distance function, and combining this reformulation with the system's dynamics results in a problem formulation (12)-(15) that can be optimized with Algorithm 1.
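The relaxed conditions can be evaluated in a few lines; the elbow-limit function below is a hypothetical stand-in (limits at $\pm\pi/2$, our own choice for illustration, not the paper's values):

```python
import numpy as np

def limit_residual(lmbda, s, q, mu, phi):
    """Relaxed joint-limit conditions: slack s equals the signed distance
    phi(q), and complementarity lmbda * s = mu, i.e., Eq. (19) with the
    right-hand side relaxed by the central-path parameter mu
    (mu -> 0 recovers strict complementarity)."""
    return np.concatenate([s - phi(q), lmbda * s - mu])

# Hypothetical elbow limits at +/- pi/2: phi(q) = (pi/2 - q2, q2 + pi/2) >= 0.
phi = lambda q: np.array([np.pi / 2 - q[1], q[1] + np.pi / 2])
q = np.array([0.3, 1.2])
s = phi(q)    # strictly inside the limits here, so s > 0
mu = 1e-3
lmbda = mu / s  # closed-form root of the relaxed complementarity condition
r = limit_residual(lmbda, s, q, mu, phi)
```

Away from the limits the impulse decays with $\mu$, while at a limit ($s \to 0$) the relaxation permits a nonzero impulse, which is what lets joint-stop forces be optimized.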
Unlike approaches that include joint limits at the solver level, including the impact (19) and limit (18) constraints at the dynamics level enables impact forces encountered at joint stops to be optimized and applied to the system.
The system has states and controls. We plan a swing-up trajectory over a horizon with a fixed time step. The optimizer is initialized with random controls.
We compare unconstrained and joint-limited systems; the optimized motions are shown in Fig. 2. The unconstrained system violates the joint limits, while the joint-limited system reaches the limits without exceeding them and finds a motion utilizing additional pumps.
V-B Cart-Pole with Coulomb Friction
A cart-pole [tedrake2014underactuated] is modeled that experiences Coulomb friction [moreau2011unilateral] on both its slider and pendulum arm. The friction force that maximally dissipates kinetic energy is the solution to the following optimization problem:

$\underset{b}{\text{minimize}} \;\; v^T b \quad \text{subject to} \;\; \|b\|_2 \leq F, \quad (20)$

where $v$ is the joint velocity, $b$ is the friction force, and $F$ is the friction-cone limit [lobo1998applications]. The optimality conditions for the cone program's barrier subproblem are:

$\begin{bmatrix} 0 \\ v \end{bmatrix} + \eta \begin{bmatrix} 1 \\ 0 \end{bmatrix} - w = 0, \quad (21)$

$\beta_1 - F = 0, \quad (22)$

$\beta \circ w - \mu \mathbf{e} = 0, \quad (23)$

$\beta \in \mathcal{Q}, \;\; w \in \mathcal{Q}, \quad (24)$

with $\beta = (F, b)$, such that $\beta \in \mathcal{Q}$, and dual variables $\eta$ and $w$ for the equality and cone constraints, respectively.
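For intuition about how the central-path parameter smooths the friction law, consider the scalar instance of (20) under a log-barrier relaxation, which admits a closed form; this is our own illustrative simplification, not the paper's cone-product formulation:

```python
import numpy as np

def friction_barrier(v, F, mu):
    """Friction force from the log-barrier relaxation of the scalar
    maximum-dissipation problem: minimize v*b - mu*log(F^2 - b^2)
    over |b| < F. The stationarity condition v*b^2 - 2*mu*b - v*F^2 = 0
    gives a closed-form interior root; as mu -> 0 this approaches the
    nonsmooth Coulomb law b = -F*sign(v)."""
    if v == 0.0:
        return 0.0
    return (mu - np.sqrt(mu ** 2 + (v * F) ** 2)) / v
```

A relaxed $\mu$ yields a smooth, differentiable force through $v = 0$, which is precisely the property exploited when returning gradients to the upper-level optimizer.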
The system has states, controls, and a friction dimension. We make the modeling assumption that the friction-cone limit $F$, which is a function of the friction coefficient, is fixed for both joints. We plan a swing-up trajectory over a horizon with a fixed time step. The optimizer is initialized with an impulse at the first time step and zeros for the remaining initial control inputs.
We compare the trajectories optimized for systems with increasing amounts of friction, i.e., increasing $F$, in Fig. 3. We observe that the system must perform more aggressive maneuvers in order to succeed at the swing-up task in the presence of increasing friction.
V-C Hopper Gait
We generate gaits for a Raibert hopper [raibert1989dynamically]. The system's dynamics are modeled as a nonlinear complementarity problem [lecleach2021linear]. The 2D system has four generalized coordinates: lateral and vertical body positions, body orientation, and leg length. There are two control inputs: a body moment and a leg force. The state comprises the system's current and previous configurations. The horizon and time step are fixed. The initial state is optimized, and a trajectory-level periodicity constraint is employed to ensure that the first and final states, with the exception of the lateral positions, are equivalent. Finally, we initialize the solver with controls that maintain the system in an upright configuration.

By tuning the cost functions, we generate qualitatively different gaits. The optimizer finds single-hop and multi-hop trajectories that satisfy the periodicity constraint. We repeat the optimized template to form a gait; three are shown in Fig. 4.
V-D Rocket Landing
We plan a soft landing for a rocket with 6 degrees of freedom that must transition from a horizontal to a vertical orientation prior to landing while respecting thrust constraints. Unlike prior work that enforces such constraints at the optimizer level [blackmore2010minimum], we instead enforce these constraints at the dynamics level. This enables the optimizer to provide smooth controls to the dynamics, which then internally constrain the input to satisfy the system's limits at every iteration.

The cone projection problem, which finds the thrust vector that is closest to the optimizer's input thrust while satisfying constraints, is formulated as:
$\underset{u}{\text{minimize}} \;\; \|u - \bar{u}\|_2^2 \quad \text{subject to} \;\; u_{\min} \leq u_3 \leq u_{\max}, \;\; \|u_{1:2}\|_2 \leq \alpha u_3, \quad (25)$

where $u$ and $\bar{u}$ are the optimized and provided thrust vectors, respectively, $u_3$ is the component of thrust along the longitudinal axis of the rocket, $u_{\min}$ and $u_{\max}$ are limits on this value, and $\alpha$ is a scaling parameter.
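The following heuristic sketch conveys the projection's effect; it is our own approximation (the paper solves the projection exactly as a small conic program with the path-following method), and it clips the longitudinal component before rescaling the lateral components, which is exact only when the clipped longitudinal thrust is optimal:

```python
import numpy as np

def thrust_projection(u_bar, u_min, u_max, alpha):
    """Approximate thrust projection in the spirit of Eq. (25): return a
    thrust near u_bar with u_min <= u3 <= u_max and ||u_{1:2}|| <= alpha*u3."""
    u = np.asarray(u_bar, dtype=float).copy()
    u[2] = np.clip(u[2], u_min, u_max)  # longitudinal thrust limits
    lateral = np.linalg.norm(u[:2])
    limit = alpha * u[2]
    if lateral > limit:                  # shrink onto the cone boundary
        u[:2] *= limit / lateral
    return u
```

Inside the dynamics, a projection of this kind lets the upper-level optimizer supply unconstrained, smooth controls while the applied thrust always satisfies the limits.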
The system has states and controls that are first projected using (25) before being applied to the dynamics. The planning horizon and time step are fixed. The controls are initialized with small random values, the initial pose is offset from the target, and the rocket has an initial downward velocity. A constraint enforces the rocket's final pose: a zero-velocity, vertical-orientation configuration in a landing zone. The scaling parameter is set to one.
The optimizer finds a plan that successfully reorients the rocket prior to landing while respecting the thrust constraints, which would otherwise be violated at the first time steps without the thrust projection. The normalized optimized controls and position trajectory are provided in Fig. 5.
V-E Planar Push
For the canonical manipulation problem of planar pushing [hogan2016feedback], we optimize the positions and forces of a robotic manipulator's circular end-effector in order to slide a planar block into an oriented goal pose (Fig. 1). The system's end-effector is modeled as a fully actuated particle in 2D that can move the block (with 2D translation and orientation) via impulses and friction forces while the two systems are in contact. The block is modeled with point friction at each of its four corners, and a single signed-distance function is employed that enables the end-effector to impact and push on any of the block's sides.
The system has states and controls, and contact forces are optimized. The planning horizon and time step are fixed. We initialize the end-effector with a control input that overcomes stiction to move the block. The block is initialized at the origin with the pusher in contact on its left side, and the block is constrained at the trajectory level to reach a goal pose.
TABLE I: iLQR iterations (mean ± standard deviation over five trials) for the acrobot swing-up, with dynamics gradients computed at successively smaller central-path parameter values (left to right); the largest relaxation fails to produce a swing-up.

iter. | failure | 128 ± 9 | 206 ± 10 | 582 ± 133 | 630 ± 68
V-F Comparison of Smooth Gradients
We compare the effect of smooth gradients on the upper-level optimizer by varying the value of the central-path parameter used by the implicit-function theorem to compute the gradients of the dynamics. The results for the acrobot swing-up problem, provided in Table I, are computed over five trials of random initial control inputs for each central-path value. We find that iLQR requires more iterations when the central-path value is small and fewer iterations using smooth gradients, until the gradients are too smooth and the optimizer fails to find a swing-up trajectory.
TABLE II: Comparison of the implicit (ours) and explicit bilevel approaches: iterations, final cost, and constraint violation.

         | hopper 1 | hopper 2 | hopper 3 | push 1  | push 2
implicit |          |          |          |         |
  iter.  | 28       | 42       | 30       | 18      | 19
  cost   | 29.5     | 192.1    | 34.4     | 32.6    | 104.2
  con.   | 1.5e-4   | 1.5e-4   | 2.3e-4   | 9.2e-4  | 1.4e-4
explicit |          |          |          |         |
  iter.  | 81       | 69       | 394      | 57      | 133
  cost   | 15.98    | 226.9    | 22.5     | 33.0    | 104.1
  con.   | 5.5e-9   | 5.5e-9   | 2.3e-6   | 3.0e-10 | 1.8e-8
V-G Comparison with Explicit Bilevel Optimization
We compare our implicit bilevel approach to an explicit one that is optimized using a direct method with Ipopt [wachter2006implementation]. For the hopper-gait and planar-push examples, we explicitly represent the contact-dynamics optimality conditions as constraints and directly optimize trajectories [manchester2020variational]. We use the same models, cost functions, and comparable optimizer parameters and tolerances. The results are provided in Table II. We find that our implicit approach requires fewer, but more expensive, iterations, while the explicit approach requires more, but less expensive, iterations. Additionally, the implicit approach typically finds a lower final cost, while the explicit approach is able to better satisfy the goal and periodicity constraints.
The complexity of the classic iLQR algorithm is $O(T m^3)$ [tassa2007receding], while a direct method for trajectory optimization that exploits the problem's temporal structure is $O(T (n + m)^3)$ [wang2009fast]. The path-following method generally has complexity $O(p^3)$, although specialized solvers can reduce this cost, where $p$ is the number of primal and dual variables [Nocedal2006]. The complexity of optimization-based dynamics and iLQR is typically dominated by the cost of the path-following method, which is incurred at each time step in the planning horizon. For the explicit bilevel approach, we augment the control with the additional decision variables of the lower-level problem, increasing the control dimension at each time step.
VI Discussion
In this work, we have presented a bilevel optimization framework that utilizes lower-level optimization-based dynamics for trajectory optimization. This representation enables more expressive dynamics by including constraint handling, internal states, and additional physical behavior modeling at the dynamics level, abstracted away from the upper-level optimizer, allowing classically unconstrained solvers to optimize motions for more complex systems.
One of the primary drawbacks of this approach is the lack of convergence guarantees for finding a solution to the dynamics representation. In practice, we find that the dynamics solver occasionally fails to converge. In this event, the optimizer simply returns the current solution and its sensitivities. We find that occasional failures like this often do not impair the overall trajectory-optimization solve and that the optimizer can eventually find a dynamically feasible trajectory. Just as robust, large-scale solvers such as Ipopt [wachter2006implementation] fall back to their restoration mode when numerical difficulties occur, our basic path-following method would likely be improved by including such a fallback routine, perhaps specific to a particular system. Additionally, we find that standard techniques such as problem scaling, appropriate hyperparameter selection, and warm-starting greatly improve the reliability of this approach.
Another potential weakness of this method is the overall complexity, or wall-clock time, of a generic implementation. First, iLQR is a serial algorithm, dominated by forward rollouts and backward Riccati recursions that cannot be parallelized. Similarly, the path-following solver is a serial method dominated by an LU decomposition and solve that is generally not amenable to parallelization either. Second, because an iterative solver is utilized to evaluate the dynamics, calls to the dynamics are inherently more expensive compared to an explicit dynamics representation. However, such overhead can potentially be mitigated with a problem-specific solver that exploits specialized constraints or sparsity, unlike an off-the-shelf solver's generic routines.
There are numerous avenues for future work exploring optimization-based dynamics for bilevel trajectory optimization. First, in this work we explore constrained optimization problems solved with a second-order method. Another interesting problem class to consider is energy-based models [lecun2006tutorial], potentially optimized with first-order Langevin dynamics [schlick2010molecular]. Second, we find that returning gradients computed with a relaxed central-path parameter greatly improves the convergence behavior of the upper-level optimizer. An analysis of, or method for, returning gradients from the lower-level problem that best aid an upper-level optimizer would be useful. Finally, this bilevel framework could be extended to a trilevel setting where the highest-level optimizer autotunes an objective to generate model-predictive-control policies in an imitation-learning setting.