Trajectory Optimization with Optimization-Based Dynamics

We present a framework for bi-level trajectory optimization in which a system's dynamics are encoded as the solution to a constrained optimization problem and smooth gradients of this lower-level problem are passed to an upper-level trajectory optimizer. This optimization-based dynamics representation enables constraint handling, additional variables, and non-smooth forces to be abstracted away from the upper-level optimizer, and allows classical unconstrained optimizers to synthesize trajectories for more complex systems. We provide a path-following method for efficient evaluation of constrained dynamics and utilize the implicit-function theorem to compute smooth gradients of this representation. We demonstrate the framework by modeling systems from locomotion, aerospace, and manipulation domains including: acrobot with joint limits, cart-pole subject to Coulomb friction, Raibert hopper, rocket landing with thrust limits, and planar-push task with optimization-based dynamics and then optimize trajectories using iterative LQR.



There are no comments yet.


page 1


On the Hardware Feasibility of Nonlinear Trajectory Optimization for Legged Locomotion based on a Simplified Dynamics

We propose two feasibility constraints to be included in a Single Rigid ...

Underactuated Waypoint Trajectory Optimization for Light Painting Photography

Despite their abundance in robotics and nature, underactuated systems re...

TRON: A Fast Solver for Trajectory Optimization with Non-Smooth Cost Functions

Trajectory optimization is an important tool for control and planning of...

Regularizing Trajectory Optimization with Denoising Autoencoders

Trajectory optimization with learned dynamics models can often suffer fr...

Time Variable Minimum Torque Trajectory Optimization for Autonomous Excavator

In this paper, we present a minimal torque and time variable trajectory ...

Geometrically Constrained Trajectory Optimization for Multicopters

We present an optimization-based framework for multicopter trajectory pl...

Pushing Fast and Slow: Task-Adaptive MPC for Pushing Manipulation Under Uncertainty

We propose a model predictive control approach to pushing based manipula...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Trajectory optimization is a powerful tool for synthesizing trajectories for nonlinear dynamical systems. Indirect methods, in particular, are able to efficiently find optimal solutions to this class of problem and return dynamically feasible trajectories by performing rollouts and utilizing gradients of the system’s dynamics.

Classically, indirect methods like iterative LQR (iLQR) [jacobson1970differential] utilize explicit dynamics representations, which can be directly evaluated and differentiated in order to return gradients. In this work, we present a more general framework for optimization-based dynamics representations. This latter class enables partial elimination of trajectory-level constraints by absorbing them into the dynamics representation.

Fig. 1: Optimized trajectories for planar-push task. The pusher (blue) and block (orange) paths are shown for a plan that maneuvers the block from a pose at the origin: to a goal pose: .

We formulate optimization-based dynamics as a constrained optimization problem and provide a path-following method for efficient evaluation of the dynamics at each time step. The implicit-function theorem is utilized to differentiate through this representation, and we exploit the properties of our path-following method to return useful, smooth gradients to an upper-level optimizer.

To demonstrate the capabilities of this representation, we utilize optimization-based dynamics and iLQR in a bi-level optimization framework to perform trajectory optimization. We provide a number of optimization-based dynamics models and examples in simulation including: an acrobot with joint limits, a cart-pole experiencing friction, gait generation for a Raibert hopper, a belly-flop soft landing for a rocket subject to thrust limits, and a planar-push manipulation task.

Specifically, our contributions are:

  • A novel framework for optimization-based dynamics that can be used with trajectory-optimization methods that require gradients

  • A path-following method that can efficiently evaluate constrained optimization problems and return smooth gradients

  • A collection of optimization-based dynamics models and examples from locomotion, aerospace, and manipulation domains that demonstrate the proposed bi-level trajectory-optimization framework

In the remainder of this paper, we first review related work on bi-level approaches to trajectory optimization in Section II. Next, we present background on the iLQR algorithm, the implicit-function theorem, and implicit integrators in Section III. Then, we present our optimization-based dynamics representation, as well as a path-following method for solving this problem class in Section IV. We provide a collection of optimization-based dynamics models that are utilized to perform trajectory optimization, and comparisons, in Section V. Finally, we conclude with discussion and directions for future work in Section VI.

Ii Related Work

Bi-level optimization [sinha2017review] is a framework where an upper-level optimizer utilizes the results, i.e., solution and potentially gradients, of a lower-level optimization problem. Approaches typically either implicitly solve the lower-level problem and compute gradients using the solution, or explicitly represent the optimality conditions of the lower-level problem as constraints in the upper-level problem. For example, the MuJoCo simulator [todorov2012mujoco] has been employed in an implicit bi-level approach for whole-body model-predictive control of a humanoid robot [koenemann2015whole]. The lower-level simulator solves a convex optimization problem in order to compute contact forces for rigid-body dynamics, and the results are utilized to perform rollouts and return finite-differenced gradients to the upper-level iLQR optimizer. In contrast, explicit approaches have used the optimality conditions of linear-complementarity-problem contact dynamics as constraints in a direct trajectory-optimization method [posa2014direct]. A related approach formulates contact dynamics as lower-level holonomic constraints [mastalli2020crocoddyl]. Implicit integrators, which are a specific instance of optimization-based dynamics and are widely used in collocation methods [von1993numerical], have been explored with differentiable dynamic programming for models that require cloth simulation [zimmermann2021dynamic]. Implicit dynamics have also been trained to represent non-smooth dynamics [pfrommer2020contactnets], although this work has not been utilized for trajectory optimization. Direct methods with implicit lower-level problems have been used in locomotion applications for tracking reference trajectories with model-predictive control [lecleach2021linear] and a semi-direct method utilizes a lower-level friction problem for planning through contact events [landry2019bilevel]. A related, sequential operator splitting framework is proposed in [sindhwani2017sequential]. In this work we focus on implicit lower-level problems and indirect methods, specifically iLQR, as the upper-level optimizer.

Iii Background

In this section we provide background on the iLQR algorithm, implicit-function theorem, and implicit integrators.

Iii-a Iterative LQR

iLQR is an algorithm for trajectory optimization that solves instances of the following problem:


For a system with state , control inputs , time index , initial state , and discrete-time dynamics , the algorithm aims to minimize an objective with stage-cost functions, , and terminal-cost function, , over a planning horizon .

The algorithm utilizes gradients of the objective and dynamics, and exploits the problem’s temporal structure. The state trajectory is defined implicitly by the initial state and only the control trajectory is parameterized as decision variables. Closed-loop rollouts enable this indirect method to work well in practice. The overall complexity is linear in the planning horizon and cubic in the control-input dimension [tassa2007receding]

While efficient, the algorithm does not natively handle additional general equality or inequality constraints. This limitation has led to many variations of the classic algorithm in order to accommodate constraints in some capacity at the solver level. Box constraints have been incorporated at each step in the backward pass in order to enforce control limits [tassa2014control]. An alternative approach embeds controls in barrier functions which smoothly approximate constraints [marti2020squash]. Augmented Lagrangian methods and constrained backward and forward passes have also been proposed for handling general constraints [howell2019altro, xie2017differential, lantoine2008hybrid].

Iii-B Implicit-Function Theorem

An implicit function, , is defined as


for solutions and problem data .

At an equilibrium point, , the sensitivities of the solution with respect to the problem data, i.e., , can be computed [dini1907lezioni]. First, we approximate (2) to first order:


and then the sensitivity of the solution is computed as:


In the case that is not full rank, an approximate solution to (4), e.g., least-squares, can be computed.

Iii-C Implicit Integrators

Unlike explicit integrators, i.e., , implicit integrators are an implicit function of the next state [brudigam2020linear, manchester2016quaternion], which cannot be separated from the current state and control input:


and are often used for numerical simulation of stiff systems due to improved stability and accuracy compared to explicit integrators, at the expense of increased computation [wanner1996solving].

For direct trajectory-optimization methods that parameterize states and controls and allow for dynamic infeasibility during iterates, such integrators are easily utilized since the state at the next time step is already available as a decision variable to the optimizer [von1993numerical]. However, for indirect methods, like iLQR, that enforce strict dynamic feasibility at each iteration via rollouts, evaluating (5) requires a numerical solver to find a solution that satisfies this implicit function for problem data .

In practice, the next state can be found efficiently and reliably using Newton’s method. Typically, the current state is used to initialize the solver and less than 10 iterations are required to solve the root-finding problem to machine precision. We describe dynamics with implicit integrators as:


Having satisfied (5) to find the next state, we can compute the dynamics Jacobians using the implicit-function theorem. First, the dynamics are approximated to first order:


and then we solve for :


The Jacobians:


are returned at each time step.

Iv Optimization-Based Dynamics

In the previous section, we presented dynamics with implicit integrators that can be evaluated during rollouts and differentiated. In this section, we present a more general representation: optimization-based dynamics, which solve a constrained optimization problem in order to evaluate dynamics and use the implicit-function theorem to compute gradients by differentiating through the problem’s optimality conditions.

Iv-a Problem Formulation

For dynamics we require: fast and reliable evaluation, gradients that are useful to an upper-level optimizer, and (ideally) tight constraint satisfaction. We consider problems of the form:


with decision variable , problem data , objective , equality constraints , and cone . Typically, the problem data include the current state and control, , and the next state belongs to the optimal solution set.

Iv-B Path-Following Method

One of the primary challenges in optimizing (11) is selecting a solver that is well-suited to the requirements imposed by dynamics representations. Solvers for this class of problem are generally categorized as first-order projection [stellato2020osqp, o2016conic, garstka2019cosmo] or second-order path-following methods [domahidi2013ecos, vandenberghe2010cvxopt]. The first approach optimizes (11) by splitting the problem and alternating between inexpensive first-order methods, and potentially non-smooth, projections onto the cone. The second approach formulates and optimizes barrier subproblems, using second-order methods, along a central path, eventually converging to the cone’s boundary in the limit [boyd2004convex]. Second-order semi-smooth methods also exist [ali2017semismooth].

The first-order projection-based methods, while fast, often can only achieve coarse constraint satisfaction and, importantly, the gradients returned are usually subgradients, which are less useful to an optimizer. In contrast, path-following methods exhibit fast convergence, can achieve tight constraint satisfaction [Nocedal2006], and can return smooth gradients using a relaxed central-path parameter.

Based on these characteristics, we utilize a path-following method [Nocedal2006] to optimize (11). The optimality conditions for a barrier subproblem are:


with duals and , cone product denoted with the operator, central-path parameter , and dual cone [domahidi2013ecos]. We consider free: ; nonnegative: ; second-order:


and the Cartesian product of these cones. For nonnegative cones, the cone product is an element-wise product and . For second-order cones, the cone product is:


and [domahidi2013ecos]. The nonnegative and second-order cones are self dual, i.e., , while the dual of the free cone is the zero cone, i.e., .

To find a stationary point of (12-15), for a fixed central-path parameter , we consider and a residual comprising (12-14). A search direction, , is computed using the residual and its Jacobian with respect to the decision variables at the current point. A backtracking line search is performed to ensure that a candidate step respects the cone constraints and reduces the norm of the residual. Once the residual norm is below a desired tolerance, we cache the current solution , the central-path parameter is decreased, and the procedure is repeated until the central-path parameter is below a desired tolerance.

1:procedure PathFollowing()
2:     Parameters: ,
4:     Initialize:
6:     Until do
9:     Until , do
12:     Until do
17:     If do
20:      Eq. 4
21:     Return
22:end procedure
Algorithm 1 Differentiable Path-Following Method

To differentiate (11), we apply the implicit-function theorem (4) to the residual at an intermediate solution, . In practice, we find that performing implicit differentiation with a solution point having a relaxed central-path parameter returns smooth gradients that improve the convergence behavior of the upper-level optimizer. The complete algorithm is summarized in Algorithm 1.

V Examples

We formulate optimization-based dynamics models and use iLQR to perform bi-level trajectory optimization for a number of examples that highlight how these representations can be constructed, demonstrate that non-smooth and constrained dynamics can be optimized, provide a comparison of smooth gradients, and compare implicit and explicit bi-level approaches.

Throughout, we use implicit midpoint integrators, quadratic costs, and for convenience, employ an augmented Lagrangian method to enforce any remaining trajectory-level constraints (e.g., terminal constraints), not handled implicitly by the dynamics representation. Our implementation and all of the experiments are available here:

V-a Acrobot with Joint Limits

Fig. 2: Optimized trajectories for acrobot systems without (left) and with (right) joint limits performing a swing-up. The systems are visualized at second increments. Red and green elbow colors indicate violation or non-violation of the joint limits, respectively.

We model an acrobot [tedrake2014underactuated] with joint limits on the actuated elbow. These limits are enforced with a signed-distance constraint,


where is the system’s configuration and is the elbow angle. Additional constraints,


enforce physical behaviors that impact impulses can only be non-negative, i.e., repulsive not attractive, and that they are only applied to the system when the joint reaches a limit. Relaxing the complementarity constraint via a central-path parameter, introducing a slack variable for the signed-distance function , and combining this reformulation with the system’s dynamics results in a problem formulation (12-15) that can be optimized with Algorithm 1.

Unlike approaches that include joint limits at the solver level, including the impact (19) and limit (18) constraints at the dynamics level enables impact forces encountered at joint stops to be optimized and applied to the system.

The system has states and controls. We plan a swing-up trajectory over a horizon with a time step . The optimizer is initialized with random controls.

We compare unconstrained and joint-limited systems, the optimized motions are shown in Fig. 2. The unconstrained system violates joint limits, while the joint-limited system reaches the limits without exceeding them and finds a motion utilizing additional pumps.

V-B Cart-Pole with Coulomb Friction

Fig. 3: Optimized trajectories for cart-pole performing a swing-up. The trajectories becoming increasingly aggressive in order to achieve a swing-up as friction is increased (top to bottom).

A cart-pole [tedrake2014underactuated] is modeled that experiences Coulomb friction [moreau2011unilateral] on both its slider and pendulum arm. The friction force that maximally dissipates kinetic energy is the solution to the following optimization problem:


where is the joint velocity, is the friction force, and is the friction-cone limit [lobo1998applications]. The optimality conditions for the cone program’s barrier subproblem are:


with , such that , and dual variables and for the equality and cone constraints, respectively.

The system has states, controls, and dimension . We make the modeling assumption that , which is a function of the friction coefficient, is fixed for both joints. We plan a swing-up trajectory for a horizon and time step . The optimizer is initialized with an impulse at the first time step and zeros for the remaining initial control inputs.

We compare the trajectories optimized for systems with increasing amounts of friction, i.e., increasing , in Fig. 3. We observe that the system must perform more aggressive maneuvers in order to succeed at the swing-up task in the presence of increasing friction.

V-C Hopper Gait

Fig. 4: Hopper gaits. Optimized templates (red) for the system’s body (orange) and foot (blue) trajectories. The top trajectory finds a single-hop gait, the middle finds a two-hop gait, and the bottom finds a two-hop gait that bounces back before hopping forward.

We generate gaits for a Raibert hopper [raibert1989dynamically]. The system’s dynamics are modeled as a nonlinear complementarity problem [lecleach2021linear]. The 2D system has four generalized coordinates: lateral and vertical body positions, body orientation, and leg length. There are

control inputs: a body moment and leg force. The state is

, comprising the system’s current and previous configurations. The horizon is with time step . The initial state is optimized and a trajectory-level periodicity constraint is employed to ensure that the first and final states, with the exception of the lateral positions, are equivalent. Finally, we initialize the solver with controls that maintain the system in an upright configuration.

By tuning the cost functions we generate qualitatively different gaits. The optimizer finds single-hop and multi-hop trajectories that satisfy the periodicity constraint. We repeat the optimized template to form a gait, three are shown in Fig. 4.

V-D Rocket Landing

We plan a soft-landing for a rocket with 6 degrees of freedom that must transition from a horizontal to vertical orientation prior to landing while respecting thrust constraints. Unlike prior work that enforces such constraints at the optimizer level

[blackmore2010minimum], we instead enforce these constraints at the dynamics level. This enables the optimizer to provide smooth controls to the dynamics, which then internally constrain the input to satisfy the system’s limits at every iteration.

The cone projection problem, which finds the thrust vector that is closest to the optimizer’s input thrust while satisfying constraints, is formulated as:


where are the optimized and provided thrust vectors, respectively, is the component of thrust along the longitudinal axis of the rocket, and are limits on this value, and is a scaling parameter.

The system has states and controls that are first projected using (25) before being applied to dynamics. The planning horizon is and time step is . The controls are initialized with small random values, the initial pose is offset from the target, and the rocket has initial downward velocity. A constraint enforces the rocket’s final pose, a zero velocity, vertical-orientation configuration in a landing zone. The scaling parameter is set to one.

The optimizer finds a plan that successfully reorients the rocket prior to landing while respecting the thrust constraints, which would otherwise be violated at the first time steps without the thrust projection. The normalized optimized controls and position trajectory are provided in Fig. 5.

V-E Planar Push

For the canonical manipulation problem of planar pushing [hogan2016feedback], we optimize the positions and forces of a robotic manipulator’s circular end-effector in order to slide a planar bock into an oriented goal pose (Fig. 1). The system’s end-effector is modeled as a fully actuated particle in 2D that can move the block (with 2D translation and orientation) via impulses and friction forces while the two systems are in contact. The block is modeled with point friction at each of its four corners and a single signed-distance function is employed that enables the end-effector to impact and push on any of the block’s sides.

The system has states, controls, and forces are optimized. The planning horizon is and the timestep is . We initialize the end-effector with a control input that overcomes stiction to move the block. The block is initialized at the origin with the pusher in contact on its left side, and the block is constrained at the trajectory-level to reach a goal pose.

failure 128 9 206 10 582 133 630 68
TABLE I: Comparison of smooth gradients, parameterized by central-path parameter and tested over five random seeds for random initial control inputs, on the number of iLQR iterations (mean standard deviation) required for acrobot swing-up problem.

V-F Comparison of Smooth Gradients

We compare the effect of smooth gradients on the upper-level optimizer by varying the value of the central-path parameter used by the implicit-function theorem to compute the gradients of dynamics. The results for the acrobot swingup problem, provided in Table I, are computed over five trials of random initial control inputs for each central-path value. We find that iLQR requires more iterations when the central-path value is small and less iterations using smooth gradients—until the gradients are too smooth and the optimizer fails to find a swing-up trajectory.

implicit hopper 1 hopper 2 hopper 3 push 1 push 2
iter. 28 42 30 18 19
cost -29.5 -192.1 -34.4 -32.6 -104.2
con. 1.5e-4 1.5e-4 2.3e-4 9.2e-4 1.4e-4
explicit hopper 1 hopper 2 hopper 3 push 1 push 2
iter. 81 69 394 57 133
cost -15.98 226.9 -22.5 -33.0 -104.1
con. 5.5e-9 5.5e-9 2.3e-6 3.0e-10 1.8e-8
TABLE II: Comparison between implicit and explicit bi-level approaches for hopper-gait and planar-push examples. The number of iterations, final cost, and maximum constraint violation are compared between: (ours) implicit, optimization-based dynamics with iLQR, and explicit, a direct method using Ipopt.

V-G Comparison with Explicit Bi-Level Optimization

We compare our implicit bi-level approach to an explicit one that is optimized using a direct method with Ipopt [wachter2006implementation]. For the hopper-gait and planar-push examples, we explicitly represent the contact-dynamics optimality conditions as constraints and directly optimize trajectories [manchester2020variational]. We use the same models, cost functions, and comparable optimizer parameters and tolerances. The results are provided in Table II. We find that our implicit approach requires fewer, but more expensive iterations, while the explicit approach requires more, but less expensive iterations. Additionally, the implicit approach typically finds a lower final cost while the explicit approach is able to better satisfy the goal and periodicity constraints.

The complexity of the classic iLQR algorithm is: [tassa2007receding], while a direct method for trajectory optimization that exploits the problem’s temporal structure is: [wang2009fast]. The path-following method generally has complexity: , although specialized solvers can reduce this cost, where is the number of primal and dual variables [Nocedal2006]. The complexity of optimization-based dynamics and iLQR is and is typically dominated by the cost of the path-following method, which is incurred at each time step in the planning horizon. The hopper problems have and the planar push problems have . For the explicit bi-level approach, we augment the control with the additional decision variables, increasing the control dimension at each time step. Then, the hopper problems have and . The planar push problems have and .

Vi Discussion

Fig. 5: Normalized optimized controls (left) and position trajectory (right) for rocket performing soft landing. The system first reorients from a horizontal to vertical pose before landing with zero velocity in a target zone while respecting thrust constraints.

In this work we have presented a bi-level optimization framework that utilizes lower-level optimization-based dynamics for trajectory optimization. This representation enables more expressive dynamics by including constraint handling, internal states, and additional physical behavior modeling at the dynamics level, abstracted away from the upper-level optimizer, enabling classically unconstrained solvers to optimize motions for more complex systems.

One of the primary drawbacks to this approach is the lack of convergence guarantees of finding a solution for the dynamics representation. In practice, we find that occasionally the dynamics solver will fail to converge. In this event the optimizer simply returns the current solution and its sensitivities. We find that occasional failures like this often do not impair the overall trajectory-optimization solve and that the optimizer can eventually find a dynamically feasible trajectory. Just as robust, large-scale solvers such as Ipopt [wachter2006implementation]

fallback to their restoration mode when numerical difficulties occur, our basic path-following method is likely to be improved by including such a fallback routine, perhaps specific to a particular system. Additionally, we find that standard techniques such as: problem scaling, appropriate hyperparameter selection, and warm-starting greatly improve the reliability of this approach.

Another potential weakness of this method is overall complexity or wall-clock time for a generic implementation. First, iLQR is a serial algorithm, dominated by forward rollouts and backward Riccati recursions that cannot be parallelized. Similarly, the path-following solver is a serial method dominated by an LU decomposition and solve that is generally not amenable to parallelization either. Second, because an iterative solver is utilized to evaluate the dynamics, calls to the dynamics are inherently more expensive compared to an explicit dynamics representation. However, such overhead can potentially be mitigated with a problem-specific solver that can exploit specialized constraints or sparsity, unlike an off-the-shelf solver’s generic routines.

There are numerous avenues for future work exploring optimization-based dynamics for bi-level trajectory optimization. First, in this work we explore constrained optimization problems solved with a second-order method. Another interesting problem class to consider are energy-based models

[lecun2006tutorial], potentially optimized with first-order Langevin dynamics [schlick2010molecular]

. Second, we find that returning gradients optimized with a relaxed central-path parameter greatly improves the convergence behavior of the upper-level optimizer. An analysis of, or method for, returning gradients from the lower-level problem that best aid an upper-level optimizer would be useful. Finally, this bi-level framework could be extended to a tri-level setting where the highest-level optimizer autotunes an objective to generate model-predictive-control policies in an imitation-learning setting.