In numerical optimal control for robotics, often also referred to as trajectory optimization, Differential Dynamic Programming (DDP)  and its variants have recently received increased attention. In contrast to classical trajectory optimization approaches, such as Direct Collocation or Direct Multiple Shooting , where open-loop trajectories are optimized, DDP-style methods are ‘optimal feedback planners’, which design a stabilizing feedback and a feedforward controller at the same time. This family of algorithms solves optimal control problems including a nonlinear cost function and nonlinear system dynamics in an iterative way. Given an initial stabilizing control policy, a local linear approximation of the system dynamics and a linear-quadratic approximation of the cost function is computed. Then, an incremental improvement to the control law is designed and the process repeated until convergence is reached at a local minimum of the cost function. A major reason for the popularity of DDP-style methods is that their computational complexity is often linear in the number of time steps, .
An important DDP variant is the iterative Linear Quadratic Regulator (iLQR) , which is also known as Sequential Linear Quadratic Optimal Control (SLQ) . While DDP is an exact-Hessian method requiring second-order derivatives of the system dynamics (note the exception in ), iLQR requires only first-order derivatives of the dynamics thanks to a Gauss-Newton Hessian approximation. This renders it an attractive choice for a large range of practical control applications, for example nonlinear model predictive control on robotic systems [6, 7, 8]. iLQR has been applied for trajectory optimization of complex underactuated robots 
, and has been combined with Machine Learning.
To date, one of the main shortcomings of the algorithm concerns the treatment of constraints. So far, iLQR can handle bounds on the control input, see , with slight modifications. In , it is shown for the discrete-time case that state-control equality constraints can be incorporated elegantly through elimination of corresponding Lagrange Multipliers from the first-order necessary conditions of optimality. This approach maintains linear time complexity and results in a feedback-law respecting the state-input constraints. In the same work, an approach to treat pure state-constraints is presented, which however requires the solution of a large-scale linear system of equations, rendering the overall algorithm and resulting in a feedback law which is not state-constraint compliant. The continuous-time counterpart for state-input equality constrained iLQR is given in . Additionally, pure-state equality and inequality constraints are incorporated as soft constraints through a penalty method. While the penalty method adds low computational overhead in each iLQR iteration, an increased number of overall iterations may be required due to the gradual ‘tightening’ of the penalty coefficient. Furthermore, the resulting feedback gains are not compliant with the state constraints. Prior work in constrained DDP [14, 15] includes arbitrary nonlinear equality and inequality constraints, but requires solving Quadratic Programs for every time-step and every iteration in order to ensure strict feasibility during the forward integration.
In previously presented iLQR variants, the treatment of pure state constraints as ‘hard constraints’ remained a bottleneck, both because of computational complexity and feedback. This paper contributes an additional perspective on equality-constrained iLQR and introduces a new approach in which the constraints are incorporated using a projection.
This paper presents a state and state-input hard-constrained iLQR variant. The algorithm is based on a projection of the control input onto the nullspace of the linearized constraints. While this ensures the system dynamics to evolve on a subspace tangent to the manifold of nonlinear constraints, the projection leads to a singular control weighting matrix in the linear-quadratic optimal control problem. By applying an existing result from the theory of singular optimal control, we derive a fully constraint-compliant feedforward-feedback control update rule, for which we can solve efficiently with Riccati-style difference equations with linear time complexity. Two simulation examples are given to validate and illustrate the performance of the approach.
This paper is structured as follows. In Section II, the optimal control problem and its linear-quadratic approximation are introduced. In Section III, we establish a way of addressing state and state-input constraints by means of a projection and derive the associated projected optimal control problem. We show a solution approach, derive the corresponding Riccati-like equations and present connections to existing work. Section IV showcases two equality-constrained simulation examples from a robotics context. A discussion and outlook conclude the paper in Section V.
Ii Problem Statement
Consider the following discrete-time, finite-horizon, constrained nonlinear optimal control problem
with state-vectorand control input vector . denotes the intermediate, non-negative cost rate at time-step and the terminal, non-negative cost at time-step . and are the nonlinear state-input and pure state intermediate equality constraints, is the terminal pure state constraint, respectively.
The locally-optimal control law is constructed in an iterative fashion. In each iteration, starting at a constraint-satisfactory initial condition , we obtain nominal state and control input trajectories and through a forward integration of the system using the current policy. Note that, in the first iteration, a stable initial control policy is required. Next, we linearize the system dynamics and constraints and quadratically approximate the cost around the nominal trajectories. Denoting state and control deviations as and gives rise to the following constrained, linear-quadratic optimal control problem
where , , , , and are the coefficients of the Taylor expansion of the cost function (1) around the nominal trajectories, Equation (7) is the linearized system dynamics and Equations (8) to (10) are the linearized constraints. We assume , and . The goal is to solve this constrained linear-quadratic subproblem for a local control policy update and to update the overall control policy accordingly, .
Iii Projected Iterative Linear Quadratic Optimal Control
Iii-a Projecting Constraints
Besides assuming that the problem is not over-constrained, we require Equations (8) to (10) to be well-defined in the following sense. First we assume, without loss of generality, that the state-input constraint matrix has full row rank. If this condition is violated for some , the linearized state-input constraint needs to be factorized into a pure state constraint which can be appended to Equation (9) and a remaining state-input constraint with full rank. Second, we assume that all and are full row rank (otherwise, remove linearly dependent rows). Third, we assume that the relative degree of the constrained system is well-defined and . Consider the matrices , defined as follows
which quantify the propagation of a state-input pair , through the linear system dynamics, into the linear state constraint equation at the next time step. The constrained system given by Equations (7) to (10) is said to have relative degree one at time index , if the matrices and are both full row rank.
In this work, we account for the linearized constraints in the optimal control problem (6)-(10) as follows. By means of a projection, we limit the set of admissible control inputs at time index to such controls which
lead to a satisfaction of the linearized state-input constraints immediately and
lead to satisfaction of the linearized pure state constraints at the next upcoming timestep.
Hence, we rewrite the pure state constraint in terms of a ‘previewed’ state-input constraint
and at time index , the control input is to be selected such that it complies with Equation (13), while also minimizing the cost function (6). To achieve both, we propose the following structure for the control update
with being a projection matrix onto the nullspace of (for readability, we use the short notation in the following). While can be freely selected in order to control the system in the nullspace of the linearized constraints, we dedicate to controlling the system onto this nullspace in the first place. By inserting Equation (14) into (13) and forming an unweighted Moore-Penrose pseudo-inverse, we derive the -optimal choice for
Note that it decomposes into a feedforward term , which can be interpreted as the minimum corrective action to eliminate constraint violations and , and a feedback matrix which governs a possibly deviating state back onto the constraint nullspace.
In the following, we write generally as and the complete control increment correspondingly as
In order to compute the pseudo-inverse within and we perform a matrix factorization of
in terms of an SVD or a QR decomposition at every time step. Note that as a side-product of such a decomposition, we can obtain a set of basis vectors for the nullspace of , required to compute the projection matrix .
Iii-B Reformulation as Singular Optimal Control Problem
where we define the projected weighting matrices for the intermediate cost as
Note that the terminal cost remains unchanged. The intermediate cost remains quadratic in and . remains positive semi-definite. However, the new input cost weighting matrix may become non-negative definite (singular) for any originally positive-definite matrix due to the multiplication with the constraint null-space projector . Hence, in order to compute the optimal control update which minizes the cost-to-go for time , we need to solve a singular optimal control problem.
Iii-C Computing the Optimal Control Increment
Assume a quadratic value function of the form
with weighting matrices , and . From a value function perspective, the optimal control update can be derived by minimizing for a given . To the best of our knowledge, classical iLQR approaches have so far only treated the case of a strictly positive-definite Hessian approximation, equivalent to , which does not directly transfer to the optimal control problem (18)-(19). In this work, however, we exploit the fact that for any discrete-time linear-quadratic optimal control problem, despite its possible singularity, the associated Riccati Difference Equation is well-defined, as long as a certain set of conditions is met . For the projected optimal control problem (18)-(19), these conditions can be interpreted as
For a detailed discussion and derivation of such conditions, the interested reader is referred to the original work by Clements and Anderson in . Since we assume that , it is straight-forward to show that Equations (27) and (28) are satisfied for any symmetric, positive semi-definite weighting matrix . Hence, the singular optimal control problem (18)-(19) can be addressed using the well-established quadratic value-function approach, where the Hessian’s inverse can be replaced by its Moore-Penrose pseudo-inverse.
It is simple to show that, if the value function (26) is quadratic in for time index , it remains quadratic during back-propagation in time, given the affine system dynamics in Equation (19) and the linear quadratic cost in Equation (18). According to Bellman’s Principle of Optimality, the optimal control input at the preceding time index arises from
where we have defined the abbreviations
Note that denotes the unweighted Moore-Penrose pseudo-inverse of the possibly singular Hessian approximation . By plugging the optimal control update (29) into the above expression for we can confirm the quadratic nature of the optimal value function in .
After equating coefficients with a quadratic value function Ansatz of form (26) for , we define and and obtain the following, recursive Riccati-like difference equations for , and
which are valid for all . For the final time-step we obtain the terminal conditions , and .
Iii-D Interpretation and Connection to Existing Results
Examining Equation (33), it becomes obvious that the control update consists of a feedforward and a feedback-term, which are both composed of one part stemming from the computation of and one part from the projected iLQR iteration performed in subspaces tangent to the constraint manifold. At this point, several connections to existing results in the field can be made:
Considering the overall feedback matrix , there is an obvious connection to the work by Hemami et al. in , in which it is shown that the set of all constraint compliant feedback matrices for a pair of linear system matrices is an affine subset of all admissible, linear state-variable feedback gains. In our case, this affine subset is offset from the origin by and spanned by the basis of the constraint nullspace, as indicated by .
Special case 1: when only pure state-constraints of form are considered for a steady setpoint, our iLQR-iteration simplifies to the state-constrained LQR formulation and corresponding Riccati-like equations as presented by Ko et al. in .
Special case 2: when only state-input constraints are considered, our iLQR-iteration simplifies to the result for state-input constraints given by Sideris et al. in .
The presented approach can be interpreted as a variant of hierarchical DDP , with a first task forming the constraints and a second task encoded in the cost.
Iii-E Main Iteration
Algorithm 1 summarizes the main iLQR iteration. The final control update after each Riccati backward pass is obtained through a line search where we select an such that the control policy
minimizes a merit function defined as
The merit function consists of the non-projected cost function, the real positive tuning parameter and the -norm of the equality constraint violation evaluated for every . We consider the algorithm converged when both the relative merit function change and the constraint’s integrated square error (ISE) are below some user-defined threshold.
Iv Simulation Examples
Iv-a Discretizing Continuous-time Rigid Body Dynamics
While the presented constrained iLQR algorithm is formulated in discrete-time, most of the robotics modelling tools available today generate continuous-time ordinary differential equations. In this work, we auto-generate the dynamics and kinematics ODEs using the robotics code generator ‘RobCoGen’, and obtain the continuous-time dynamics derivatives through algorithmic differentiation using our open-source framework [21, 22].
For time discretization, we assume that the control input is interpolated as zero-order hold. Consider the example of a pure state constraint on position level where the continuous-time rigid body dynamics of a robot model equation reads as, with generalized coordinates . In the theoretic case of using a simple explicit Euler integrator both for the forward rollout of the system dynamics and for the discretization of the system matrices and , at least two time-steps would be required until the control input torque became ‘visible’ to the position constraint, in terms of a change of . Correspondingly, the system’s relative degree would be at least two and the projected iLQR algorithm presented in this paper would not be admissible.
Therefore, for the following rigid-body system examples, the forward integration is performed using higher-order Runge-Kutta integrators. We obtain the system matrices , through integrating a sensitivity ODE, which proved to be sufficient for rendering the systems considered in this paper relative degree 1 in various different simulation settings. All simulations were done in C++.
Iv-B Multicopter with Body Position Constraints
In a first example, we apply the algorithm to a multicopter trajectory optimization problem, where the system is modelled similar to . In order to illustrate the algorithm’s capability to deal with complex nonlinear equality constraints, we show a positioning task including position constraints on the multicopter Center of Mass (CoM). With the CoM position being expressed in coordinates, we consider the constraint . The multicopter needs to reach a desired position as closely as possible while respecting the constraint landscape. For a maneuver with time horizon sec, and time discretization ms, the optimization converges in 7 iterations with a total constraint ISE , and CPU time 0.14 sec (Intel Core i7-4850HQ CPU, 2.30 GHz). A plot of the trajectory is given in Fig. 1.
Iv-C 6 DoF Robot Arm with End-Effector Position Constraint
In this example, we showcase a trajectory optimization example for a fixed-base 6 DoF robot arm subject to an end-effector position constraint. The employed arm model is inspired by . Starting at an initial position ‘A’, the task objective encoded in the cost function is to rotate the first joint about 90, while the end-effector path is constrained to lie on a sinusoid curve starting at the initial end-effector pose. The initial guess is a simple inverse-dynamics feedforward for the initial pose, combined with an infinite-horizon LQR feedback controller. Fig. 2 shows snapshots of the obtained trajectory. Fig. 3 shows the average CPU-time for the same problem using different sampling times and indicates linear time-complexity .
V Discussion and Outlook
This paper introduces a state and state-input constrained, projection-based iLQR variant, along with promising simulation results. Essentially, the presented approach is a Single-Shooting Gauss-Newton type method. As such, it is crucial that a stable, initial control policy is provided. Note that this initial control policy is not required to lead to a constraint satisfactory trajectory (in fact, during the course of the simulation experiments, we observed that a wide range of different initial controllers and motions would lead to closely similar final solutions within a similar number of iterations). However, in order to achieve a sufficiently small overall constraint violation, the initial state is strictly required to be constraint compliant. Otherwise, it requires at least one time step with non-zero constraint violation to pass before the system can be steered onto the constraint manifold.
A unique feature of the presented algorithm is that it optimizes a feedforward trajectory, while also providing a control feedback that is guaranteed to be state-constraint compliant. While we have already used the state-input constrained feedback gains in hardware experiments in  as a special case of this work, a verification of the purely-state constrained gains for practical robotics tasks is subject to ongoing work.
In terms of computational complexity, the algorithm is in the number of time-steps . In comparison to traditional unconstrained iLQR, the only computationally expensive operation added at each time step is the factorization of a stacked matrix reflecting state and state-input constraints. It shall be noted that this process bears potential for computational parallelization, as the factorizations can be performed independently at every time step.
The method is free of tuning parameters, except for the cost function weights, which is the same for unconstrained iLQR, and for the merit function weight . In practice, the algorithm converges to the same solution for a broad range of choices for , in all shown simulation examples convergence was achieved with low tuning effort starting from rough guesses for the parameter.
A current limitation is certainly the restriction to systems with relative degree equal to 1. Essentially, the relative degree expresses after how many time steps of system evolution an initially state-input constraint compliant control at time index becomes fully visible to the pure-state constraint. Note that an extension to relative degrees higher than one is possible, but does not easily generalize as it requires a case-specific examination of the linear independence constraint qualification of the constrained optimal control problem. In certain cases, however, it may be possible that the constraints cannot be represented with relative degree one using the simple integration and discretization scheme from Section IV-A, for example when complex actuator dynamics are included in the model. An extension of this work to cases with more complex constraints and dynamics is subject to future work. While initial simulation results look promising, another natural extension of this work is a rigorous comparison in terms of performance, convergence, robustness and computational complexity to reference implementations of established constrained optimization methods.
The authors would like to thank Moritz Diehl, Farbod Farshidian and Michael Neunert for fruitful discussions. This research was supported by the Swiss National Science Foundation through the NCCR Digital Fabrication and a Professorship Award to Jonas Buchli.
-  D. Mayne, “A second-order gradient method for determining optimal trajectories of non-linear discrete-time systems,” International Journal of Control, vol. 3, no. 1, pp. 85–95, 1966.
-  M. Diehl, H. G. Bock, H. Diedam, and P.-B. Wieber, “Fast direct multiple shooting algorithms for optimal robot control,” in Fast motions in biomechanics and robotics, pp. 65–93, Springer, 2006.
-  E. Todorov and W. Li, “A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems,” in American Control Conference, 2005. Proceedings of the 2005, pp. 300–306, IEEE, 2005.
-  A. Sideris and J. E. Bobrow, “An efficient sequential linear quadratic algorithm for solving nonlinear optimal control problems,” Transactions on Automatic Control, vol. 50, no. 12, pp. 2043–2047, 2005.
-  Z. Manchester and S. Kuindersma, “Derivative-free trajectory optimization with unscented dynamic programming,” in IEEE 55th Conference on Decision and Control (CDC), pp. 3642–3647, dec 2016.
-  M. Neunert, C. de Crousaz, F. Furrer, M. Kamel, F. Farshidian, R. Siegwart, and J. Buchli, “Fast nonlinear model predictive control for unified trajectory optimization and tracking,” in IEEE International Conference on Robotics and Automation (ICRA), 2016.
-  M. Giftthaler, F. Farshidian, T. Sandy, L. Stadelmann, and J. Buchli, “Efficient Kinematic Planning for Mobile Manipulators with Non-holonomic Constraints Using Optimal Control,” in IEEE International Conference on Robotics and Automation (ICRA), pp. 3411–3417, May 2017.
-  F. Farshidian, E. Jelavić, A. Satapathy, M. Giftthaler, and J. Buchli, “Real-time motion planning of legged robots: A model predictive control approach,” in 2017 IEEE-RAS 17th International Conference on Humanoid Robots (Humanoids), Nov 2017.
-  M. Neunert, F. Farshidian, A. W. Winkler, and J. Buchli, “Trajectory optimization through contacts and automatic gait discovery for quadrupeds,” IEEE Robotics and Automation Letters (RA-L), 2017.
-  D. Mitrovic, S. Klanke, and S. Vijayakumar, “Adaptive optimal feedback control with learned internal dynamics models,” in From Motor Learning to Interaction Learning in Robots, pp. 65–84, Springer, 2010.
-  Y. Tassa, N. Mansard, and E. Todorov, “Control-limited Differential Dynamic Programming,” in Robotics and Automation (ICRA), 2014 IEEE International Conference on, pp. 1168–1175, IEEE, 2014.
-  A. Sideris and L. A. Rodriguez, “A Riccati approach for constrained linear quadratic optimal control,” International Journal of Control, vol. 84, no. 2, pp. 370–380, 2011.
-  F. Farshidian, M. Neunert, A. W. Winkler, G. Rey, and J. Buchli, “An efficient optimal planning and control framework for quadrupedal locomotion,” in IEEE International Conference on Robotics and Automation (ICRA), 2017.
-  T. C. Lin and J. S. Arora, “Differential dynamic programming technique for constrained optimal control,” Computational Mechanics, vol. 9, pp. 27–40, Jan 1991.
-  Z. Xie, C. K. Liu, and K. Hauser, “Differential Dynamic Programming with Nonlinear Constraints,” in IEEE International Conference on Robotics and Automation (ICRA), 2017.
-  D. J. Clements and B. D. Anderson, “Singular Optimal Control: The Linear-Quadratic Problem,” in Lecture Notes in Control and Information Science, vol. 5, Springer-Verlag, Berlin, Germany, 1978.
-  H. Hemami and B. Wyman, “Modeling and control of constrained dynamic systems with application to biped locomotion in the frontal plane,” IEEE Transactions on Automatic Control, vol. 24, no. 4, pp. 526–535, 1979.
-  S. Ko and R. R. Bitmead, “Optimal control of linear systems with state equality constraints,” IFAC Proceedings Volumes, vol. 38, no. 1, pp. 407–412, 2005.
-  F. Romano, A. Del Prete, N. Mansard, and F. Nori, “Prioritized optimal control: A hierarchical differential dynamic programming approach,” in 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 3590–3595, May 2015.
-  M. Frigerio, J. Buchli, and D. Caldwell, “Code generation of algebraic quantities for robot controllers,” in Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, Oct 2012.
-  M. Giftthaler, M. Neunert, M. Stäuble, and J. Buchli, “The Control Toolbox - An Open-Source C++ Library for Robotics, Optimal and Model Predictive Control.” https://adrlab.bitbucket.io/ct, 2018. arXiv:1801.04290 [cs.RO].
M. Neunert, M. Giftthaler, M. Frigerio, C. Semini, and J. Buchli, “Fast derivatives of rigid body dynamics for control, optimization and estimation,” in2016 IEEE International Conference on Simulation, Modeling, and Programming for Autonomous Robots (SIMPAR), San Francisco, pp. 91–97, Dec 2016.
-  B. U. Rehman, Design and Control of a Compact Hydraulic Manipulator for Quadruped Robots. PhD thesis, Istituto Italiano di Tecnologia (IIT) and University of Genova, 2016.