I. Introduction
I-A. Motivation
In numerical optimal control for robotics, often also referred to as trajectory optimization, Differential Dynamic Programming (DDP) [1] and its variants have recently received increased attention. In contrast to classical trajectory optimization approaches, such as Direct Collocation or Direct Multiple Shooting [2], where open-loop trajectories are optimized, DDP-style methods are 'optimal feedback planners', which design a stabilizing feedback and a feedforward controller at the same time. This family of algorithms solves optimal control problems, including a nonlinear cost function and nonlinear system dynamics, in an iterative way. Given an initial stabilizing control policy, a local linear approximation of the system dynamics and a linear-quadratic approximation of the cost function are computed. Then, an incremental improvement to the control law is designed, and the process is repeated until convergence is reached at a local minimum of the cost function. A major reason for the popularity of DDP-style methods is that their computational complexity is often linear in the number of time steps N.
An important DDP variant is the iterative Linear Quadratic Regulator (iLQR) [3], which is also known as Sequential Linear Quadratic Optimal Control (SLQ) [4]. While DDP is an exact-Hessian method requiring second-order derivatives of the system dynamics (note the exception in [5]), iLQR requires only first-order derivatives of the dynamics thanks to a Gauss-Newton Hessian approximation. This renders it an attractive choice for a large range of practical control applications, for example nonlinear model predictive control on robotic systems [6, 7, 8]. iLQR has been applied to trajectory optimization of complex underactuated robots [9], and has been combined with Machine Learning [10].

To date, one of the main shortcomings of the algorithm concerns the treatment of constraints. So far, iLQR can handle bounds on the control input with slight modifications, see [11]. In [12], it is shown for the discrete-time case that state-control equality constraints can be incorporated elegantly through elimination of the corresponding Lagrange Multipliers from the first-order necessary conditions of optimality. This approach maintains linear time complexity and results in a feedback law respecting the state-input constraints. In the same work, an approach to treat pure state constraints is presented, which however requires the solution of a large-scale linear system of equations, rendering the overall algorithm computationally more expensive and resulting in a feedback law which is not state-constraint compliant. The continuous-time counterpart for state-input equality constrained iLQR is given in [13]. Additionally, pure-state equality and inequality constraints are incorporated as soft constraints through a penalty method. While the penalty method adds low computational overhead in each iLQR iteration, an increased number of overall iterations may be required due to the gradual 'tightening' of the penalty coefficient. Furthermore, the resulting feedback gains are not compliant with the state constraints. Prior work in constrained DDP [14, 15] includes arbitrary nonlinear equality and inequality constraints, but requires solving Quadratic Programs for every time step and every iteration in order to ensure strict feasibility during the forward integration.
In previously presented iLQR variants, the treatment of pure state constraints as 'hard constraints' remained a bottleneck, both in terms of computational complexity and in terms of the resulting feedback. This paper contributes an additional perspective on equality-constrained iLQR and introduces a new approach in which the constraints are incorporated using a projection.
I-B. Contribution
This paper presents a state and state-input hard-constrained iLQR variant. The algorithm is based on a projection of the control input onto the nullspace of the linearized constraints. While this ensures that the system dynamics evolve on a subspace tangent to the manifold of nonlinear constraints, the projection leads to a singular control weighting matrix in the linear-quadratic optimal control problem. By applying an existing result from the theory of singular optimal control, we derive a fully constraint-compliant feedforward-feedback control update rule, which we can solve for efficiently with Riccati-style difference equations in linear time. Two simulation examples are given to validate and illustrate the performance of the approach.
I-C. Outline
This paper is structured as follows. In Section II, the optimal control problem and its linear-quadratic approximation are introduced. In Section III, we establish a way of addressing state and state-input constraints by means of a projection and derive the associated projected optimal control problem. We show a solution approach, derive the corresponding Riccati-like equations and present connections to existing work. Section IV showcases two equality-constrained simulation examples from a robotics context. A discussion and outlook conclude the paper in Section V.
II. Problem Statement
Consider the following discrete-time, finite-horizon, constrained nonlinear optimal control problem

min_{u_0, …, u_{N−1}}  J = Φ(x_N) + Σ_{n=0}^{N−1} l_n(x_n, u_n)   (1)

subject to

x_{n+1} = f(x_n, u_n),  n = 0, …, N−1   (2)
g_1(x_n, u_n) = 0   (3)
g_2(x_n) = 0   (4)
g_N(x_N) = 0   (5)
with state vector x_n and control input vector u_n. l_n denotes the intermediate, nonnegative cost rate at time step n and Φ the terminal, nonnegative cost at time step N. g_1 and g_2 are the nonlinear state-input and pure state intermediate equality constraints, and g_N is the terminal pure state constraint, respectively.

The locally-optimal control law is constructed in an iterative fashion. In each iteration, starting at a constraint-satisfactory initial condition x_0, we obtain nominal state and control input trajectories x̄_n and ū_n through a forward integration of the system using the current policy. Note that, in the first iteration, a stable initial control policy is required. Next, we linearize the system dynamics and constraints and quadratically approximate the cost around the nominal trajectories. Denoting state and control deviations as δx_n = x_n − x̄_n and δu_n = u_n − ū_n gives rise to the following constrained, linear-quadratic optimal control problem
min_{δu_0, …, δu_{N−1}}  δJ = c_N + q_N^T δx_N + ½ δx_N^T Q_N δx_N + Σ_{n=0}^{N−1} ( c_n + q_n^T δx_n + r_n^T δu_n + ½ δx_n^T Q_n δx_n + δu_n^T P_n δx_n + ½ δu_n^T R_n δu_n )   (6)

subject to

δx_{n+1} = A_n δx_n + B_n δu_n   (7)
C_n δx_n + D_n δu_n + e_n = 0   (8)
F_n δx_n + h_n = 0   (9)
F_N δx_N + h_N = 0   (10)
where c_n, q_n, r_n, Q_n, P_n and R_n are the coefficients of the Taylor expansion of the cost function (1) around the nominal trajectories, Equation (7) is the linearized system dynamics and Equations (8) to (10) are the linearized constraints. We assume Q_n ⪰ 0, R_n ≻ 0 and Q_N ⪰ 0. The goal is to solve this constrained linear-quadratic subproblem for a local control policy update δu_n and to update the overall control policy accordingly, u_n ← u_n + δu_n.
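Where analytical derivatives of the dynamics are unavailable, the Taylor coefficients above can be approximated numerically. A minimal sketch using forward finite differences; the helper `linearize` and the double-integrator toy dynamics are illustrative assumptions, not part of the presented framework (the paper itself obtains derivatives via algorithmic differentiation, see Section IV-A):

```python
import numpy as np

def linearize(f, x, u, eps=1e-6):
    """Forward-difference Jacobians A = df/dx, B = df/du around (x, u)."""
    fx = f(x, u)
    A = np.zeros((fx.size, x.size))
    B = np.zeros((fx.size, u.size))
    for i in range(x.size):
        dx = np.zeros_like(x); dx[i] = eps
        A[:, i] = (f(x + dx, u) - fx) / eps
    for i in range(u.size):
        du = np.zeros_like(u); du[i] = eps
        B[:, i] = (f(x, u + du) - fx) / eps
    return A, B

# Toy discrete dynamics: an explicit-Euler double integrator, step h
h = 0.1
f = lambda x, u: np.array([x[0] + h * x[1], x[1] + h * u[0]])
A, B = linearize(f, np.array([0.0, 0.0]), np.array([0.0]))
print(A)   # ~[[1, 0.1], [0, 1]]
print(B)   # ~[[0], [0.1]]
```

In practice, algorithmic differentiation yields exact Jacobians at lower cost; finite differences serve only as a fallback for black-box dynamics.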
III. Projected Iterative Linear Quadratic Optimal Control
III-A. Projecting Constraints
Besides assuming that the problem is not over-constrained, we require Equations (8) to (10) to be well-defined in the following sense. First, we assume, without loss of generality, that the state-input constraint matrix D_n has full row rank. If this condition is violated for some n, the linearized state-input constraint needs to be factorized into a pure state constraint, which can be appended to Equation (9), and a remaining state-input constraint with full row rank. Second, we assume that all F_n and F_N are full row rank (otherwise, remove linearly dependent rows). Third, we assume that the relative degree of the constrained system is well-defined and equal to one. Consider the matrices M_n, N_n defined as follows

M_n := F_{n+1} A_n   (11)
N_n := F_{n+1} B_n   (12)

which quantify the propagation of a state-input pair (δx_n, δu_n), through the linear system dynamics, into the linear state constraint equation at the next time step. The constrained system given by Equations (7) to (10) is said to have relative degree one at time index n if the matrices D_n and N_n are both full row rank.
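The relative-degree-one condition thus reduces to two rank checks per time step. A small sketch, assuming hypothetical constraint and input matrices:

```python
import numpy as np

def has_relative_degree_one(D, N):
    """Relative degree one at a time step requires both the state-input
    constraint matrix D_n and the previewed pure-state constraint matrix
    N_n = F_{n+1} B_n to have full row rank (empty blocks pass trivially)."""
    full_row_rank = lambda M: M.shape[0] == 0 or \
        np.linalg.matrix_rank(M) == M.shape[0]
    return bool(full_row_rank(D) and full_row_rank(N))

# Hypothetical two-input system: one state-input constraint row and one
# pure-state constraint row previewed through B
D = np.array([[1.0, 0.0]])
F_next = np.array([[1.0, 0.0]])
B = np.array([[0.1, 0.0], [0.0, 0.1]])
print(has_relative_degree_one(D, F_next @ B))   # True
```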
In this work, we account for the linearized constraints in the optimal control problem (6)-(10) as follows. By means of a projection, we limit the set of admissible control inputs at time index n to such controls which

1) lead to satisfaction of the linearized state-input constraints immediately, and

2) lead to satisfaction of the linearized pure state constraints at the next upcoming time step.
Hence, we rewrite the pure state constraint in terms of a 'previewed' state-input constraint

F_{n+1} A_n δx_n + F_{n+1} B_n δu_n + h_{n+1} = 0   (13)

and at time index n, the control input δu_n is to be selected such that it complies with Equation (13) together with the state-input constraint (8), while also minimizing the cost function (6). To achieve both, we propose the following structure for the control update

δu_n = Π_n δũ_n + δu_n^⊥   (14)

with Π_n being a projection matrix onto the nullspace of the stacked constraint matrix D̂_n := [D_n; F_{n+1} B_n] (for readability, we use the short notation Π_n in the following). While δũ_n can be freely selected in order to control the system in the nullspace of the linearized constraints, we dedicate δu_n^⊥ to controlling the system onto this nullspace in the first place. By inserting Equation (14) into Equations (8) and (13) and forming an unweighted Moore-Penrose pseudoinverse, we derive the optimal choice for δu_n^⊥

δu_n^⊥ = −D̂_n^+ ( Ĉ_n δx_n + ê_n )   (15)

where Ĉ_n := [C_n; F_{n+1} A_n] and ê_n := [e_n; h_{n+1}] stack the corresponding constraint terms. Note that δu_n^⊥ decomposes into a feedforward term −D̂_n^+ ê_n, which can be interpreted as the minimum corrective action to eliminate the constraint violations e_n and h_{n+1}, and a feedback matrix −D̂_n^+ Ĉ_n, which governs a possibly deviating state back onto the constraint nullspace.
In the following, we write δu_n^⊥ generally as δu_n^⊥ = K_n^⊥ δx_n + k_n^⊥, with K_n^⊥ := −D̂_n^+ Ĉ_n and k_n^⊥ := −D̂_n^+ ê_n, and the complete control increment correspondingly as

δu_n = Π_n δũ_n + K_n^⊥ δx_n + k_n^⊥ .   (16)
In order to compute the pseudoinverse within K_n^⊥ and k_n^⊥, we perform a matrix factorization of D̂_n in terms of an SVD or a QR decomposition at every time step n. Note that, as a side product of such a decomposition, we can obtain a set of basis vectors for the nullspace of D̂_n, which is required to compute the projection matrix Π_n.

III-B. Reformulation as Singular Optimal Control Problem
Rewriting the linearized system dynamics (7) using the restructured input (16) leads to a new, affine system equation

δx_{n+1} = Ã_n δx_n + B̃_n δũ_n + b̃_n,  with Ã_n := A_n + B_n K_n^⊥,  B̃_n := B_n Π_n,  b̃_n := B_n k_n^⊥ .   (17)

Similarly, it allows us to reformulate the original linear-quadratic optimal control problem (6)-(10) into a projected optimal control problem with respect to δũ_n

min_{δũ_0, …, δũ_{N−1}}  c_N + q_N^T δx_N + ½ δx_N^T Q_N δx_N + Σ_{n=0}^{N−1} ( c̃_n + q̃_n^T δx_n + r̃_n^T δũ_n + ½ δx_n^T Q̃_n δx_n + δũ_n^T P̃_n δx_n + ½ δũ_n^T R̃_n δũ_n )   (18)

s.t.  δx_{n+1} = Ã_n δx_n + B̃_n δũ_n + b̃_n   (19)

where we define the projected weighting terms for the intermediate cost as

c̃_n = c_n + r_n^T k_n^⊥ + ½ (k_n^⊥)^T R_n k_n^⊥   (20)
q̃_n = q_n + (K_n^⊥)^T ( r_n + R_n k_n^⊥ ) + P_n^T k_n^⊥   (21)
r̃_n = Π_n^T ( r_n + R_n k_n^⊥ )   (22)
Q̃_n = Q_n + (K_n^⊥)^T P_n + P_n^T K_n^⊥ + (K_n^⊥)^T R_n K_n^⊥   (23)
P̃_n = Π_n^T ( P_n + R_n K_n^⊥ )   (24)
R̃_n = Π_n^T R_n Π_n   (25)
Note that the terminal cost remains unchanged. The intermediate cost remains quadratic in δx_n and δũ_n, and Q̃_n remains positive semidefinite. However, the new input cost weighting matrix R̃_n may become singular (merely positive semidefinite) for any originally positive-definite matrix R_n, due to the multiplication with the constraint nullspace projector Π_n. Hence, in order to compute the optimal control update which minimizes the cost-to-go for time n, we need to solve a singular optimal control problem.
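The singularity of the projected input weight is easy to verify numerically. In this sketch, with a hypothetical positive-definite weight and a hypothetical one-row constraint matrix, the projected weight loses rank while remaining positive semidefinite:

```python
import numpy as np

R = np.diag([1.0, 2.0])            # positive-definite input weight
D_hat = np.array([[1.0, 1.0]])     # hypothetical stacked constraint matrix

# Projector onto the nullspace of D_hat
Pi = np.eye(2) - np.linalg.pinv(D_hat) @ D_hat

R_tilde = Pi.T @ R @ Pi            # projected input weight
eigs = np.linalg.eigvalsh(R_tilde) # ascending eigenvalues
print(eigs)                        # smallest is (numerically) zero: singular
```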
III-C. Computing the Optimal Control Increment
Assume a quadratic value function of the form

V_n(δx_n) = s_n + v_n^T δx_n + ½ δx_n^T S_n δx_n   (26)

with weighting terms s_n, v_n and S_n. From a value function perspective, the optimal control update can be derived by minimizing V_n for a given δx_n. To the best of our knowledge, classical iLQR approaches have so far only treated the case of a strictly positive-definite Hessian approximation, equivalent to R̃_n + B̃_n^T S_{n+1} B̃_n ≻ 0, which does not directly transfer to the optimal control problem (18)-(19). In this work, however, we exploit the fact that for any discrete-time linear-quadratic optimal control problem, despite its possible singularity, the associated Riccati Difference Equation is well-defined as long as a certain set of conditions is met [16]. For the projected optimal control problem (18)-(19), these conditions can be interpreted as

H_n := R̃_n + B̃_n^T S_{n+1} B̃_n ⪰ 0   (27)
( I − H_n H_n^+ ) ( P̃_n + B̃_n^T S_{n+1} Ã_n ) = 0   (28)

For a detailed discussion and derivation of such conditions, the interested reader is referred to the original work by Clements and Anderson [16]. Since we assume that R_n ≻ 0, it is straightforward to show that Equations (27) and (28) are satisfied for any symmetric, positive semidefinite weighting matrix S_{n+1}. Hence, the singular optimal control problem (18)-(19) can be addressed using the well-established quadratic value-function approach, where the Hessian's inverse can be replaced by its Moore-Penrose pseudoinverse.
It is simple to show that, if the value function (26) is quadratic in δx_{n+1} for time index n+1, it remains quadratic during backpropagation in time, given the affine system dynamics in Equation (19) and the linear-quadratic cost in Equation (18). According to Bellman's Principle of Optimality, the optimal control input at the preceding time index arises from

V_n(δx_n) = min_{δũ_n} { c̃_n + q̃_n^T δx_n + r̃_n^T δũ_n + ½ δx_n^T Q̃_n δx_n + δũ_n^T P̃_n δx_n + ½ δũ_n^T R̃_n δũ_n + V_{n+1}( Ã_n δx_n + B̃_n δũ_n + b̃_n ) } .

Inserting the value function (26) together with the affine system dynamics (19) and minimizing the overall expression w.r.t. δũ_n leads to an optimal control update of the form

δũ_n^* = −H_n^+ ( G_n δx_n + g_n )   (29)

where we have defined the abbreviations

H_n := R̃_n + B̃_n^T S_{n+1} B̃_n,  G_n := P̃_n + B̃_n^T S_{n+1} Ã_n,  g_n := r̃_n + B̃_n^T ( v_{n+1} + S_{n+1} b̃_n ) .

Note that H_n^+ denotes the unweighted Moore-Penrose pseudoinverse of the possibly singular Hessian approximation H_n. By plugging the optimal control update (29) into the above expression for V_n, we can confirm the quadratic nature of the optimal value function in δx_n.
After equating coefficients with a quadratic value function Ansatz of form (26) for time index n, we define K̃_n := −H_n^+ G_n and k̃_n := −H_n^+ g_n and obtain the following recursive, Riccati-like difference equations for S_n, v_n and s_n

S_n = Q̃_n + Ã_n^T S_{n+1} Ã_n − G_n^T H_n^+ G_n   (30)
v_n = q̃_n + Ã_n^T ( v_{n+1} + S_{n+1} b̃_n ) − G_n^T H_n^+ g_n   (31)
s_n = s_{n+1} + c̃_n + v_{n+1}^T b̃_n + ½ b̃_n^T S_{n+1} b̃_n − ½ g_n^T H_n^+ g_n   (32)

which are valid for all n < N. For the final time step, we obtain the terminal conditions S_N = Q_N, v_N = q_N and s_N = c_N. Combining Equations (16) and (29), the overall control increment becomes

δu_n = ( Π_n K̃_n + K_n^⊥ ) δx_n + Π_n k̃_n + k_n^⊥ .   (33)
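The backward pass can be sketched compactly; the recursion below follows the structure of Equations (30) and (31), with the Hessian inverse replaced by its pseudoinverse. All problem data in the usage example are illustrative placeholders (an unconstrained double integrator), not the paper's benchmark systems:

```python
import numpy as np

def backward_pass(A, B, b, Q, q, R, r, P, S_T, v_T):
    """Riccati-like backward sweep for an affine, possibly singular LQ
    problem. All arguments except S_T, v_T are lists over the horizon;
    R[n] may be singular. Returns feedback gains L and feedforward l."""
    n_steps = len(A)
    S, v = S_T, v_T
    L, l = [None] * n_steps, [None] * n_steps
    for n in reversed(range(n_steps)):
        H = R[n] + B[n].T @ S @ B[n]     # possibly singular Hessian
        G = P[n] + B[n].T @ S @ A[n]
        g = r[n] + B[n].T @ (v + S @ b[n])
        Hp = np.linalg.pinv(H)           # Moore-Penrose pseudoinverse
        L[n], l[n] = -Hp @ G, -Hp @ g
        S_next, v_next = S, v            # keep S_{n+1}, v_{n+1}
        S = Q[n] + A[n].T @ S_next @ A[n] - G.T @ Hp @ G
        v = q[n] + A[n].T @ (v_next + S_next @ b[n]) - G.T @ Hp @ g
    return L, l

# Illustrative data: a double integrator over 20 steps, no constraints
h, n_steps = 0.1, 20
A1 = np.array([[1.0, h], [0.0, 1.0]])
B1 = np.array([[0.5 * h * h], [h]])
A = [A1] * n_steps; B = [B1] * n_steps; b = [np.zeros(2)] * n_steps
Q = [np.eye(2)] * n_steps; q = [np.zeros(2)] * n_steps
R = [np.array([[0.1]])] * n_steps; r = [np.zeros(1)] * n_steps
P = [np.zeros((1, 2))] * n_steps
L, l = backward_pass(A, B, b, Q, q, R, r, P, np.eye(2), np.zeros(2))
```

With a positive-definite input weight, the pseudoinverse coincides with the ordinary inverse and the sweep reduces to standard iLQR; with the projected, singular weight it remains well-defined.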
III-D. Interpretation and Connection to Existing Results
Examining Equation (33), it becomes obvious that the control update consists of a feedforward and a feedback term, which are both composed of one part stemming from the computation of the constraint-restoring component δu_n^⊥ and one part from the projected iLQR iteration performed in subspaces tangent to the constraint manifold. At this point, several connections to existing results in the field can be made:

- Considering the overall feedback matrix Π_n K̃_n + K_n^⊥, there is an obvious connection to the work by Hemami and Wyman [17], in which it is shown that the set of all constraint-compliant feedback matrices for a pair of linear system matrices is an affine subset of all admissible, linear state-variable feedback gains. In our case, this affine subset is offset from the origin by K_n^⊥ and spanned by the basis of the constraint nullspace, as indicated by the projection Π_n.
- Special case 1: when only pure state constraints of form (9) are considered for a steady setpoint, our iLQR iteration simplifies to the state-constrained LQR formulation and the corresponding Riccati-like equations presented by Ko and Bitmead [18].
- Special case 2: when only state-input constraints are considered, our iLQR iteration simplifies to the result for state-input constraints given by Sideris and Rodriguez [12].
- The presented approach can be interpreted as a variant of hierarchical DDP [19], with a first task formed by the constraints and a second task encoded in the cost.
III-E. Main Iteration
Algorithm 1 summarizes the main iLQR iteration. The final control update after each Riccati backward pass is obtained through a line search, in which we select a step length α such that the resulting control policy minimizes a merit function defined as

M = J + ρ ( Σ_{n=0}^{N−1} ( ‖g_1(x_n, u_n)‖ + ‖g_2(x_n)‖ ) + ‖g_N(x_N)‖ ) .   (34)

The merit function consists of the non-projected cost function J, the real positive tuning parameter ρ and the norms of the equality constraint violations evaluated for every time step n. We consider the algorithm converged when both the relative merit function change and the constraint's integrated square error (ISE) are below some user-defined thresholds.
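A minimal sketch of such a merit-based step-size selection; the candidate step sizes, the toy rollout and the function names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def merit(cost, violations, rho):
    """Merit = non-projected cost plus rho times the accumulated norm of
    the equality constraint violations over the horizon."""
    return cost + rho * sum(np.linalg.norm(v) for v in violations)

def line_search(rollout, rho, alphas=(1.0, 0.5, 0.25, 0.125)):
    """Pick the step length alpha whose forward rollout yields the lowest
    merit; `rollout(alpha)` is assumed to return (cost, violations)."""
    best_alpha, best_m = None, np.inf
    for a in alphas:
        cost, viol = rollout(a)
        m = merit(cost, viol, rho)
        if m < best_m:
            best_alpha, best_m = a, m
    return best_alpha, best_m

# Toy rollout where the full step overshoots the constraint:
# cost (1 - a)^2, constraint violation |a - 0.5|
toy = lambda a: ((1.0 - a) ** 2, [np.array([a - 0.5])])
alpha, m = line_search(toy, rho=10.0)
print(alpha)   # 0.5: merit 0.25 beats the full step's merit of 5.0
```

A stricter implementation would additionally back off until a sufficient-decrease condition holds, rather than enumerating a fixed set of step sizes.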
IV. Simulation Examples
IV-A. Discretizing Continuous-time Rigid Body Dynamics
While the presented constrained iLQR algorithm is formulated in discrete time, most of the robotics modelling tools available today generate continuous-time ordinary differential equations. In this work, we auto-generate the dynamics and kinematics ODEs using the robotics code generator 'RobCoGen' [20], and obtain the continuous-time dynamics derivatives through algorithmic differentiation using our open-source framework [21, 22].

For time discretization, we assume that the control input is interpolated as zero-order hold. Consider the example of a pure state constraint on position level, where the continuous-time rigid body dynamics of the robot model read as M(q) q̈ + h(q, q̇) = τ, with generalized coordinates q. In the theoretic case of using a simple explicit Euler integrator, both for the forward rollout of the system dynamics and for the discretization of the system matrices A_n and B_n, at least two time steps would be required until the control input torque τ became 'visible' to the position constraint, in terms of a change of q. Correspondingly, the system's relative degree would be at least two, and the projected iLQR algorithm presented in this paper would not be admissible.

Therefore, for the following rigid-body system examples, the forward integration is performed using higher-order Runge-Kutta integrators. We obtain the system matrices A_n, B_n through integrating a sensitivity ODE, which proved to be sufficient for rendering the systems considered in this paper relative degree one in various different simulation settings. All simulations were done in C++.
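The relative-degree argument can be checked on a double integrator: under explicit-Euler discretization, the position constraint row is blind to the input for one step, whereas an exact zero-order-hold discretization exposes it immediately. A small illustrative check:

```python
import numpy as np

h = 0.01
F = np.array([[1.0, 0.0]])            # linearized position constraint row

# Explicit Euler: q+ = q + h*qd, qd+ = qd + h*u -> input invisible to q
B_euler = np.array([[0.0], [h]])
# Exact zero-order-hold discretization of the double integrator
B_zoh = np.array([[0.5 * h * h], [h]])

print(F @ B_euler)   # [[0.]] -> previewed constraint matrix is rank-deficient
print(F @ B_zoh)     # nonzero -> full row rank, relative degree one
```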
IV-B. Multicopter with Body Position Constraints
In a first example, we apply the algorithm to a multicopter trajectory optimization problem, where the system is modelled similarly to [6]. In order to illustrate the algorithm's capability to deal with complex nonlinear equality constraints, we show a positioning task including position constraints on the multicopter Center of Mass (CoM). With the CoM position expressed in world coordinates, we consider a nonlinear equality constraint on that position. The multicopter needs to reach a desired position as closely as possible while respecting the constraint landscape. For the considered maneuver, the optimization converges in 7 iterations, with a total constraint ISE below the convergence threshold, at a CPU time of 0.14 sec (Intel Core i7-4850HQ CPU, 2.30 GHz). A plot of the trajectory is given in Fig. 1.
IV-C. 6 DoF Robot Arm with End-Effector Position Constraint
In this example, we showcase trajectory optimization for a fixed-base 6 DoF robot arm subject to an end-effector position constraint. The employed arm model is inspired by [23]. Starting at an initial position 'A', the task objective encoded in the cost function is to rotate the first joint by 90°, while the end-effector path is constrained to lie on a sinusoidal curve starting at the initial end-effector pose. The initial guess is a simple inverse-dynamics feedforward for the initial pose, combined with an infinite-horizon LQR feedback controller. Fig. 2 shows snapshots of the obtained trajectory. Fig. 3 shows the average CPU time for the same problem using different sampling times and indicates linear time complexity in the number of time steps.
V. Discussion and Outlook
This paper introduces a state and state-input constrained, projection-based iLQR variant, along with promising simulation results. Essentially, the presented approach is a single-shooting, Gauss-Newton-type method. As such, it is crucial that a stable initial control policy is provided. Note that this initial control policy is not required to lead to a constraint-satisfying trajectory (in fact, during the course of the simulation experiments, we observed that a wide range of different initial controllers and motions would lead to closely similar final solutions within a similar number of iterations). However, in order to achieve a sufficiently small overall constraint violation, the initial state is strictly required to be constraint compliant. Otherwise, at least one time step with nonzero constraint violation must pass before the system can be steered onto the constraint manifold.
A unique feature of the presented algorithm is that it optimizes a feedforward trajectory while also providing a control feedback that is guaranteed to be state-constraint compliant. While we have already used the state-input constrained feedback gains in hardware experiments in [7], as a special case of this work, a verification of the purely state-constrained gains for practical robotics tasks is subject to ongoing work.
In terms of computational complexity, the algorithm is linear in the number of time steps N. In comparison to traditional unconstrained iLQR, the only computationally expensive operation added at each time step is the factorization of a stacked matrix reflecting state and state-input constraints. It shall be noted that this process bears potential for parallelization, as the factorizations can be performed independently at every time step.
The method is free of tuning parameters, except for the cost function weights, which it shares with unconstrained iLQR, and the merit function weight ρ. In practice, the algorithm converges to the same solution for a broad range of choices for ρ; in all shown simulation examples, convergence was achieved with low tuning effort, starting from rough guesses for the parameter.
A current limitation is certainly the restriction to systems with relative degree equal to one. Essentially, the relative degree expresses after how many time steps of system evolution a state-input constraint compliant control input becomes fully visible to the pure-state constraint. Note that an extension to relative degrees higher than one is possible, but does not easily generalize, as it requires a case-specific examination of the linear independence constraint qualification of the constrained optimal control problem. In certain cases, it may also happen that the constraints cannot be represented with relative degree one using the simple integration and discretization scheme from Section IV-A, for example when complex actuator dynamics are included in the model. An extension of this work to cases with more complex constraints and dynamics is subject to future work. While initial simulation results look promising, another natural extension of this work is a rigorous comparison, in terms of performance, convergence, robustness and computational complexity, to reference implementations of established constrained optimization methods.
Acknowledgements
The authors would like to thank Moritz Diehl, Farbod Farshidian and Michael Neunert for fruitful discussions. This research was supported by the Swiss National Science Foundation through the NCCR Digital Fabrication and a Professorship Award to Jonas Buchli.
References
 [1] D. Mayne, “A second-order gradient method for determining optimal trajectories of non-linear discrete-time systems,” International Journal of Control, vol. 3, no. 1, pp. 85–95, 1966.
 [2] M. Diehl, H. G. Bock, H. Diedam, and P.-B. Wieber, “Fast direct multiple shooting algorithms for optimal robot control,” in Fast motions in biomechanics and robotics, pp. 65–93, Springer, 2006.
 [3] E. Todorov and W. Li, “A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems,” in American Control Conference, 2005. Proceedings of the 2005, pp. 300–306, IEEE, 2005.
 [4] A. Sideris and J. E. Bobrow, “An efficient sequential linear quadratic algorithm for solving nonlinear optimal control problems,” Transactions on Automatic Control, vol. 50, no. 12, pp. 2043–2047, 2005.
 [5] Z. Manchester and S. Kuindersma, “Derivative-free trajectory optimization with unscented dynamic programming,” in IEEE 55th Conference on Decision and Control (CDC), pp. 3642–3647, Dec 2016.
 [6] M. Neunert, C. de Crousaz, F. Furrer, M. Kamel, F. Farshidian, R. Siegwart, and J. Buchli, “Fast nonlinear model predictive control for unified trajectory optimization and tracking,” in IEEE International Conference on Robotics and Automation (ICRA), 2016.
 [7] M. Giftthaler, F. Farshidian, T. Sandy, L. Stadelmann, and J. Buchli, “Efficient Kinematic Planning for Mobile Manipulators with Nonholonomic Constraints Using Optimal Control,” in IEEE International Conference on Robotics and Automation (ICRA), pp. 3411–3417, May 2017.
 [8] F. Farshidian, E. Jelavić, A. Satapathy, M. Giftthaler, and J. Buchli, “Realtime motion planning of legged robots: A model predictive control approach,” in 2017 IEEE-RAS 17th International Conference on Humanoid Robots (Humanoids), Nov 2017.
 [9] M. Neunert, F. Farshidian, A. W. Winkler, and J. Buchli, “Trajectory optimization through contacts and automatic gait discovery for quadrupeds,” IEEE Robotics and Automation Letters (RAL), 2017.
 [10] D. Mitrovic, S. Klanke, and S. Vijayakumar, “Adaptive optimal feedback control with learned internal dynamics models,” in From Motor Learning to Interaction Learning in Robots, pp. 65–84, Springer, 2010.
 [11] Y. Tassa, N. Mansard, and E. Todorov, “Control-limited Differential Dynamic Programming,” in Robotics and Automation (ICRA), 2014 IEEE International Conference on, pp. 1168–1175, IEEE, 2014.
 [12] A. Sideris and L. A. Rodriguez, “A Riccati approach for constrained linear quadratic optimal control,” International Journal of Control, vol. 84, no. 2, pp. 370–380, 2011.
 [13] F. Farshidian, M. Neunert, A. W. Winkler, G. Rey, and J. Buchli, “An efficient optimal planning and control framework for quadrupedal locomotion,” in IEEE International Conference on Robotics and Automation (ICRA), 2017.
 [14] T. C. Lin and J. S. Arora, “Differential dynamic programming technique for constrained optimal control,” Computational Mechanics, vol. 9, pp. 27–40, Jan 1991.
 [15] Z. Xie, C. K. Liu, and K. Hauser, “Differential Dynamic Programming with Nonlinear Constraints,” in IEEE International Conference on Robotics and Automation (ICRA), 2017.
 [16] D. J. Clements and B. D. Anderson, “Singular Optimal Control: The Linear-Quadratic Problem,” in Lecture Notes in Control and Information Science, vol. 5, Springer-Verlag, Berlin, Germany, 1978.
 [17] H. Hemami and B. Wyman, “Modeling and control of constrained dynamic systems with application to biped locomotion in the frontal plane,” IEEE Transactions on Automatic Control, vol. 24, no. 4, pp. 526–535, 1979.
 [18] S. Ko and R. R. Bitmead, “Optimal control of linear systems with state equality constraints,” IFAC Proceedings Volumes, vol. 38, no. 1, pp. 407–412, 2005.
 [19] F. Romano, A. Del Prete, N. Mansard, and F. Nori, “Prioritized optimal control: A hierarchical differential dynamic programming approach,” in 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 3590–3595, May 2015.
 [20] M. Frigerio, J. Buchli, and D. Caldwell, “Code generation of algebraic quantities for robot controllers,” in Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, Oct 2012.
 [21] M. Giftthaler, M. Neunert, M. Stäuble, and J. Buchli, “The Control Toolbox: An Open-Source C++ Library for Robotics, Optimal and Model Predictive Control.” https://adrlab.bitbucket.io/ct, 2018. arXiv:1801.04290 [cs.RO].

 [22] M. Neunert, M. Giftthaler, M. Frigerio, C. Semini, and J. Buchli, “Fast derivatives of rigid body dynamics for control, optimization and estimation,” in 2016 IEEE International Conference on Simulation, Modeling, and Programming for Autonomous Robots (SIMPAR), San Francisco, pp. 91–97, Dec 2016.
 [23] B. U. Rehman, Design and Control of a Compact Hydraulic Manipulator for Quadruped Robots. PhD thesis, Istituto Italiano di Tecnologia (IIT) and University of Genova, 2016.