I. Introduction
I-A. Literature Review
Many tasks in agriculture, construction, defense, and disaster response require mobile robots to traverse irregular terrain and move through narrow passages. The mobility afforded by legged robots makes them exceptionally suitable for these scenarios. Practical challenges to unlocking their mobility include the highly nonlinear and hybrid nature of the multi-contact dynamics, on-the-fly generation of motion plans, and the management of various constraints.
Despite these difficulties, many successful algorithms have been developed and tested in simulation and on hardware [1, 2, 3, 4, 5, 6, 7]. Many conventional approaches optimize the Center of Mass (CoM) trajectory and foothold locations using a reduced-order model and adopt QP-based operational space control (OSC) laws [4, 3, 5] to select joint torques that track the planned trajectories. Widely used reduced-order models include the Linear Inverted Pendulum (LIP) [2] and Spring-Loaded Inverted Pendulum (SLIP) [4, 3] for determining foothold locations, with the Zero-Moment Point (ZMP) criterion used to enforce admissible CoM trajectories [1, 2, 3]. Centroidal dynamics models have also been used that consider the linear and angular momentum of the system as a whole [8, 6, 7]. Overall, these approaches have the advantage of fast computation, but the complexity of the resulting motions is limited. For example, motions such as standing up from the ground cannot be generated with a LIP model since it neglects all kinematic constraints and assumes constant height and zero angular momentum.

Compared to this conventional approach, whole-body motion planning methods can generate more complex behaviors. Whereas QP-based OSC only considers the instantaneous effects of joint torques, whole-body motion planning strategies aim to find a sequence of torque commands by solving a finite-horizon trajectory optimization (TO) problem. Despite the appeal of this approach, the curse of dimensionality caused by the high-dimensional state space of legged robots has prevented it from becoming popular. Recent results (e.g.,
[9]) using Differential Dynamic Programming (DDP) [10] have shown great promise for online use. Since then, many DDP advances have been proposed, demonstrating robustness for rejecting disturbances [11] and real-time performance for whole-body motion planning [12, 13, 14, 15]. Unlike conventional direct methods, which optimize over all decision variables together, DDP adopts a divide-and-conquer strategy by successively solving much smaller optimization problems [10]. This feature makes DDP exceptionally suitable for problems with long time horizons because the computational effort scales linearly with time, as opposed to the quadratic or cubic growth of many nonlinear programming (NLP) approaches to TO. Since DDP is a shooting method, the algorithm can also be terminated at any time while still giving a physically valid trajectory. These features and the successes of [13, 14, 15] together suggest the promise of DDP for online MPC over other direct methods.
Despite these benefits and promise, there are some difficulties in applying DDP to legged locomotion planning, such as dealing with the impact discontinuity and managing various constraints. The first difficulty is addressed in [9] by approximating the impact discontinuity with a smooth transition, and in [13] by ignoring the impact in DDP but compensating for this simplification with a feedback controller. These approaches either present a robustness issue or lack experimental evidence. Other previous work has attacked the second difficulty by leveraging constraint-handling techniques from NLP. Box constraints on the control are handled by solving QPs with a Projected Newton algorithm in [16]. A penalty method is used in [13] to satisfy state constraints. This method, however, suffers from numerical ill-conditioning when the penalty coefficients are large. Augmented Lagrangian (AL) methods (e.g., [17]) resolve this issue by adding a linear multiplier term. Lantoine et al. [18] proposed a DDP algorithm that handles terminal state constraints using AL, motivating their use to address the state-based switching of hybrid systems in this work.
I-B. Contribution
In this paper, we propose a Hybrid Systems DDP (HSDDP) approach that extends the applicability of DDP to hybrid systems. In particular, HSDDP includes three algorithmic advances: an impact-aware DDP step that addresses impact discontinuities, an AL method for switching constraints, and a switching time optimization (STO) strategy. Further, to deal with the inequality constraints in legged locomotion, a relaxed barrier (ReB) method is adopted and integrated within HSDDP. The developed algorithms are benchmarked in simulation on Mini Cheetah bounding as shown in Fig. 1. They are extendable to general gaits such as trotting and galloping, and to other platforms such as bipeds and manipulators.
The structure of this paper is as follows. The DDP background and the hybrid dynamics formulation are introduced in Sections II and III. Section IV discusses the four contributions of this paper: impact-aware DDP, AL for state-based switching, a ReB method for DDP, and a DDP-based STO algorithm. Section V analyzes the performance of the proposed algorithm in terms of constraint handling and efficiency of the STO as applied to quadruped bounding. Section VI provides a closing discussion.
II. Background: Differential Dynamic Programming
This section gives a brief introduction to DDP following [9]. Readers are referred to [10] for a detailed derivation. The goal of DDP is to find an optimal control sequence $\mathbf{u}^* = \{u_0^*, \ldots, u_{N-1}^*\}$ that minimizes a cost function of the form

(1)  $J(\mathbf{x}, \mathbf{u}) = \sum_{k=0}^{N-1} \ell(x_k, u_k) + \Phi(x_N)$

where $\mathbf{x} = \{x_0, \ldots, x_N\}$ denotes the state trajectory, $\ell(\cdot,\cdot)$ denotes the running cost, and $\Phi(\cdot)$ denotes the terminal cost. The trajectory is subject to the discretized dynamics
(2)  $x_{k+1} = f(x_k, u_k)$

where $x_k$ and $u_k$ respectively denote the state and control variables at time step $k$. DDP recursively finds $\mathbf{u}^*$ by repeatedly executing a forward sweep and a backward sweep. Given a nominal control sequence, the forward sweep computes a nominal trajectory and the associated dynamics derivatives. A backward sweep is then executed to generate a policy that is used to update the control sequence. As this process continues, the control sequence (locally) converges to $\mathbf{u}^*$.
Since DDP optimizes only over the control sequence, it can be classified as a direct shooting method. Interested readers may refer to
[19] for a discussion of trade-offs with other direct methods.

Denote by $V(x_k)$ the value function (optimal cost-to-go) at time step $k$. Using Bellman's principle of optimality, $V(x_k)$ is given recursively backward in time:

(3)  $V(x_k) = \min_{u_k}\; \ell(x_k, u_k) + V'(f(x_k, u_k)), \qquad V(x_N) = \Phi(x_N)$

Attempts to solve (3) directly are difficult since an analytical expression of $V$ is rarely possible due to the nonlinearity of $f$. To avoid this problem, DDP considers the variation of $Q(x_k, u_k) = \ell(x_k, u_k) + V'(f(x_k, u_k))$ around a nominal state-control pair $(\bar{x}_k, \bar{u}_k)$ under the perturbation $(\delta x_k, \delta u_k)$. The resulting variation is approximated to second order as:

(4)  $\delta Q(\delta x, \delta u) \approx \frac{1}{2}\begin{bmatrix} \delta x \\ \delta u \end{bmatrix}^{\top} \begin{bmatrix} Q_{xx} & Q_{xu} \\ Q_{ux} & Q_{uu} \end{bmatrix} \begin{bmatrix} \delta x \\ \delta u \end{bmatrix} + \begin{bmatrix} Q_x \\ Q_u \end{bmatrix}^{\top} \begin{bmatrix} \delta x \\ \delta u \end{bmatrix}$
where

(5a)  $Q_x = \ell_x + f_x^{\top} V'_x$
(5b)  $Q_u = \ell_u + f_u^{\top} V'_x$
(5c)  $Q_{xx} = \ell_{xx} + f_x^{\top} V'_{xx} f_x + V'_x \cdot f_{xx}$
(5d)  $Q_{uu} = \ell_{uu} + f_u^{\top} V'_{xx} f_u + V'_x \cdot f_{uu}$
(5e)  $Q_{ux} = \ell_{ux} + f_u^{\top} V'_{xx} f_x + V'_x \cdot f_{ux}$

in which the subscripts indicate partial derivatives and the prime indicates the next time step. Note that $f_{xx}$, $f_{uu}$, and $f_{ux}$ generally are tensors. The notation '$\cdot$' denotes matrix-tensor multiplication. Omitting the third terms in the last three equations gives rise to the iLQR algorithm, which enables faster iterations but loses quadratic convergence properties. We employ iLQR in this work and use the algorithm proposed in [20, 21] to efficiently compute the dynamics partial derivatives.

Optimizing (4) over $\delta u$ results in the optimal control increment around the nominal control $\bar{u}$
(6)  $\delta u^* = k + K\,\delta x, \qquad k = -Q_{uu}^{-1} Q_u, \qquad K = -Q_{uu}^{-1} Q_{ux}$

where $k$ is the step direction and $K$ is the feedback gain. Substituting (6) into equation (4) results in update equations for the quadratic model of $V$ according to
(7a)  $\Delta V = -\frac{1}{2} Q_u^{\top} Q_{uu}^{-1} Q_u$
(7b)  $V_x = Q_x - Q_{xu} Q_{uu}^{-1} Q_u$
(7c)  $V_{xx} = Q_{xx} - Q_{xu} Q_{uu}^{-1} Q_{ux}$

where $\Delta V$ denotes the expected cost reduction.
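As a concrete illustration, the backward sweep of equations (5)-(7) can be sketched in NumPy as below. This is a minimal iLQR sketch with hypothetical argument names, not the paper's implementation: the second-order dynamics terms of (5) and the regularization that a practical solver needs are omitted.

```python
import numpy as np

def backward_pass(fx, fu, lx, lu, lxx, luu, lux, Phi_x, Phi_xx):
    """One iLQR backward sweep. fx[k], fu[k] are dynamics Jacobians at step
    k; l* are running-cost derivatives; Phi_* are terminal-cost derivatives.
    Returns feedforward steps, feedback gains, and the expected cost
    reduction (tensor terms of (5) dropped, iLQR-style)."""
    N = len(fx)
    Vx, Vxx = Phi_x, Phi_xx
    ks, Ks, dV = [], [], 0.0
    for k in reversed(range(N)):
        Qx = lx[k] + fx[k].T @ Vx
        Qu = lu[k] + fu[k].T @ Vx
        Qxx = lxx[k] + fx[k].T @ Vxx @ fx[k]
        Quu = luu[k] + fu[k].T @ Vxx @ fu[k]
        Qux = lux[k] + fu[k].T @ Vxx @ fx[k]
        Quu_inv = np.linalg.inv(Quu)      # regularize Quu in practice
        kff = -Quu_inv @ Qu               # step direction
        Kfb = -Quu_inv @ Qux              # feedback gain
        dV += -0.5 * Qu @ Quu_inv @ Qu    # expected cost reduction
        # value-function model update, equivalent to (7b)-(7c)
        Vx = Qx + Kfb.T @ Quu @ kff + Kfb.T @ Qu + Qux.T @ kff
        Vxx = Qxx + Kfb.T @ Quu @ Kfb + Kfb.T @ Qux + Qux.T @ Kfb
        ks.append(kff)
        Ks.append(Kfb)
    return ks[::-1], Ks[::-1], dV
```

The value update is written in the general form that remains correct when `Quu` is regularized; with the exact inverse it reduces algebraically to (7b)-(7c).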
The equations (5) and (7) are computed recursively starting at the final state, constituting the backward sweep of DDP. The nominal control is then updated using the resulting control policy (6) as follows,

(8)  $u = \bar{u} + \alpha k + K(x - \bar{x})$

where $\alpha$ is a line search parameter, and $(\bar{x}, \bar{u})$ and $(x, u)$ respectively are the nominal and new state-control pairs. A backtracking line search method is used to select $\alpha$ [9], and a regularization strategy as in [9] is employed, ensuring decrease of the cost in each iteration. The forward-backward process above is repeated until the algorithm converges or a certain number of iterations is reached.
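The forward sweep with the policy (8) and a backtracking line search can be sketched as below. This is a hypothetical helper under simplifying assumptions: a full implementation would also compare the achieved reduction against the expected reduction from the backward sweep before accepting a step.

```python
import numpy as np

def forward_pass(f, cost, x0, xbar, ubar, ks, Ks,
                 alphas=(1.0, 0.5, 0.25, 0.1)):
    """Roll out the policy (8): u = ubar + alpha*k + K (x - xbar), with a
    backtracking line search over alpha. f(x, u) is the discrete dynamics
    and cost(xs, us) the total cost. Returns the first rollout that
    decreases the cost."""
    J0 = cost(xbar, ubar)
    for alpha in alphas:
        xs, us = [x0], []
        for k in range(len(ubar)):
            u = ubar[k] + alpha * ks[k] + Ks[k] @ (xs[k] - xbar[k])
            us.append(u)
            xs.append(f(xs[k], u))
        J = cost(xs, us)
        if J < J0:
            return xs, us, J
    return xbar, ubar, J0  # no improvement; caller should regularize
```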
III. Background: Dynamics Modeling
This section presents a hybrid system model for bounding quadrupeds. Fig. 2 shows one gait cycle of quadruped bounding with four continuous modes and a reset map between every two consecutive modes. Denote the mode sequence $\{1, \ldots, n\}$, where $n$ represents the total number of modes; $n = 4$ then denotes one gait cycle. The continuous dynamics in mode $i$, denoted by $f_i$, takes place on the domain $\mathcal{D}_i$. The reset map $P_i$ takes place on the switching surface $\mathcal{S}_i$ at the boundary of $\mathcal{D}_i$. Mathematical definitions of $\mathcal{D}_i$ and $\mathcal{S}_i$ are introduced later. Denote by $q$ the generalized coordinates of the quadruped and by $x = [q^{\top}, \dot{q}^{\top}]^{\top}$ the state vector. The hybrid model is given as

(9)  $\dot{x} = f_i(x, u), \quad x \in \mathcal{D}_i; \qquad x^+ = P_i(x^-), \quad x^- \in \mathcal{S}_i$

where '$-$' and '$+$' indicate pre- and post-transition states.
Denote by $c_i$ the contact foot in mode $i$ and by $\bar{c}_i$ the other foot. During a flight mode, $c_i$ represents the foot scheduled to touch down at the end of flight. The sets $\mathcal{D}_i$ and $\mathcal{S}_i$ are defined for one gait cycle as in Fig. 2,

(10a)
(10b)
(10c)

where $g(\cdot)$ is a function measuring the vertical distance of the corresponding foot to the ground, and $T\mathcal{Q}$ denotes the tangent bundle of the configuration space $\mathcal{Q}$. Although the hybrid model (9) and (10) is presented for one gait cycle for simplicity, it can be extended to multiple gait cycles.
III-A. Continuous Dynamics
The continuous dynamics in (9) vary depending on which legs are in stance. However, these dynamics can be formulated with a unified structure as follows:

(11)  $\begin{bmatrix} M & -J_c^{\top} \\ J_c & \mathbf{0} \end{bmatrix} \begin{bmatrix} \ddot{q} \\ \lambda_c \end{bmatrix} = \begin{bmatrix} S^{\top}\tau - C\dot{q} - g \\ -\dot{J}_c \dot{q} \end{bmatrix}$

where $M$, $C\dot{q}$, $g$, $S$, and $\tau$ denote the inertia matrix, Coriolis force, gravity force, selection matrix, and actuation torque, respectively. $J_c$ and $\lambda_c$ represent the contact Jacobian and contact force associated with the contact foot $c_i$. The matrix on the left side of (11) is known as the KKT matrix, since equation (11) can be obtained via KKT conditions [12]. When the robot is in flight, $J_c$ and $\lambda_c$ are no longer meaningful, and the KKT matrix degenerates to the inertia matrix $M$. The state-space representation $f_i$ in (9) is obtained by pre-multiplying both sides of (11) by the inverse of the KKT matrix and separating out the solution for $\ddot{q}$.
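This construction can be sketched as follows, assuming dense NumPy matrices and hypothetical argument names (`h` collects the Coriolis and gravity terms; real implementations exploit the sparsity of the KKT system rather than solving it densely):

```python
import numpy as np

def contact_dynamics(M, h, Jc, Jcdot_qdot, S, tau):
    """Solve the KKT system (11) for the joint accelerations qddot and the
    contact force lam: the first block row is the equation of motion, the
    second enforces zero contact-point acceleration."""
    nc = Jc.shape[0]
    KKT = np.block([[M, -Jc.T],
                    [Jc, np.zeros((nc, nc))]])
    rhs = np.concatenate([S.T @ tau - h, -Jcdot_qdot])
    sol = np.linalg.solve(KKT, rhs)
    return sol[:M.shape[0]], sol[M.shape[0]:]  # qddot, contact force
```

For a unit point mass resting on the ground with zero torque, the solved contact force simply balances gravity and the acceleration is zero.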
III-B. Reset Maps
While the generalized coordinates $q$ remain unchanged across impact events, the velocities $\dot{q}$ change instantaneously at each touchdown. The impact dynamics are modeled as

(12)  $\begin{bmatrix} M & -J_c^{\top} \\ J_c & \mathbf{0} \end{bmatrix} \begin{bmatrix} \dot{q}^+ \\ \Lambda_c \end{bmatrix} = \begin{bmatrix} M\dot{q}^- \\ -e\,J_c\dot{q}^- \end{bmatrix}$

where $e$ denotes the coefficient of restitution. A perfectly inelastic collision with $e = 0$ is assumed in this work, which means the foot velocity immediately becomes zero at touchdown. The vector $\Lambda_c$ denotes the impulse acting on the contact foot that is scheduled to touch down at the end of flight. Note that there is no control present in the model (12) since the actuators cannot generate an impulse. By separating $\dot{q}^+$ out, the state-space representation of the reset map at impact is $x^+ = P_i(x^-)$, where

(13)

Note that the transition from stance to flight is continuous, and, thus, $P_i(x^-) = x^-$ when $i$ denotes a stance phase.
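The impulsive KKT system has the same structure as the continuous-time one; a minimal sketch with hypothetical argument names:

```python
import numpy as np

def impact_map(M, Jc, qdot_minus, e=0.0):
    """Solve the impulsive KKT system (12): M (qdot+ - qdot-) = Jc^T Lam,
    with the restitution condition Jc qdot+ = -e * Jc qdot-. With e = 0
    (perfectly inelastic), the foot velocity is zero after touchdown."""
    nc = Jc.shape[0]
    KKT = np.block([[M, -Jc.T],
                    [Jc, np.zeros((nc, nc))]])
    rhs = np.concatenate([M @ qdot_minus, -e * Jc @ qdot_minus])
    sol = np.linalg.solve(KKT, rhs)
    return sol[:M.shape[0]], sol[M.shape[0]:]  # qdot_plus, impulse
```

For a 2 kg point mass hitting the ground at 3 m/s downward, the post-impact velocity is zero and the impulse equals the lost momentum.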
III-C. Time-Switched Hybrid System
We associate the predetermined mode sequence with a switching time vector $\tau = [\tau_1, \ldots, \tau_n]^{\top}$, where $\tau_i$ represents the terminating time of the $i$-th mode. Along any trajectory of the state-switched hybrid system, (9) and (10) are equivalent to the time-switched hybrid system:

(14)  $\dot{x} = f_i(x, u), \quad t \in [\tau_{i-1}, \tau_i); \qquad x^+(\tau_i) = P_i(x^-(\tau_i))$

with the enforced switching constraint:

(15)  $x^-(\tau_i) \in \mathcal{S}_i$

In this work, this time-switched reformulation is considered, where the switching times $\tau$ are optimized under the switching constraints.
IV. Hybrid Systems Differential Dynamic Programming
This section discusses three algorithmic advances for HSDDP and presents the ReB method. An overview of HSDDP and ReB is shown in Fig. 3. HSDDP takes a two-level optimization strategy. At the bottom level, the switching times are fixed and the AL algorithm is executed. This algorithm repeatedly calls the impact-aware DDP. Once DDP converges, the constraint violations are re-measured and added to the cost function, and another DDP call is executed. The AL algorithm terminates when all switching constraints are satisfied. The output from this loop is then utilized by the STO algorithm to update the switching times. This process repeats until the switching times are optimal. The ReB algorithm is executed whenever the AL algorithm is executed. The entire algorithm framework is presented to plan trajectories for the quadruped bounding introduced in Section III.
IV-A. Whole-Body Motion Planning Problem
To find an optimal trajectory, we formulate a TO problem

(16)  $\min_{u}\; \sum_{i=1}^{n} \left[ \int_{\tau_{i-1}}^{\tau_i} \ell_i(x, u)\, \mathrm{d}t + \Phi_i(x(\tau_i)) \right]$

where $\ell_i$ and $\Phi_i$ respectively denote the continuous-time running cost and the terminal cost for the $i$-th mode. In whole-body TO, minimization of (16) is subject to the full-order dynamics (14) and various other constraints. A common way to solve (16) is to formulate a discrete-time optimal control problem (OCP) with integration timestep $\Delta t$ as follows
(17a)  
subject to  (17b)  
(17c)  
(17d)  
(17e)  
(17f)  
(17g) 
where

(18)

approximates the integral of the running cost over the integration timestep $\Delta t$, and $N_i$ denotes the length of the time horizon up to the $i$-th mode. Equations (17c) and (17d) represent the dynamics and reset map constraints, respectively, where the discrete dynamics are obtained via forward integration of $f_i$. This work uses a forward Euler method, but all algorithmic advances hold with other integration schemes. Equation (17e) represents the switching constraint, and inequalities (17f)-(17g) represent the torque limit, non-negative normal GRF, and friction cone constraints, respectively.
IV-B. Impact-Aware Value Function Update in DDP
The impact-aware DDP extends DDP to address the impact effect and does not consider constraints (17e)-(17g). While the impact-aware DDP executes the same forward sweep as DDP, it modifies the update equations (7) for the quadratic value function model at the switching surface. Suppose that $\Delta V'$, $V'_x$, and $V'_{xx}$ are known at the post-impact state $x^+$, which can be computed from DDP. The dependency of all variables on the time step is ignored here for simplicity. Since there is no control applied at the switching surface, according to Bellman's Principle of Optimality

(19)  $V(x^-) = V'(P(x^-))$

Since $x^-$ and $x^+ = P(x^-)$ can be computed from the forward sweep, the variation of (19) around $x^-$ and $x^+$ is considered, i.e.,

(20)  $\delta V(\delta x^-) = \delta V'(\delta x^+)$

where

(21)  $\delta x^+ = P_x\, \delta x^- + \mathcal{O}(\|\delta x^-\|^2)$

in which $P_x$ is the Jacobian of $P$ relative to $x$. Approximating both sides of (20) to second order, we obtain

(22a)  $\Delta V = \Delta V'$
(22b)  $V_x = P_x^{\top} V'_x$
(22c)  $V_{xx} = P_x^{\top} V'_{xx} P_x + V'_x \cdot P_{xx}$
The equations (22) establish the model update equations at the switching surface, which, together with (7), constitute the model update equations of $V$ for hybrid systems.
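The update at the switching surface then amounts to a single Jacobian-based propagation. A sketch of the first-order terms only; the tensor term involving the second derivative of the reset map is dropped here, as in iLQR:

```python
import numpy as np

def impact_aware_update(Vx_plus, Vxx_plus, P_jac):
    """Propagate the quadratic value-function model backward through the
    reset map: with no control at the switching surface, the pre-impact
    gradient is P_x^T Vx+ and the pre-impact Hessian is P_x^T Vxx+ P_x,
    where P_jac is the Jacobian of the reset map."""
    return P_jac.T @ Vx_plus, P_jac.T @ Vxx_plus @ P_jac
```

This step slots into the backward sweep wherever the trajectory crosses a touchdown event; everywhere else the standard update (7) applies.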
IV-C. Augmented Lagrangian for Switching Constraints
The impact-aware DDP solves unconstrained optimization problems. Nevertheless, it can be combined with various constraint-handling techniques from NLP for constrained optimization. In this section, we are particularly interested in the switching equality constraint (17e). Penalty methods [13] manage this constraint by adding a squared term of the constraint violation to the cost function. However, numerical ill-conditioning can occur as the penalty increases. An AL method is employed in this work, which, in addition to the quadratic term, adds a linear Lagrange multiplier term to the cost function, avoiding the numerical ill-conditioning.
With the AL technique, the cost function now becomes

(23)  $J_{AL} = J + \sum_{k \in \mathcal{TD}} \left[ \lambda_j^{\top} d(x_k) + \frac{\beta_j}{2}\, \|d(x_k)\|^2 \right]$

where $\mathcal{TD}$ denotes the set of all touchdown points, $d(\cdot)$ the switching constraint violation, and $\beta_j$ and $\lambda_j$ denote the penalty and the Lagrange multipliers, respectively. The subscript '$j$' is the AL iteration. The AL begins with certain initial values for $\beta_0$ and $\lambda_0$, and solves the resulting TO using impact-aware DDP. The parameters $\beta_j$ and $\lambda_j$ are then updated and the new TO is re-solved using the resulting optimal control as a warm start. The update equations are

(24)  $\lambda_{j+1} = \lambda_j + \beta_j\, d(x_k), \qquad \beta_{j+1} = t\,\beta_j$

where $t > 1$ is the penalty update parameter. This process is repeated until $\|d(x_k)\|$ is within the threshold. To make a distinction, one execution of the forward sweep and backward sweep of DDP is called one DDP iteration. Pseudocode for the AL algorithm is shown in Algorithm 1.
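The outer AL loop can be sketched as below. This is a hedged sketch with hypothetical helper names: `solve_ddp(lam, beta)` stands in for one converged run of the impact-aware DDP on the AL cost (23) and returns the touchdown states, while `d` stacks the switching-constraint violations.

```python
import numpy as np

def augmented_lagrangian(solve_ddp, d, lam0, beta0,
                         t=10.0, eps=1e-4, max_iter=20):
    """Outer AL loop: solve the penalized TO, measure the constraint
    violation, then update multipliers and penalty via (24):
    lam <- lam + beta * d, beta <- t * beta, until ||d|| <= eps."""
    lam, beta = lam0, beta0
    for _ in range(max_iter):
        xs = solve_ddp(lam, beta)    # inner DDP solve, warm-started
        viol = d(xs)
        if np.linalg.norm(viol) <= eps:
            break
        lam = lam + beta * viol
        beta = t * beta
    return lam, beta
```

On a toy problem (minimize $(y-1)^2$ subject to $y = 0$, with the inner solve available in closed form), the loop drives the violation below the threshold in a handful of outer iterations.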
IV-D. Relaxed Barrier Function for Inequality Constraints
We employ a relaxed barrier (ReB) method [22] to manage the inequality constraints in (17). Given any inequality-constrained optimization problem of the form

(25)  $\min_{y}\; c(y)$
subject to  $h(y) \geq 0$

ReB attacks (25) by successively solving the unconstrained optimization

(26)  $\min_{y}\; c(y) + \mu \sum_{j} B_{\delta}(h_j(y))$

where $\mu$ is a weighting parameter and

(27)  $B_{\delta}(z) = \begin{cases} -\ln z, & z > \delta \\ \dfrac{k-1}{k}\left[ \left( \dfrac{z - k\delta}{(k-1)\delta} \right)^{k} - 1 \right] - \ln \delta, & z \leq \delta \end{cases}$

which is called a ReB function, where $\delta > 0$ is the relaxation parameter. $B_{\delta}$ smoothly extends the logarithmic barrier function over the entire real domain with a polynomial of order $k$. In many cases, $k = 2$ works well [22]. Consequently, when applied to a TO problem, the ReB method allows the objective function to be evaluated at infeasible trajectories, which cannot be done with a standard barrier method. Note that $\delta$ is updated toward zero in an outer loop. This drives the resulting trajectory towards feasibility.
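A minimal sketch of such a relaxed barrier, assuming the common order-$k$ polynomial extension of the log barrier below the relaxation threshold:

```python
import math

def relaxed_barrier(z, delta=0.1, k=2):
    """Relaxed barrier: the log barrier -ln(z) for z > delta, extended
    below delta by an order-k polynomial so the function is defined (and
    smooth at z = delta) on the whole real line. k = 2 is a common choice."""
    if z > delta:
        return -math.log(z)
    # polynomial extension, continuous and differentiable at z = delta
    return (k - 1) / k * (((z - k * delta) / ((k - 1) * delta)) ** k - 1) \
        - math.log(delta)
```

Unlike `-math.log(z)`, the extended function returns a finite (large) value for infeasible `z <= 0`, which is exactly what allows DDP rollouts through infeasible trajectories.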
With this technique, the inequality constraints (17f)-(17g) are turned into ReB functions and added to the objective function. Combining this technique with AL, the constrained TO (17) is converted into an unconstrained optimization problem which is solved using the impact-aware DDP. The AL parameters $\lambda$, $\beta$ and the ReB parameter $\delta$ are updated in an outer loop as shown in Algorithm 1.
IV-E. Switching Time Optimization Based on DDP
While Algorithm 1 finds the optimal control for the OCP (17) (equivalently (16)) for fixed switching times $\tau$, the switching time optimization (STO) algorithm developed in this section updates $\tau$ towards its optimal value under a fixed control policy (6). This approach is different from [23], where the control sequence is fixed without feedback from the state.

The STO algorithm reformulates the OCP (16) on fixed time intervals of length one, and augments the state vector with an extra state representing the time span of each mode. Denote by $\tau$ the time state, by $x$ the scaled system state, and by $\hat{x} = [x^{\top}, \tau^{\top}]^{\top}$ the augmented state. Then, Algorithm 1 can be used to find the quadratic value function model of the augmented system, including the derivatives $V_{\tau}$ and $V_{\tau\tau}$, in the backward sweep. The most recent values of $V_{\tau}$ and $V_{\tau\tau}$ are then used to update the switching times using Newton's method.
We first discuss how the OCP (16) is formulated on fixed time intervals and then introduce how STO is derived under this new formulation. Let $\Delta\tau_i = \tau_i - \tau_{i-1}$ and $s \in [i-1, i]$ in mode $i$. With the change of variable $t = \tau_{i-1} + \Delta\tau_i (s - i + 1)$, time-scaled dynamics are obtained as

(28)  $\frac{\mathrm{d}x}{\mathrm{d}s} = \Delta\tau_i\, f_i(x, u), \qquad \frac{\mathrm{d}\Delta\tau_i}{\mathrm{d}s} = 0$

with the switching constraint

(29)  $x^-(i) \in \mathcal{S}_i$

The timing state has the initial condition $\Delta\tau_i(i-1) = \tau_i - \tau_{i-1}$. The cost function in OCP (16) now becomes

(30)  $\sum_{i=1}^{n} \left[ \int_{i-1}^{i} \Delta\tau_i\, \ell_i(x, u)\, \mathrm{d}s + \Phi_i(x(i)) \right]$
We can now apply Algorithm 1 to minimize (30) under the fixed initial condition $x(0)$. Once Algorithm 1 converges, it implies that (1) the control sequence and trajectory are (locally) optimal and (2) the quadratic model of the value function is a valid approximation. The gradient $V_{\tau}$ and Hessian $V_{\tau\tau}$ are then obtained from the quadratic model. Since $\tau$ only affects the dynamics at the initial condition, $\tau$ is updated using

(31)  $\tau \leftarrow \tau - \gamma\, V_{\tau\tau}^{-1} V_{\tau}$

where $\gamma$ denotes the switching time line search parameter. Similar to DDP, we perform a backtracking line search to select $\gamma$ and ensure cost reduction with (31).

Algorithms 1 and 2 are combined to solve the OCP (17) simultaneously for the optimal control and switching times as shown in Fig. 3, constituting HSDDP. Note that the STO algorithm is executed after the AL algorithm converges, which implies that the feedforward term $k$ in equation (6) becomes zero, and, thus, the feedback control policy $u = \bar{u} + K(x - \bar{x})$ is used in the line search for the timing variables and in the next forward sweep. The major difference between our method and the approach in [23] is the inclusion of this feedback term in the control law. The control policy used in this work allows (31) to make more aggressive updates, and consequently achieves faster convergence. The reason is that any change in $\tau$ creates perturbations to the locally optimal trajectory; the effect of this change on optimality is reduced by including the feedback term in the control to account for the perturbations. More details on this aspect are discussed in Sec. V-D.
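The Newton step on the switching times with a backtracking line search can be sketched as follows. `eval_cost` is a hypothetical helper that rolls out the feedback policy for the trial switching times and returns the total cost:

```python
import numpy as np

def update_switch_times(tau, V_tau, V_tautau, eval_cost,
                        gammas=(1.0, 0.5, 0.25, 0.1)):
    """Newton update on the switching times with a backtracking line
    search: tau <- tau - gamma * V_tautau^{-1} V_tau, accepting the first
    step size that reduces the rolled-out cost."""
    step = np.linalg.solve(V_tautau, V_tau)
    J0 = eval_cost(tau)
    for gamma in gammas:
        trial = tau - gamma * step
        if eval_cost(trial) < J0:
            return trial
    return tau  # no improving step found
```

When the cost is exactly quadratic in the switching times, the full Newton step (`gamma = 1`) lands on the minimizer in one update.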
V. Results: Bounding with a 2D Quadruped
V-A. Model and Simulation
The developed HSDDP is tested on a simulated 2D model of the MIT Mini Cheetah [24] as in Fig. 2. We consider two trajectory optimization tasks. The first task fixes the switching times and applies Algorithm 1 to quadruped bounding for five gait cycles. We compare the results with those obtained when AL + ReB is disabled, and demonstrate satisfaction of constraints (17e)-(17g) within four AL iterations. The second task applies HSDDP to quadruped bounding for one gait cycle and demonstrates the efficiency of the STO.
V-B. Five-Gait Bounding with AL and ReB
In this task, Algorithm 1 is applied to 2D quadruped bounding for five gait cycles. The robot starts in the back-stance mode and is desired to run at an average forward speed of 1.0 m/s. A constant reference configuration is assigned to each mode, which mimics the robot's posture at the end of the mode and is selected heuristically. All desired joint velocities and the vertical velocity are set to zero.
Quadratic running and terminal costs are used in (16),

(32)  $\ell_i(x, u) = (x - x_d)^{\top} Q_i\, (x - x_d) + u^{\top} R_i\, u$
(33)  $\Phi_i(x) = (x - x_d)^{\top} Q_{f,i}\, (x - x_d)$

where $Q_i$ and $R_i$ are weighting matrices for state deviation and energy consumption in the running cost, respectively, and $Q_{f,i}$ is the weighting matrix for the terminal cost of the $i$-th mode. In this simulation, we place zero penalty on the forward position, and relatively larger penalties on the forward speed, body height, and joint velocities than on the other states. A fixed integration timestep is used, and the switching times are selected such that the flight mode (and the front-stance mode) runs for 72 ms and the back-stance mode runs for 80 ms. The initial guess for Algorithm 1 is given by a heuristic controller, which implements PD control in the flight mode such that a predefined joint configuration is maintained. In the stance modes, the heuristic controller constructs stance leg forces following a SLIP model and converts the resulting Ground Reaction Force (GRF) to joint torques. The AL and ReB parameters are initialized accordingly, along with a convergence threshold on the constraint violation.
V-C. AL and ReB Simulation Results
When AL is active and ReB is disabled, it takes three AL iterations for the constraint violation to decrease below the threshold. The convergence of the total cost (excluding the penalty and Lagrangian terms) and of the switching constraint violation is shown in Fig. 4. The blue square markers and the red circle markers indicate the beginning of the corresponding AL iteration. Fig. 4 demonstrates that at least one of the total cost and the constraint violation is reduced at every DDP iteration. Further, the algorithm spends more effort minimizing the total cost at the beginning and switches to the constraint violation after the total cost has converged.
Fig. 5 compares the bounding gaits generated by three methods: (1) the heuristic controller used to warm start the optimization, (2) DDP (with impact-aware value function update) that ignores the switching constraints, and (3) AL that enforces the switching constraints. It demonstrates that the developed AL algorithm achieves the best performance. Though the motion generated by DDP is smoother and more realistic than that of the heuristic controller, the robot still violates the switching constraints, and the error accumulates over time. This is because the first two methods do not enforce the switching constraints, and, thus, the robot does not correctly detect the ground yet the dynamics are still reset.
Fig. 6 depicts the normal and tangential GRF for the front leg (top figure) and the torques for every joint (bottom figure) when ReB is activated. The algorithm terminates after four AL iterations. The figure shows that the normal GRF is always non-negative and that the friction and joint torques are confined within their bounds, demonstrating the effectiveness of the ReB method. Similar results are observed for the back leg.
V-D. One-Gait Bounding with Optimized Switching Times
In this task, HSDDP is applied to the generation of one bounding gait for the Mini Cheetah. Different from the previous task, where only the control is optimized, the switching times are also optimized here. Only one gait cycle is studied since, in many situations, the switching times found for one gait cycle can be extended to the succeeding gait cycles. The cost function, the initial control sequence, the initial switching times, the AL parameters, and the terminating conditions all remain the same as in the previous task.
V-E. HSDDP Simulation Results
The optimal switching times obtained via the STO algorithm in HSDDP are shown in Fig. 7. The algorithm reduces the time of the first flight mode and the front-stance mode. Fig. 7 also compares the switching times obtained via the STO algorithm with those of the algorithm proposed in [23], where the feedback control is not used. Both algorithms are terminated after the same number of iterations. With HSDDP, the overall time spent on the entire motion is 0.2335 s, a larger reduction of the initial overall time than is observed with the algorithm in [23], showing that HSDDP is more efficient in the sense of taking larger steps.
Fig. 8 explains why the two-level optimization strategy is adopted in HSDDP. With the scaled optimization structure (28), (29), and (30), it would seem reasonable to update the control using (6) and the switching times using (31) simultaneously, since the gradient and Hessian information are all available in the backward sweep of DDP. If the actual cost reduction is negative and close to the predicted cost reduction, then the quadratic model of the value function is considered valid. The quadratic approximation, however, is more sensitive to the switching time line search parameter than to the control line search parameter, as shown in Fig. 8. The figure indicates that the switching time step size has to be made very small if (6) and (31) are executed simultaneously, at the price of much less cost reduction per iteration and, thus, a decreased convergence rate. Although this behavior is not observed for all iterations, it can significantly slow down the optimization if a small step size is used for multiple consecutive iterations.
VI. Conclusions and Future Work
The proposed HSDDP framework combines three algorithmic advances to DDP for legged locomotion. It addresses the discontinuity at impacts by incorporating an impact-aware value function update in the backward sweep. By combining AL and DDP, HSDDP reduces either the total cost or the constraint violation in every iteration, enforcing the switching constraint as the algorithm proceeds. Further, with the developed STO algorithm, HSDDP can efficiently find the optimal switching times alongside the optimal control. A ReB method is combined with HSDDP to manage the inequality constraints. The five-gait-cycle bounding example shows the promise of HSDDP in satisfying the switching constraint in just a few iterations, and demonstrates the effectiveness of ReB for enforcing inequality constraints. The one-gait-cycle bounding example compares the developed STO algorithm to a previous solution, demonstrating that our method is more efficient due to the inclusion of the feedback control in the forward sweep.
Though forward Euler integration is used in this work for dynamics simulation, the developed HSDDP is independent of the integration scheme. Implicit or higher-order methods can be used if computation time is not the primary concern. The current implementation of HSDDP is MATLAB-based, so future work will benchmark its computational performance in C++ and realize the developed algorithm in experiments for real-time control with the Mini Cheetah. We are also interested in comparing ReB and AL in terms of their abilities for inequality constraint management.
References
 [1] C. D. Bellicoso, F. Jenelten, C. Gehring, and M. Hutter, “Dynamic locomotion through online nonlinear motion optimization for quadrupedal robots,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 2261–2268, 2018.
 [2] P.-B. Wieber, “Trajectory free linear model predictive control for stable walking in the presence of strong perturbations,” in IEEE-RAS Int. Conf. on Humanoid Robots, 2006, pp. 137–142.
 [3] T. Apgar, P. Clary, K. Green, A. Fern, and J. W. Hurst, “Fast online trajectory optimization for the bipedal robot Cassie.” in Robotics: Science and Systems, 2018.
 [4] P. M. Wensing and D. E. Orin, “Highspeed humanoid running through control with a 3DSLIP model,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2013, pp. 5134–5140.
 [5] M. Hutter, H. Sommer, C. Gehring, M. Hoepflinger, M. Bloesch, and R. Siegwart, “Quadrupedal locomotion using hierarchical operational space control,” The International Journal of Robotics Research, vol. 33, no. 8, pp. 1047–1062, 2014.
 [6] J. Di Carlo, P. M. Wensing, B. Katz, G. Bledt, and S. Kim, “Dynamic locomotion in the MIT Cheetah 3 through convex modelpredictive control,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2018, pp. 1–9.

 [7] S. Kuindersma, R. Deits, M. Fallon, A. Valenzuela, H. Dai, F. Permenter, T. Koolen, P. Marion, and R. Tedrake, “Optimization-based locomotion planning, estimation, and control design for the Atlas humanoid robot,” Autonomous Robots, vol. 40, no. 3, pp. 429–455, 2016.
 [8] D. E. Orin, A. Goswami, and S.-H. Lee, “Centroidal dynamics of a humanoid robot,” Autonomous Robots, vol. 35, no. 2-3, pp. 161–176, 2013.
 [9] Y. Tassa, T. Erez, and E. Todorov, “Synthesis and stabilization of complex behaviors through online trajectory optimization,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2012, pp. 4906–4913.
 [10] D. Mayne, “A secondorder gradient method for determining optimal trajectories of nonlinear discretetime systems,” International Journal of Control, vol. 3, no. 1, pp. 85–95, 1966.
 [11] J. Morimoto, G. Zeglin, and C. G. Atkeson, “Minimax differential dynamic programming: Application to a biped walking robot,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, vol. 2, 2003, pp. 1927–1932.
 [12] R. Budhiraja, J. Carpentier, C. Mastalli, and N. Mansard, “Differential dynamic programming for multi-phase rigid contact dynamics,” in IEEE-RAS Int. Conf. on Humanoid Robots, 2018, pp. 1–9.
 [13] F. Farshidian, M. Neunert, A. W. Winkler, G. Rey, and J. Buchli, “An efficient optimal planning and control framework for quadrupedal locomotion,” in IEEE Int. Conf. on Robotics and Automation, 2017, pp. 93–100.
 [14] J. Koenemann, A. Del Prete, Y. Tassa, E. Todorov, O. Stasse, M. Bennewitz, and N. Mansard, “Wholebody modelpredictive control applied to the HRP2 humanoid,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2015, pp. 3346–3351.
 [15] M. Neunert, M. Stäuble, M. Giftthaler, C. D. Bellicoso, J. Carius, C. Gehring, M. Hutter, and J. Buchli, “Wholebody nonlinear model predictive control through contacts for quadrupeds,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1458–1465, 2018.
 [16] Y. Tassa, N. Mansard, and E. Todorov, “Controllimited differential dynamic programming,” in Int. Conf. on Robotics and Automation, 2014, pp. 1168–1175.
 [17] T. A. Howell, B. E. Jackson, and Z. Manchester, “ALTRO: a fast solver for constrained trajectory optimization,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2019, pp. 7674–7679.
 [18] G. Lantoine and R. P. Russell, “A hybrid differential dynamic programming algorithm for constrained optimal control problems. part 1: theory,” Journal of Optimization Theory and Applications, vol. 154, no. 2, pp. 382–417, 2012.
 [19] M. Diehl, H. G. Bock, H. Diedam, and P.-B. Wieber, “Fast direct multiple shooting algorithms for optimal robot control,” in Fast Motions in Biomechanics and Robotics. Springer, 2006, pp. 65–93.
 [20] A. Jain and G. Rodriguez, “Linearization of manipulator dynamics using spatial operators,” IEEE transactions on systems, man, and cybernetics, vol. 23, no. 1, pp. 239–248, 1993.
 [21] J. Carpentier and N. Mansard, “Analytical derivatives of rigid body dynamics algorithms,” in Robotics: Science and Systems, 2018.
 [22] J. Hauser and A. Saccon, “A barrier function method for the optimization of trajectory functionals with constraints,” in IEEE Conf. on Decision and Control, 2006, pp. 864–869.
 [23] X. Xu and P. J. Antsaklis, “A dynamic programming approach for optimal control of switched systems,” in IEEE Conf. on Decision and Control, vol. 2, 2000, pp. 1822–1827.
 [24] B. Katz, J. Di Carlo, and S. Kim, “Mini Cheetah: A platform for pushing the limits of dynamic quadruped control,” in Int. Conf. on Robotics and Automation, 2019, pp. 6295–6301.