Hybrid Systems Differential Dynamic Programming for Whole-Body Motion Planning of Legged Robots

06/15/2020 · He Li, et al. · University of Notre Dame

This paper presents a Differential Dynamic Programming (DDP) framework for trajectory optimization (TO) of hybrid systems with state-based switching. The proposed Hybrid-Systems DDP (HS-DDP) approach is considered for application to whole-body motion planning with legged robots. Specifically, HS-DDP incorporates three algorithmic advances: an impact-aware DDP step addressing the impact event in legged locomotion, an Augmented Lagrangian (AL) method dealing with the switching constraint, and a Switching Time Optimization (STO) algorithm that optimizes switching times by leveraging the structure of DDP. Further, a Relaxed Barrier (ReB) method is used to manage inequality constraints and is integrated into HS-DDP for locomotion planning. The performance of the developed algorithms is benchmarked on a simulation model of the MIT Mini Cheetah executing a bounding gait. We demonstrate the effectiveness of AL and ReB for switching constraints, friction constraints, and torque limits. Compared with previous solutions, the STO algorithm achieves a 2.3 times greater reduction in the total switching time, demonstrating the efficiency of our method.


I Introduction

I-A Literature Review

Many tasks in agriculture, construction, defense, and disaster response require mobile robots to traverse irregular terrain and move through narrow passages. The mobility afforded by legged robots makes them exceptionally suitable for these scenarios. Practical challenges to unlocking this mobility include the highly nonlinear and hybrid nature of the multi-contact dynamics, on-the-fly generation of motion plans, and management of various constraints.

Despite these difficulties, many successful algorithms have been developed and tested in simulation and on hardware [1, 2, 3, 4, 5, 6, 7]. Many conventional approaches optimize the Center of Mass (CoM) trajectory and foothold locations using a reduced-order model and adopt QP-based operational space control (OSC) laws [4, 3, 5] to select joint torques that track the planned trajectories. Widely used reduced-order models include the Linear Inverted Pendulum (LIP) [2] and the Spring-Loaded Inverted Pendulum (SLIP) [4, 3] for determining foothold locations, with the Zero-Moment Point (ZMP) criterion used to enforce admissible CoM trajectories [1, 2, 3]. Centroidal dynamics models have also been used that consider the linear and angular momentum of the system as a whole [8, 6, 7]. Overall, these approaches have the advantage of fast computation, but the complexity of the resulting motions is limited. For example, motions such as standing up from the ground cannot be generated with a LIP model since it neglects all kinematic constraints and assumes constant height and zero angular momentum.

Compared to this conventional approach, whole-body motion planning methods can generate more complex behaviours. Whereas QP-based OSC only considers the instantaneous effects of joint torques, whole-body motion planning strategies aim to find a sequence of torque commands by solving a finite-horizon trajectory optimization (TO) problem. Despite the appeal of this approach, the curse of dimensionality caused by the high-dimensional state space of legged robots has limited its popularity. Recent results (e.g., [9]) using Differential Dynamic Programming (DDP) [10] have shown great promise for online use. Since then, many DDP advances have been proposed, demonstrating robustness for rejecting disturbances [11] and real-time performance for whole-body motion planning [12, 13, 14, 15].

Fig. 1: Mini Cheetah bounding. This paper develops coordinated advances to the Differential Dynamic Programming (DDP) algorithm for trajectory optimization in hybrid systems. In particular, the methods focus on handling the impact event, the associated switching constraints, and the inequality constraints such as torque limits.

Unlike conventional direct methods, which optimize over all decision variables together, DDP adopts a divide-and-conquer strategy by successively solving much smaller optimization problems [10]. This feature makes DDP exceptionally suitable for problems with long time horizons because its computational effort scales linearly with the horizon length, as opposed to the quadratic or cubic growth of many nonlinear programming (NLP) approaches to TO. Since DDP is a shooting method, the algorithm can also be terminated at any time while still giving a physically valid trajectory. These features and the successes of [13, 14, 15] together suggest the promise of DDP for online MPC over other direct methods.

Despite these benefits and promise, some difficulties remain before DDP can be used for legged locomotion planning, such as dealing with the impact discontinuity and managing various constraints. The first difficulty is addressed in [9] by approximating the impact discontinuity with a smooth transition, and in [13] by ignoring the impact in DDP but compensating for this simplification with a feedback controller. These approaches either raise robustness issues or lack experimental evidence. Other previous work has attacked the second difficulty by leveraging constraint-handling techniques from NLP. Box constraints on the control are handled by solving QPs with a Projected Newton algorithm in [16]. A penalty method is used in [13] to satisfy state constraints. This method, however, suffers from numerical ill-conditioning when the penalty coefficients are large. Augmented Lagrangian (AL) methods (e.g., [17]) resolve this issue by adding a linear multiplier term. Lantoine et al. [18] proposed a DDP algorithm that handles terminal state constraints using AL, motivating their use to address the state-based switching of hybrid systems in this work.

I-B Contribution

In this paper, we propose a Hybrid Systems DDP (HS-DDP) approach that extends the applicability of DDP to hybrid systems. In particular, HS-DDP includes three algorithmic advances: an impact-aware DDP step that addresses impact discontinuities, an AL method for switching constraints, and a switching time optimization (STO) strategy. Further, to deal with the inequality constraints in legged locomotion, a relaxed barrier (ReB) method is adopted and integrated within HS-DDP. The developed algorithms are benchmarked in simulation on Mini Cheetah bounding, as shown in Fig. 1, and are extendable to other gaits such as trotting and galloping, and to other platforms such as bipeds and manipulators.

The structure of this paper is as follows. The DDP background and the hybrid dynamics formulation are introduced in Sections II and III. Section IV discusses the four contributions of this paper: impact-aware DDP, AL for state-based switching, a ReB method for DDP, and a DDP-based STO algorithm. Section V analyzes the performance of the proposed algorithm in terms of constraint handling and the efficiency of the STO as applied to quadruped bounding. Section VI provides a closing discussion.

II Background: Differential Dynamic Programming

This section gives a brief introduction to DDP following [9]; readers are referred to [10] for a detailed derivation. The goal of DDP is to find an optimal control sequence $U^{*} = \{\mathbf{u}_0^{*}, \dots, \mathbf{u}_{N-1}^{*}\}$ that minimizes a cost function of the form

$$J(\mathbf{x}_0, U) = \sum_{k=0}^{N-1} \ell(\mathbf{x}_k, \mathbf{u}_k) + \ell_f(\mathbf{x}_N), \qquad (1)$$

where $\{\mathbf{x}_0, \dots, \mathbf{x}_N\}$ denotes the state trajectory, $\ell$ denotes the running cost, and $\ell_f$ denotes the terminal cost. The trajectory is subject to the discretized dynamics

$$\mathbf{x}_{k+1} = \mathbf{f}(\mathbf{x}_k, \mathbf{u}_k), \qquad (2)$$

where $\mathbf{x}$ and $\mathbf{u}$ respectively denote the state and control variables. DDP finds $U^{*}$ by repeatedly executing a forward sweep and a backward sweep. Given a nominal control sequence, the forward sweep computes a nominal trajectory and the associated dynamics derivatives. A backward sweep is then executed to generate a control policy that is used to update the control sequence. As this process continues, the control sequence (locally) converges to $U^{*}$.

Since DDP optimizes only over the control sequence, it can be classified as a direct shooting method. Interested readers may refer to [19] for a discussion of tradeoffs with other direct methods.

Let $V_k(\mathbf{x})$ denote the value function (optimal cost-to-go) at time step $k$. Using Bellman’s principle of optimality, $V_k$ is given recursively backward in time:

$$V_k(\mathbf{x}) = \min_{\mathbf{u}} \big[\, \ell(\mathbf{x}, \mathbf{u}) + V_{k+1}\big(\mathbf{f}(\mathbf{x}, \mathbf{u})\big) \,\big]. \qquad (3)$$

Attempting to solve (3) directly is difficult since an analytical expression of $V_{k+1}$ is rarely available due to the nonlinearity of $\mathbf{f}$. To avoid this problem, DDP considers the variation of the term being minimized, $Q(\mathbf{x}, \mathbf{u}) = \ell(\mathbf{x}, \mathbf{u}) + V_{k+1}(\mathbf{f}(\mathbf{x}, \mathbf{u}))$, around a nominal state-control pair $(\bar{\mathbf{x}}, \bar{\mathbf{u}})$ under the perturbation $(\delta\mathbf{x}, \delta\mathbf{u})$. The resulting variation is approximated to second order as

$$\delta Q(\delta\mathbf{x}, \delta\mathbf{u}) \approx \frac{1}{2}\begin{bmatrix} \delta\mathbf{x} \\ \delta\mathbf{u} \end{bmatrix}^{\top}\begin{bmatrix} Q_{\mathbf{xx}} & Q_{\mathbf{xu}} \\ Q_{\mathbf{ux}} & Q_{\mathbf{uu}} \end{bmatrix}\begin{bmatrix} \delta\mathbf{x} \\ \delta\mathbf{u} \end{bmatrix} + \begin{bmatrix} Q_{\mathbf{x}} \\ Q_{\mathbf{u}} \end{bmatrix}^{\top}\begin{bmatrix} \delta\mathbf{x} \\ \delta\mathbf{u} \end{bmatrix}, \qquad (4)$$

where

$$Q_{\mathbf{x}} = \ell_{\mathbf{x}} + \mathbf{f}_{\mathbf{x}}^{\top} V'_{\mathbf{x}}, \qquad (5a)$$
$$Q_{\mathbf{u}} = \ell_{\mathbf{u}} + \mathbf{f}_{\mathbf{u}}^{\top} V'_{\mathbf{x}}, \qquad (5b)$$
$$Q_{\mathbf{xx}} = \ell_{\mathbf{xx}} + \mathbf{f}_{\mathbf{x}}^{\top} V'_{\mathbf{xx}} \mathbf{f}_{\mathbf{x}} + V'_{\mathbf{x}} \cdot \mathbf{f}_{\mathbf{xx}}, \qquad (5c)$$
$$Q_{\mathbf{uu}} = \ell_{\mathbf{uu}} + \mathbf{f}_{\mathbf{u}}^{\top} V'_{\mathbf{xx}} \mathbf{f}_{\mathbf{u}} + V'_{\mathbf{x}} \cdot \mathbf{f}_{\mathbf{uu}}, \qquad (5d)$$
$$Q_{\mathbf{ux}} = \ell_{\mathbf{ux}} + \mathbf{f}_{\mathbf{u}}^{\top} V'_{\mathbf{xx}} \mathbf{f}_{\mathbf{x}} + V'_{\mathbf{x}} \cdot \mathbf{f}_{\mathbf{ux}}, \qquad (5e)$$

in which the subscripts indicate partial derivatives and the prime indicates the next time step. Note that $\mathbf{f}_{\mathbf{xx}}$, $\mathbf{f}_{\mathbf{uu}}$, and $\mathbf{f}_{\mathbf{ux}}$ generally are tensors; the notation '$\cdot$' denotes matrix-tensor multiplication. Omitting the third terms in the last three equations gives rise to the iLQR algorithm, which enables faster iterations but loses quadratic convergence properties. We employ iLQR in this work and use the algorithm proposed in [20, 21] to efficiently compute the dynamics derivatives $\mathbf{f}_{\mathbf{x}}$ and $\mathbf{f}_{\mathbf{u}}$.
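For concreteness, the following is a minimal NumPy sketch of the Gauss-Newton (iLQR) form of (5) with the tensor terms dropped; the variable names (lx, fx, Vx_next, etc.) are illustrative assumptions and not taken from the authors' implementation.

```python
import numpy as np

def q_expansion_ilqr(lx, lu, lxx, luu, lux, fx, fu, Vx_next, Vxx_next):
    """Gauss-Newton (iLQR) approximation of the Q-function derivatives in (5).

    The second-order dynamics terms (f_xx, f_uu, f_ux) are omitted, which
    trades quadratic convergence for cheaper iterations.
    """
    Qx  = lx  + fx.T @ Vx_next
    Qu  = lu  + fu.T @ Vx_next
    Qxx = lxx + fx.T @ Vxx_next @ fx
    Quu = luu + fu.T @ Vxx_next @ fu
    Qux = lux + fu.T @ Vxx_next @ fx
    return Qx, Qu, Qxx, Quu, Qux
```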

Optimizing (4) over $\delta\mathbf{u}$ results in the optimal control increment around the nominal control,

$$\delta\mathbf{u}^{*} = \mathbf{k} + \mathbf{K}\,\delta\mathbf{x}, \qquad \mathbf{k} = -Q_{\mathbf{uu}}^{-1} Q_{\mathbf{u}}, \qquad \mathbf{K} = -Q_{\mathbf{uu}}^{-1} Q_{\mathbf{ux}}, \qquad (6)$$

where $\mathbf{k}$ is the step direction and $\mathbf{K}$ is the feedback gain. Substituting (6) into equation (4) results in update equations for the quadratic model of $V_k$ according to

$$\Delta V = -\tfrac{1}{2}\, Q_{\mathbf{u}}^{\top} Q_{\mathbf{uu}}^{-1} Q_{\mathbf{u}}, \qquad (7a)$$
$$V_{\mathbf{x}} = Q_{\mathbf{x}} - Q_{\mathbf{xu}} Q_{\mathbf{uu}}^{-1} Q_{\mathbf{u}}, \qquad (7b)$$
$$V_{\mathbf{xx}} = Q_{\mathbf{xx}} - Q_{\mathbf{xu}} Q_{\mathbf{uu}}^{-1} Q_{\mathbf{ux}}, \qquad (7c)$$

where $\Delta V$ denotes the expected cost reduction.

Equations (5) and (7) are computed recursively starting at the final state, constituting the backward sweep of DDP. The nominal control is then updated using the resulting control policy (6) as follows,

$$\mathbf{u}_k = \bar{\mathbf{u}}_k + \epsilon\,\mathbf{k}_k + \mathbf{K}_k(\mathbf{x}_k - \bar{\mathbf{x}}_k), \qquad (8)$$

where $\epsilon$ is a line search parameter, and $(\bar{\mathbf{x}}, \bar{\mathbf{u}})$ and $(\mathbf{x}, \mathbf{u})$ respectively are the nominal and new state-control pairs. A backtracking line search method is used to select $\epsilon$ [9], and a regularization strategy as in [9] is employed, ensuring a decrease of the cost in each iteration. This forward-backward process is repeated until the algorithm converges or a maximum number of iterations is reached.
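The backward-sweep gain computation (6)-(7) and the forward rollout (8) could be sketched as below. This is a hedged Python illustration rather than the authors' MATLAB code, with a fixed regularization constant standing in for the adaptive scheme of [9].

```python
import numpy as np

def backward_step(Qx, Qu, Qxx, Quu, Qux, reg=1e-6):
    """One step of the backward sweep: gains (6) and value model update (7)."""
    Quu_reg = Quu + reg * np.eye(Quu.shape[0])   # simple fixed regularization
    k = -np.linalg.solve(Quu_reg, Qu)            # feedforward step direction
    K = -np.linalg.solve(Quu_reg, Qux)           # feedback gain
    dV  = 0.5 * Qu @ k                           # expected cost reduction
    Vx  = Qx  + K.T @ Quu @ k + K.T @ Qu + Qux.T @ k
    Vxx = Qxx + K.T @ Quu @ K + K.T @ Qux + Qux.T @ K
    return k, K, dV, Vx, Vxx

def forward_pass(dynamics, cost, x0, x_bar, u_bar, k, K, eps):
    """Roll out the policy (8) with line-search parameter eps."""
    x, J, xs, us = x0, 0.0, [x0], []
    for t in range(len(u_bar)):
        u = u_bar[t] + eps * k[t] + K[t] @ (x - x_bar[t])
        J += cost(x, u)
        x = dynamics(x, u)
        xs.append(x); us.append(u)
    return xs, us, J
```

A backtracking loop around forward_pass would repeatedly shrink eps until the realized cost decrease is acceptable.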

III Background: Dynamics Modeling

This section presents a hybrid system model for bounding quadrupeds. Fig. 2 shows one gait cycle of quadruped bounding with four continuous modes and a reset map between every two consecutive modes. Denote the mode sequence $\{1, 2, \dots, n\}$, where $n$ represents the total number of modes; $n = 4$ then corresponds to one gait cycle. The continuous dynamics in mode $i$, denoted by $\mathbf{f}_i$, takes place on the domain $\mathcal{D}_i$. The reset map $\mathcal{P}_i$ takes place on the switching surface $\mathcal{S}_i$ at the boundary of $\mathcal{D}_i$. Mathematical definitions of $\mathcal{D}_i$ and $\mathcal{S}_i$ are introduced later. Denote by $\mathbf{q}$ the generalized coordinates of the quadruped and by $\mathbf{x} = [\mathbf{q}^{\top}, \dot{\mathbf{q}}^{\top}]^{\top}$ the state vector. The hybrid model is given as

$$\dot{\mathbf{x}} = \mathbf{f}_i(\mathbf{x}, \mathbf{u}), \quad \mathbf{x} \in \mathcal{D}_i, \qquad \mathbf{x}^{+} = \mathcal{P}_i(\mathbf{x}^{-}), \quad \mathbf{x}^{-} \in \mathcal{S}_i, \qquad (9)$$

where '−' and '+' indicate pre- and post-transition states.

For each mode $i$, denote one foot as the contact foot and the other as the swing foot. During a flight mode, the contact foot refers to the foot scheduled to touch down at the end of flight. The sets $\mathcal{D}_i$ and $\mathcal{S}_i$ are defined for one gait cycle as in Fig. 2,

(10a)
(10b)
(10c)

where $h(\cdot)$ is a function measuring the vertical distance of the corresponding foot to the ground, and $T\mathcal{Q}$ denotes the tangent bundle of the configuration space $\mathcal{Q}$. Although the hybrid model (9)-(10) is presented for one gait cycle for simplicity, it extends directly to multiple gait cycles.
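As a small illustration of the state-based switching surfaces, a touch-down event can be detected from the swing-foot height. The function below is a hypothetical sketch; the tolerance and the downward-velocity check are assumptions for robustness, not part of the definition in (10).

```python
def touchdown_guard(foot_height, foot_vertical_vel, tol=1e-6):
    """State-based switching test: the swing foot has reached the ground
    (height ~ 0) while moving downward."""
    return foot_height <= tol and foot_vertical_vel < 0.0
```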

Fig. 2: Illustration of a quadruped executing a bounding gait. The gait cycle is assumed to start from the back stance for simplicity of presentation. The generalized coordinates for this 2D quadruped consist of the planar body position, the body pitch, and the joint angles of the front and back legs.

III-A Continuous Dynamics

The continuous dynamics $\mathbf{f}_i$ in (9) vary depending on which legs are in stance. However, these dynamics can be formulated with a unified structure as follows:

$$\begin{bmatrix} \mathbf{H} & -\mathbf{J}^{\top} \\ \mathbf{J} & \mathbf{0} \end{bmatrix}\begin{bmatrix} \ddot{\mathbf{q}} \\ \boldsymbol{\lambda} \end{bmatrix} = \begin{bmatrix} \mathbf{S}^{\top}\boldsymbol{\tau} - \mathbf{c} - \mathbf{g} \\ -\dot{\mathbf{J}}\dot{\mathbf{q}} \end{bmatrix}, \qquad (11)$$

where $\mathbf{H}$, $\mathbf{c}$, $\mathbf{g}$, $\mathbf{S}$, and $\boldsymbol{\tau}$ denote the inertia matrix, Coriolis force, gravity force, selection matrix, and actuation torque, respectively. $\mathbf{J}$ and $\boldsymbol{\lambda}$ represent the contact Jacobian and contact force associated with the contact foot. The matrix on the left side of (11) is known as the KKT matrix, since equation (11) can be obtained via the KKT conditions [12]. When the robot is in flight, $\mathbf{J}$ and $\boldsymbol{\lambda}$ are no longer meaningful, and the KKT matrix degenerates to the inertia matrix $\mathbf{H}$. The state-space representation of (11) is obtained by pre-multiplying both sides of (11) by the inverse of the KKT matrix and separating the solution for $\ddot{\mathbf{q}}$.
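A minimal sketch of evaluating these dynamics, assuming the standard rigid-body quantities are available from a dynamics library, is shown below. Argument names and sign conventions follow the reconstruction of (11) above and are not the authors' code.

```python
import numpy as np

def contact_dynamics(H, c, g, S, tau, Jc, Jcdot_qdot):
    """Solve a KKT system of the form (11) for accelerations and contact forces.

    H          : (n, n) joint-space inertia matrix
    c, g       : (n,)   Coriolis and gravity generalized forces
    S, tau     : (m, n) selection matrix and (m,) actuation torques
    Jc         : (k, n) contact Jacobian of the stance foot (empty in flight)
    Jcdot_qdot : (k,)   contact acceleration bias, Jc_dot @ qdot
    """
    n, k = H.shape[0], Jc.shape[0]
    if k == 0:
        # Flight phase: the KKT matrix degenerates to the inertia matrix.
        return np.linalg.solve(H, S.T @ tau - c - g), np.zeros(0)
    KKT = np.block([[H, -Jc.T],
                    [Jc, np.zeros((k, k))]])
    rhs = np.concatenate([S.T @ tau - c - g, -Jcdot_qdot])
    sol = np.linalg.solve(KKT, rhs)
    return sol[:n], sol[n:]   # joint accelerations, contact forces
```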

III-B Reset Maps

While the generalized coordinates remain unchanged across impact events, velocities change instantaneously at each touch down. The impact dynamics are modeled as

$$\begin{bmatrix} \mathbf{H} & -\mathbf{J}^{\top} \\ \mathbf{J} & \mathbf{0} \end{bmatrix}\begin{bmatrix} \dot{\mathbf{q}}^{+} \\ \boldsymbol{\Lambda} \end{bmatrix} = \begin{bmatrix} \mathbf{H}\dot{\mathbf{q}}^{-} \\ -e\,\mathbf{J}\dot{\mathbf{q}}^{-} \end{bmatrix}, \qquad (12)$$

where $e$ denotes the coefficient of restitution. A perfectly inelastic collision with $e = 0$ is assumed in this work, which means the foot velocity becomes zero immediately at touch down. The vector $\boldsymbol{\Lambda}$ denotes the impulse acting on the contact foot that is scheduled to touch down at the end of flight. Note that no control appears in the model (12), since the actuators cannot generate an impulse. Solving (12) for $\dot{\mathbf{q}}^{+}$ yields the state-space representation of the reset map at impact,

$$\mathbf{x}^{+} = \mathcal{P}_i(\mathbf{x}^{-}) = \begin{bmatrix} \mathbf{q}^{-} \\ \dot{\mathbf{q}}^{+}(\mathbf{x}^{-}) \end{bmatrix}. \qquad (13)$$

Note that the transition from stance to flight is continuous; thus, $\mathcal{P}_i$ is the identity map when mode $i$ is a stance phase.
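A corresponding sketch of the impulsive KKT solve consistent with (12) might look as follows; the interface is again hypothetical.

```python
import numpy as np

def impact_map(H, Jc, qdot_minus, e=0.0):
    """Inelastic impact consistent with (12): post-impact velocities and impulse.

    With restitution e = 0 the contact-point velocity is zeroed at touch down.
    """
    n, k = H.shape[0], Jc.shape[0]
    KKT = np.block([[H, -Jc.T],
                    [Jc, np.zeros((k, k))]])
    rhs = np.concatenate([H @ qdot_minus, -e * Jc @ qdot_minus])
    sol = np.linalg.solve(KKT, rhs)
    qdot_plus, impulse = sol[:n], sol[n:]
    return qdot_plus, impulse
```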

III-C Time-Switched Hybrid System

We associate the pre-determined mode sequence with a switching time vector $T = [t_1, t_2, \dots, t_n]$, where $t_i$ represents the terminating time of the $i$th mode. Along any trajectory of the state-switched hybrid system, (9) and (10) are equivalent to the time-switched hybrid system

$$\dot{\mathbf{x}} = \mathbf{f}_i(\mathbf{x}, \mathbf{u}), \quad t \in [t_{i-1}, t_i), \qquad \mathbf{x}(t_i^{+}) = \mathcal{P}_i\big(\mathbf{x}(t_i^{-})\big), \qquad (14)$$

with the enforced switching constraint

$$\mathbf{x}(t_i^{-}) \in \mathcal{S}_i, \quad i = 1, \dots, n. \qquad (15)$$

In this work, this time-switched reformulation is considered, where the switching times are optimized together with the control under the switching constraints (15).

IV Hybrid Systems Differential Dynamic Programming

This section discusses three algorithmic advances for HS-DDP and presents the ReB method. An overview of HS-DDP and ReB is shown in Fig. 3. HS-DDP adopts a two-level optimization strategy. In the bottom level, the switching times are fixed and the AL algorithm is executed. This algorithm repeatedly calls the impact-aware DDP: once DDP converges, the constraint violations are re-measured and added to the cost function, and another DDP call is executed. The AL algorithm terminates when all switching constraints are satisfied. The output from this loop is then utilized by the STO algorithm to update the switching times. This process repeats until the switching times are optimal. The ReB algorithm is executed whenever the AL algorithm is executed. The entire framework is applied to plan trajectories for the quadruped bounding model introduced in Section III.
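To make the two-level structure of Fig. 3 concrete, a skeleton of the nested loops is sketched below; the callables solve_al_reb and update_switching_times are placeholders for the routines developed in the following subsections, not functions from the paper's implementation.

```python
def hs_ddp(solve_al_reb, update_switching_times, T_init, U_init, max_iters=10):
    """Two-level HS-DDP skeleton (Fig. 3): an inner AL/ReB loop (Algorithm 1)
    wrapped by the switching-time optimization (Algorithm 2).

    `solve_al_reb` and `update_switching_times` are user-supplied callables
    standing in for the routines of Sections IV-C/IV-D and IV-E.
    """
    T, U = T_init, U_init
    for _ in range(max_iters):
        # Bottom level: switching times fixed; AL + ReB repeatedly call the
        # impact-aware DDP until the switching constraints are satisfied.
        U, gains, value_model = solve_al_reb(T, U)
        # Top level: Newton update of the switching times using the gradient
        # and Hessian from the converged backward sweep, with a line search.
        T, improved = update_switching_times(T, U, gains, value_model)
        if not improved:            # switching times are (locally) optimal
            break
    return T, U
```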

IV-A Whole-Body Motion Planning Problem

To find an optimal trajectory, we formulate the TO problem

$$\min_{\mathbf{u}(\cdot)}\; \sum_{i=1}^{n} \left[ \int_{t_{i-1}}^{t_i} \mathcal{L}_i(\mathbf{x}, \mathbf{u})\, \mathrm{d}t + \Phi_i\big(\mathbf{x}(t_i^{-})\big) \right], \qquad (16)$$

where $\mathcal{L}_i$ and $\Phi_i$ respectively denote the continuous-time running cost and the terminal cost for the $i$th mode. In whole-body TO, minimization of (16) is subject to the full-order dynamics (14) and various other constraints. A common way to solve (16) is to formulate a discrete-time optimal control problem (OCP) with integration timestep $\Delta t$ as follows

(17a)
subject to (17b)
(17c)
(17d)
(17e)
(17f)
(17g)

where

(18)

approximates the integral of the running cost over one integration timestep $\Delta t$, and $N_i$ denotes the length of the time horizon up to the $i$th mode. Equations (17c) and (17d) represent the dynamics and reset map constraints, respectively, where the discrete-time dynamics are obtained via forward integration of $\mathbf{f}_i$. This work uses a forward Euler method, but all algorithmic advances hold for other integration schemes. Equation (17e) represents the switching constraint, and the inequalities (17f)-(17g) represent the torque limits, non-negative normal GRF, and friction cone constraints, respectively.
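A forward-Euler rollout of the discretized hybrid dynamics (17c)-(17d) could be sketched as follows, with the mode dynamics and reset maps passed in as callables; this is an illustrative structure, not the authors' implementation.

```python
def hybrid_rollout(x0, mode_controls, mode_dynamics, reset_maps, dt):
    """Forward-Euler rollout of the discretized hybrid dynamics (17c)-(17d).

    mode_controls : list over modes of control sequences
    mode_dynamics : list over modes of continuous dynamics callables f_i(x, u)
    reset_maps    : list over modes of reset maps P_i(x) applied at mode exit
    """
    x, traj = x0, [x0]
    for f_i, P_i, U_i in zip(mode_dynamics, reset_maps, mode_controls):
        for u in U_i:
            x = x + dt * f_i(x, u)      # forward Euler integration step
            traj.append(x)
        x = P_i(x)                      # apply the reset map at the mode boundary
        traj.append(x)
    return traj
```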

Fig. 3: Overview of the HS-DDP algorithmic framework.

IV-B Impact-Aware Value Function Update in DDP

The impact-aware DDP extends DDP to address the impact effect; it does not yet consider the constraints (17e) - (17g). While impact-aware DDP executes the same forward sweep as DDP, it modifies the update equations (7) for the quadratic value function model at the switching surface. Suppose that the value function $V$, its gradient $V_{\mathbf{x}}$, and its Hessian $V_{\mathbf{xx}}$ are known at the post-impact state $\mathbf{x}^{+}$, which can be computed from DDP. The dependency of all variables on the time step is omitted here for simplicity. Since no control is applied at the impact, according to Bellman’s Principle of Optimality,

$$V(\mathbf{x}^{-}) = \Phi(\mathbf{x}^{-}) + V\big(\mathcal{P}(\mathbf{x}^{-})\big), \qquad (19)$$

where $\Phi$ denotes the terminal cost of the preceding mode. Since the nominal pre- and post-impact states $\bar{\mathbf{x}}^{-}$ and $\bar{\mathbf{x}}^{+}$ can be computed from the forward sweep, the variation of (19) around them is considered, i.e.,

$$\delta V(\mathbf{x}^{-}) = \delta\Phi(\mathbf{x}^{-}) + \delta V(\mathbf{x}^{+}), \qquad (20)$$

where

$$\delta\mathbf{x}^{+} = \mathcal{P}(\bar{\mathbf{x}}^{-} + \delta\mathbf{x}^{-}) - \mathcal{P}(\bar{\mathbf{x}}^{-}) \approx \mathcal{P}_{\mathbf{x}}\,\delta\mathbf{x}^{-}, \qquad (21)$$

in which $\mathcal{P}_{\mathbf{x}}$ is the Jacobian of $\mathcal{P}$ relative to $\mathbf{x}$. Approximating both sides of (20) to second order, we obtain

$$V(\bar{\mathbf{x}}^{-}) = \Phi(\bar{\mathbf{x}}^{-}) + V(\bar{\mathbf{x}}^{+}), \qquad (22a)$$
$$V_{\mathbf{x}}(\bar{\mathbf{x}}^{-}) = \Phi_{\mathbf{x}} + \mathcal{P}_{\mathbf{x}}^{\top} V_{\mathbf{x}}(\bar{\mathbf{x}}^{+}), \qquad (22b)$$
$$V_{\mathbf{xx}}(\bar{\mathbf{x}}^{-}) = \Phi_{\mathbf{xx}} + \mathcal{P}_{\mathbf{x}}^{\top} V_{\mathbf{xx}}(\bar{\mathbf{x}}^{+})\,\mathcal{P}_{\mathbf{x}}. \qquad (22c)$$

The equations (22) establish the value function model update at the switching surface, which, together with (7), constitutes the update equations of the quadratic value function model for hybrid systems.
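A minimal sketch of this value-model propagation across the impact, assuming the reset-map Jacobian and the mode terminal-cost derivatives are available, is given below; the Gauss-Newton form (dropping the second derivative of the reset map) is consistent with the iLQR approximation employed here.

```python
def impact_aware_value_update(Phi_x, Phi_xx, Px, Vx_plus, Vxx_plus):
    """Propagate the quadratic value-function model across the impact, as in (22).

    Px        : Jacobian of the reset map P with respect to the state
    Vx_plus   : value gradient at the post-impact state
    Vxx_plus  : value Hessian at the post-impact state
    Phi_x/xx  : derivatives of the mode terminal cost at the pre-impact state
    """
    Vx_minus  = Phi_x  + Px.T @ Vx_plus           # (22b)
    Vxx_minus = Phi_xx + Px.T @ Vxx_plus @ Px     # (22c), Gauss-Newton form
    return Vx_minus, Vxx_minus
```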

IV-C Augmented Lagrangian for Switching Constraints

The impact-aware DDP solves unconstrained optimization problems. Nevertheless, it can be combined with various constraint-handling techniques from NLP for constrained optimization. In this section, we are particularly interested in the switching equality constraint (17e). Penalty methods [13] manage this constraint by adding a squared term of the constraint violation to the cost function. However, numerical ill-conditioning can occur as the penalty increases. An AL method is employed in this work, which, in addition to the quadratic term, adds a linear Lagrange multiplier term to the cost function, avoiding the numerical ill-conditioning.

With the AL technique, the cost function becomes

$$\mathcal{J}_{AL} = \mathcal{J} + \sum_{k \in \mathcal{I}} \left[ \frac{\beta_j}{2}\, h(\mathbf{x}_k)^{\top} h(\mathbf{x}_k) + \boldsymbol{\sigma}_j^{\top}\, h(\mathbf{x}_k) \right], \qquad (23)$$

where $\mathcal{I}$ denotes the set of all touch-down points, $h(\mathbf{x}_k)$ is the switching constraint violation in (17e), and $\beta$ and $\boldsymbol{\sigma}$ denote the penalty and the Lagrange multipliers, respectively. The subscript $j$ is the AL iteration. The AL method begins with initial values for $\beta$ and $\boldsymbol{\sigma}$ and solves the resulting TO using impact-aware DDP. The parameters $\beta$ and $\boldsymbol{\sigma}$ are then updated, and the new TO is re-solved using the resulting optimal control as a warm start. The update equations are

$$\boldsymbol{\sigma}_{j+1} = \boldsymbol{\sigma}_j + \beta_j\, h(\mathbf{x}_k), \qquad \beta_{j+1} = \eta\, \beta_j, \qquad (24)$$

where $\eta > 1$ is the penalty update parameter. This process is repeated until the switching constraint violation is within a prescribed threshold. To make a distinction, one execution of the forward sweep and backward sweep of DDP is called one DDP iteration. Pseudocode for the AL algorithm is shown in Algorithm 1.

1:Given
2:Mode sequence and switching times $T$.
3:Cost function, system dynamics, and switching constraints in (17).
4:Initial control sequence (e.g., zeros).
5:Initialization
6:Penalty multiplier $\beta$, Lagrange multipliers $\boldsymbol{\sigma}$, and penalty update coefficient $\eta$.
7:Initial forward simulation of DDP and computation of the switching constraint violation $h$.
8:while  $\|h\|$ exceeds the threshold do
9:     repeat
10:         DDP (with impact-aware value function update)
11:     until DDP converges
12:     Compute the switching constraint violation $h$.
13:     Update $\boldsymbol{\sigma}$, $\beta$, and the ReB relaxation $\delta$.
14:     Update the AL terms in (23) and the ReB terms in (27).
15:end while
Algorithm 1 Pseudocode combining AL and ReB
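A compact Python sketch of the AL outer loop in Algorithm 1, with the inner impact-aware DDP solve and the constraint evaluation supplied as callables, might look as follows; the parameter values are illustrative assumptions.

```python
import numpy as np

def al_outer_loop(run_ddp, switching_violation, U0, beta=1.0, eta=10.0,
                  tol=1e-4, max_outer=10):
    """Sketch of the AL outer loop of Algorithm 1 for the switching constraints.

    run_ddp(U, sigma, beta) -> U_new : impact-aware DDP solve of the TO whose
        cost is augmented with the AL terms (23) (and the ReB terms).
    switching_violation(U) -> h      : stacked switching-constraint residuals.
    """
    U = U0
    sigma = np.zeros_like(switching_violation(U))   # Lagrange multipliers
    for _ in range(max_outer):
        U = run_ddp(U, sigma, beta)                  # inner DDP solve (warm started)
        h = switching_violation(U)                   # re-measure constraint violation
        if np.linalg.norm(h) < tol:
            break
        sigma = sigma + beta * h                     # multiplier update (24)
        beta  = eta * beta                           # penalty update (24)
    return U, sigma, beta
```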

IV-D Relaxed Barrier Function for Inequality Constraints

We employ a relaxed barrier (ReB) method [22] to manage the inequality constraints in (17). Given any inequality-constrained optimization problem

$$\min_{\mathbf{w}}\; f(\mathbf{w}) \qquad \text{subject to} \qquad c(\mathbf{w}) \ge 0, \qquad (25)$$

the ReB method attacks (25) by successively solving the unconstrained optimization

$$\min_{\mathbf{w}}\; f(\mathbf{w}) + \mu\, B_{\delta}\big(c(\mathbf{w})\big), \qquad (26)$$

where $\mu$ is a weighting parameter and

$$B_{\delta}(z) = \begin{cases} -\ln z, & z > \delta, \\ \dfrac{k-1}{k}\left[\left(\dfrac{z - k\delta}{(k-1)\delta}\right)^{k} - 1\right] - \ln\delta, & z \le \delta, \end{cases} \qquad (27)$$

which is called a ReB function, where $\delta > 0$ is the relaxation parameter. $B_{\delta}$ smoothly extends the logarithmic barrier function over the entire real domain with a polynomial of order $k$; in many cases, $k = 2$ works well [22]. Consequently, when applied to a TO problem, the ReB method allows the objective function to be evaluated at infeasible trajectories, which cannot be done with a standard barrier method. Note that $\delta$ is updated toward zero in an outer loop, which drives the resulting trajectory toward feasibility.
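A direct NumPy transcription of a ReB function of this form, with the quadratic extension ($k = 2$) and the relaxation value as assumed defaults, is:

```python
import numpy as np

def relaxed_log_barrier(z, delta=0.1, k=2):
    """Relaxed logarithmic barrier of the form (27): -log(z) for z > delta,
    extended below delta by a polynomial of (even) order k so it is defined
    for all real z."""
    z = np.asarray(z, dtype=float)
    poly = (k - 1) / k * (((z - k * delta) / ((k - 1) * delta)) ** k - 1.0) \
           - np.log(delta)
    # np.maximum avoids log warnings where the polynomial branch is selected
    return np.where(z > delta, -np.log(np.maximum(z, 1e-12)), poly)
```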

With this technique, the inequality constraints (17f) - (17g) are turned into ReB functions and added to the objective function. Combining this technique with AL, the constrained TO (17) is converted into an unconstrained optimization problem that is solved using the impact-aware DDP. The AL parameters $\beta$, $\boldsymbol{\sigma}$ and the ReB relaxation parameter $\delta$ are updated in an outer loop as shown in Algorithm 1.

IV-E Switching Time Optimization Based on DDP

While Algorithm 1 finds the optimal control for the OCP (17) (equivalently (16)) for fixed switching times, the switching time optimization (STO) algorithm developed in this section updates the switching times toward their optimal values under a fixed control policy (6). This approach differs from [23], where the control sequence is fixed without feedback from the state.

The STO algorithm reformulates the OCP (16) on fixed time intervals of length one and augments the state vector with an extra state representing the time span of each mode. Denote by $\tau$ the time state, by $\mathbf{x}$ the scaled system state, and by $\tilde{\mathbf{x}} = [\mathbf{x}; \tau]$ the augmented state. Algorithm 1 can then be used to find the optimal control, the feedback gains, and the quadratic value function model in the backward sweep. The most recent values of the value function gradient and Hessian with respect to the timing state are then used to update the switching times using Newton's method.

We first discuss how the OCP (16) is formulated on fixed time intervals and then introduce how STO is derived under this new formulation. Let $\tau_i = t_i - t_{i-1}$ denote the duration of mode $i$ and let $s$ be a scaled time in which each mode occupies a unit interval. With the change of variable $t = t_{i-1} + (s - i + 1)\,\tau_i$, time-scaled dynamics are obtained as

(28)

with the switching constraint

(29)

The timing state is initialized with the mode durations $\tau_i$. The cost function in OCP (16) now becomes

(30)

We can now apply Algorithm 1 to minimize (30) under the fixed initial condition. Once Algorithm 1 converges, it implies that (1) the control sequence and trajectory are (locally) optimal and (2) the quadratic model of the value function is a valid approximation of the true value function. The gradient $V_{\tau}$ and Hessian $V_{\tau\tau}$ with respect to the timing state are then obtained from this quadratic model. Since the timing state only affects the dynamics through the initial condition, it is updated using the Newton step

$$\tau \leftarrow \tau - \gamma\, V_{\tau\tau}^{-1} V_{\tau}, \qquad (31)$$

where $\gamma$ denotes the switching time line search parameter. Similar to DDP, we perform a backtracking line search to select $\gamma$ and ensure cost reduction with (31).
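A sketch of this timing update, with the rollout cost under the feedback policy supplied as a callable, is given below; the parameter defaults are illustrative assumptions.

```python
import numpy as np

def sto_update(tau, V_tau, V_tautau, total_cost, gamma0=1.0, shrink=0.5,
               min_gamma=1e-3):
    """Newton update of the mode durations, as in (31), with backtracking.

    tau        : current vector of mode durations
    V_tau      : value-function gradient w.r.t. the timing state
    V_tautau   : value-function Hessian w.r.t. the timing state
    total_cost : callable evaluating the rollout cost for a candidate tau
                 (control applied through the feedback policy (6))
    """
    step = np.linalg.solve(V_tautau, V_tau)     # Newton direction
    J0, gamma = total_cost(tau), gamma0
    while gamma >= min_gamma:
        tau_new = tau - gamma * step            # candidate switching times (31)
        if total_cost(tau_new) < J0:            # accept on cost reduction
            return tau_new
        gamma *= shrink                         # backtrack
    return tau                                  # no acceptable step found
```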

1:Given
2:Mode sequence.
3:Cost function (30), scaled dynamics (28), and switching constraint (29).
4:Initialization
5:Initialize control sequence and time state.
6:Execution
7:Execute Algorithm 1 to obtain the optimal control, the feedback gains in (6), and the value function gradient and Hessian with respect to the timing state.
8:Line search on the switching times using (31).
Algorithm 2 STO algorithm

Algorithms 1 and 2 are combined to solve the OCP (17) simultaneously for the optimal control and switching times as shown in Fig. 3, constituting HS-DDP. Note that the STO algorithm is executed after the AL algorithm converges, which implies that the feedforward term $\mathbf{k}$ in equation (6) becomes zero; thus, the feedback control policy alone is used in the line search for the timing variables and in the next forward sweep. The major difference between our method and the approach in [23] is the inclusion of this feedback term in the control law. The control policy used in this work allows (31) to make more aggressive updates and consequently achieves faster convergence. The reason is that any change in the switching times perturbs the locally optimal trajectory; including the feedback term in the control accounts for these perturbations and reduces their effect on optimality. More details on this aspect are discussed in Sec. V-D.

V Results: Bounding with a 2D Quadruped

V-A Model and Simulation

The developed HS-DDP is tested on a simulated 2D model of the MIT Mini Cheetah [24], as shown in Fig. 2. We consider two trajectory optimization tasks. The first task fixes the switching times and applies Algorithm 1 to quadruped bounding for five gait cycles. We compare the results with those obtained when AL and ReB are disabled and demonstrate satisfaction of the constraints (17e) - (17g) within four AL iterations. The second task applies HS-DDP to quadruped bounding for one gait cycle and demonstrates the efficiency of the STO.

V-B Five-Gait-Cycle Bounding with AL and ReB

Fig. 4: Convergence of the total cost and constraint violation.

In this task, Algorithm 1 is applied to 2D quadruped bounding for five gait cycles. The robot starts in the back-stance mode and is commanded to run at an average forward speed of 1.0 m/s. A constant reference configuration is assigned to each mode, which mimics the robot's posture at the end of the mode and is selected heuristically. All desired joint velocities and the desired vertical velocity are set to zero.

Quadratic running and terminal costs are used in (16),

$$\mathcal{L}(\mathbf{x}, \mathbf{u}) = \tfrac{1}{2}(\mathbf{x} - \mathbf{x}^{d})^{\top}\mathbf{Q}\,(\mathbf{x} - \mathbf{x}^{d}) + \tfrac{1}{2}\,\mathbf{u}^{\top}\mathbf{R}\,\mathbf{u}, \qquad (32)$$
$$\Phi_i(\mathbf{x}) = \tfrac{1}{2}(\mathbf{x} - \mathbf{x}^{d}_i)^{\top}\mathbf{Q}_{f,i}\,(\mathbf{x} - \mathbf{x}^{d}_i), \qquad (33)$$

where $\mathbf{Q}$ and $\mathbf{R}$ are weighting matrices for state deviation and energy consumption in the running cost, respectively, and $\mathbf{Q}_{f,i}$ is the weighting matrix for the terminal cost of the $i$th mode. In this simulation, we place zero penalty on the forward position, and relatively larger penalties on the forward speed, body height, and joint velocities than on the other states. A fixed integration timestep is used, and the switching times are selected such that the flight mode (and the front-stance mode) runs for 72 ms and the back-stance mode runs for 80 ms. The initial guess for Algorithm 1 is given by a heuristic controller, which implements PD control in the flight mode to maintain a predefined joint configuration. In the stance mode, the heuristic controller constructs stance-leg forces following a SLIP model and converts the resulting Ground Reaction Forces (GRFs) into joint torques. The AL and ReB parameters $\beta$, $\boldsymbol{\sigma}$, $\eta$, and $\delta$ are initialized to fixed values, and a convergence threshold is set on the switching constraint violation.

V-C AL and ReB Simulation Results

When AL is active and ReB is disabled, it takes three AL iterations for the constraint violation to decrease below the threshold. The convergence of the total cost (excluding the penalty and Lagrangian terms) and of the switching constraint violation is shown in Fig. 4. The blue square markers and the red circle markers indicate the beginning of the corresponding AL iterations. Fig. 4 demonstrates that at least one of the total cost and the constraint violation is reduced at every DDP iteration. Further, the algorithm spends more effort minimizing the total cost at the beginning and shifts to reducing the constraint violation after the total cost has converged.

Fig. 5: Sequential snapshots of the generated bounding motion for Mini Cheetah. Top: Motion generated by the heuristic controller that is used to warm start AL. Middle: Motion generated by DDP (without AL), ignoring switching constraints. Bottom: Motion generated with AL enforcing switching constraints. The first two methods incorrectly regard the red lines as the ground, and, thus, the dynamics are reset on this 'virtual ground'.

Fig. 5 compares the bounding gaits generated by three methods: (1) the heuristic controller used to warm start the optimization, (2) DDP (with the impact-aware value function update) ignoring the switching constraints, and (3) AL enforcing the switching constraints. It demonstrates that the developed AL algorithm achieves the best performance. Though the motion generated by DDP is smoother and more realistic than that of the heuristic controller, the robot still violates the switching constraints, and the error accumulates over time. This is because the first two methods do not enforce the switching constraints, and, thus, the robot cannot correctly recognize the ground yet still resets its dynamics.

Fig. 6: GRF and joint torques for 2D Mini Cheetah bounding. Top: Normal and tangential GRF for the front leg. Bottom: Joint torques. With AL and ReB, the non-negative normal GRF, friction, and torque limit constraints are satisfied within four AL iterations.

Fig. 6 depicts the normal and tangential GRF for the front leg (top) and the torques of every joint (bottom) when ReB is activated. The algorithm terminates after four AL iterations. The normal GRF is always non-negative, and the friction forces and joint torques are confined within their bounds, demonstrating the effectiveness of the ReB method. Similar results are observed for the back leg.

V-D One-Gait-Cycle Bounding with Optimized Switching Times

In this task, HS-DDP is applied to the generation of one bounding gait cycle for the Mini Cheetah. Different from the previous task, where only the control is optimized, the switching times are also optimized here. Only one gait cycle is studied because, in many situations, the switching times found for one gait cycle can be reused for the succeeding gait cycles. The cost function, the initial control sequence, the initial switching times, the AL parameters, and the terminating conditions all remain the same as in the previous task.

V-E HS-DDP Simulation Results

Fig. 7: Times spent at each mode in the one-gait-cycle bounding task.

The optimal switching times obtained via the STO algorithm in HS-DDP are shown in Fig. 7. The algorithm reduces the durations of the first flight mode and the front-stance mode. Fig. 7 also compares the switching times obtained via the STO algorithm with those of the algorithm proposed in [23], where no feedback control is used. Both algorithms are terminated after the same number of iterations. With HS-DDP, the overall time spent on the entire motion is 0.2335 s, a 2.3 times greater reduction of the initial overall time than is achieved with the algorithm in [23], showing that HS-DDP is more efficient in the sense of taking larger steps.

Fig. 8 explains why the two-level optimization strategy is adopted in HS-DDP. With the scaled optimization structure (28)-(30), it would seem reasonable to update the control using (6) and the switching times using (31) simultaneously, since the required gradient and Hessian information is available in the backward sweep of DDP. If the actual cost reduction is negative and close to the predicted cost reduction, then the quadratic model of the value function is considered valid. The quadratic approximation, however, is more sensitive to the switching time line search parameter than to the control line search parameter, as shown in Fig. 8. The figure indicates that the switching time step size has to be very small if (6) and (31) are executed simultaneously, at the price of much less cost reduction per iteration and thus a decreased convergence rate. Although this behavior is not observed at every iteration, it can significantly slow down the optimization if a small step size must be used for multiple consecutive iterations.

Fig. 8: Change in cost with respect to step size. Solid lines: actual cost reduction. Dashed lines: predicted cost reduction. The colors correspond to different combinations of the control and switching time line search parameters.

VI Conclusions and Future Work

The proposed HS-DDP framework combines three algorithmic advances to DDP for legged locomotion. It addresses the discontinuity at impacts by incorporating an impact-aware value function update in the backward sweep. By combining AL and DDP, HS-DDP reduces either the total cost or the constraint violation in every iteration, enforcing the switching constraints as the algorithm proceeds. Further, with the developed STO algorithm, HS-DDP can efficiently find the optimal switching times alongside the optimal control. A ReB method is combined with HS-DDP to manage the inequality constraints. The five-gait-cycle bounding example shows the promise of HS-DDP in satisfying the switching constraints within just a few AL iterations and demonstrates the effectiveness of ReB for enforcing inequality constraints. The one-gait-cycle bounding example compares the developed STO algorithm to previous solutions, demonstrating that our method is more efficient due to the inclusion of the feedback control in the forward sweep.

Though forward Euler integration is used in this work for dynamics simulation, the developed HS-DDP is independent of the integration scheme; implicit or higher-order methods can be used if computation time is not the primary concern. The current implementation of HS-DDP is MATLAB-based, so future work will benchmark its computational performance in C++ and realize the developed algorithm in experiments for real-time control of the Mini Cheetah. We are also interested in comparing ReB and AL in terms of their abilities for inequality constraint management.

References

  • [1] C. D. Bellicoso, F. Jenelten, C. Gehring, and M. Hutter, “Dynamic locomotion through online nonlinear motion optimization for quadrupedal robots,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 2261–2268, 2018.
  • [2] P.-B. Wieber, “Trajectory free linear model predictive control for stable walking in the presence of strong perturbations,” in IEEE-RAS Int. Conf. on Humanoid Robots, 2006, pp. 137–142.
  • [3] T. Apgar, P. Clary, K. Green, A. Fern, and J. W. Hurst, “Fast online trajectory optimization for the bipedal robot Cassie.” in Robotics: Science and Systems, 2018.
  • [4] P. M. Wensing and D. E. Orin, “High-speed humanoid running through control with a 3D-SLIP model,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2013, pp. 5134–5140.
  • [5] M. Hutter, H. Sommer, C. Gehring, M. Hoepflinger, M. Bloesch, and R. Siegwart, “Quadrupedal locomotion using hierarchical operational space control,” The International Journal of Robotics Research, vol. 33, no. 8, pp. 1047–1062, 2014.
  • [6] J. Di Carlo, P. M. Wensing, B. Katz, G. Bledt, and S. Kim, “Dynamic locomotion in the MIT Cheetah 3 through convex model-predictive control,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2018, pp. 1–9.
  • [7] S. Kuindersma, R. Deits, M. Fallon, A. Valenzuela, H. Dai, F. Permenter, T. Koolen, P. Marion, and R. Tedrake, “Optimization-based locomotion planning, estimation, and control design for the Atlas humanoid robot,” Auton. Robots, vol. 40, no. 3, pp. 429–455, 2016.
  • [8] D. E. Orin, A. Goswami, and S.-H. Lee, “Centroidal dynamics of a humanoid robot,” Auton. Robots, vol. 35, no. 2-3, pp. 161–176, 2013.
  • [9] Y. Tassa, T. Erez, and E. Todorov, “Synthesis and stabilization of complex behaviors through online trajectory optimization,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2012, pp. 4906–4913.
  • [10] D. Mayne, “A second-order gradient method for determining optimal trajectories of non-linear discrete-time systems,” International Journal of Control, vol. 3, no. 1, pp. 85–95, 1966.
  • [11] J. Morimoto, G. Zeglin, and C. G. Atkeson, “Minimax differential dynamic programming: Application to a biped walking robot,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, vol. 2, 2003, pp. 1927–1932.
  • [12] R. Budhiraja, J. Carpentier, C. Mastalli, and N. Mansard, “Differential dynamic programming for multi-phase rigid contact dynamics,” in IEEE-RAS Int. Conf. on Humanoid Robots, 2018, pp. 1–9.
  • [13] F. Farshidian, M. Neunert, A. W. Winkler, G. Rey, and J. Buchli, “An efficient optimal planning and control framework for quadrupedal locomotion,” in IEEE Int. Conf. on Robotics and Automation, 2017, pp. 93–100.
  • [14] J. Koenemann, A. Del Prete, Y. Tassa, E. Todorov, O. Stasse, M. Bennewitz, and N. Mansard, “Whole-body model-predictive control applied to the HRP-2 humanoid,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2015, pp. 3346–3351.
  • [15] M. Neunert, M. Stäuble, M. Giftthaler, C. D. Bellicoso, J. Carius, C. Gehring, M. Hutter, and J. Buchli, “Whole-body nonlinear model predictive control through contacts for quadrupeds,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1458–1465, 2018.
  • [16] Y. Tassa, N. Mansard, and E. Todorov, “Control-limited differential dynamic programming,” in Int. Conf. on Robotics and Automation, 2014, pp. 1168–1175.
  • [17] T. A. Howell, B. E. Jackson, and Z. Manchester, “ALTRO: a fast solver for constrained trajectory optimization,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2019, pp. 7674–7679.
  • [18] G. Lantoine and R. P. Russell, “A hybrid differential dynamic programming algorithm for constrained optimal control problems. part 1: theory,” Journal of Optimization Theory and Applications, vol. 154, no. 2, pp. 382–417, 2012.
  • [19] M. Diehl, H. G. Bock, H. Diedam, and P.-B. Wieber, “Fast direct multiple shooting algorithms for optimal robot control,” in Fast motions in biomechanics and robotics.   Springer, 2006, pp. 65–93.
  • [20] A. Jain and G. Rodriguez, “Linearization of manipulator dynamics using spatial operators,” IEEE transactions on systems, man, and cybernetics, vol. 23, no. 1, pp. 239–248, 1993.
  • [21] J. Carpentier and N. Mansard, “Analytical derivatives of rigid body dynamics algorithms,” in Robotics: Science and Systems, 2018.
  • [22] J. Hauser and A. Saccon, “A barrier function method for the optimization of trajectory functionals with constraints,” in IEEE Conf. on Decision and Control, 2006, pp. 864–869.
  • [23] X. Xu and P. J. Antsaklis, “A dynamic programming approach for optimal control of switched systems,” in IEEE Conf. on Decision and Control, vol. 2, 2000, pp. 1822–1827.
  • [24] B. Katz, J. Di Carlo, and S. Kim, “Mini Cheetah: A platform for pushing the limits of dynamic quadruped control,” in Int. Conf. on Robotics and Automation, 2019, pp. 6295–6301.