Beyond Basins of Attraction: Evaluating Robustness of Natural Dynamics

06/21/2018
by Steve Heim, et al.
Max Planck Society

It is commonly accepted that properly designing a system to exhibit favorable natural dynamics can greatly simplify designing or learning the control policy. It is, however, still unclear what constitutes favorable natural dynamics, and how to quantify their effect. Most studies of simple walking and running models have focused on the basins of attraction of passive limit-cycles and the notion of self-stability. We emphasize instead the importance of stepping beyond basins of attraction. We present an approach based on viability theory to quantify robustness, valid for the family of all robust control policies. This allows us to evaluate the robustness inherent to the natural dynamics before designing the control policy or specifying a control objective. We illustrate this approach on simple spring-mass models of running and show previously unexplored advantages to using a nonlinear leg stiffness. We believe designing robots with robust natural dynamics is particularly important for enabling learning control policies directly in hardware.


I Introduction

Animals are not only agile and efficient, but also remarkably adaptable and robust [daley2006running, daley2010two], with arguably simple control and morphology [ijspeert2001connectionist, proctor2010reflexes, owaki2013simple]. Reproducing this performance in legged robots has been difficult. Most robots use sophisticated algorithms [westervelt2007feedback, koolen2016design, ponton2016convex, zhao2017robust], which rely on accurate models and state estimation at a substantial computational cost. This reliance tends to make model-based approaches brittle.

Recently, there have been attempts to combine these approaches with machine learning to improve robustness and adaptability [heijmink2017learning, rai2017bayesian, grandia2018contact]; however, it is notoriously difficult to apply learning directly in hardware. We are motivated by the question ‘how should a legged robot be designed, such that it is easier to apply model-free learning directly in hardware?’. A key part of the answer is the inherent robustness of the natural dynamics of the system.

Indeed, designing a system with favorable natural dynamics can simplify the control problem [sprowitz2013towards, rezazadeh2015spring, heim2016designing, ramos2018facilitating, haldane2016robotic] and enable quick learning directly in hardware [tedrake2005learning, heim2018shaping]. It is, however, still unclear how to quantify and evaluate the effects of design choices on the control problem, especially in terms of robustness and ease of designing or learning the control policy. After a robot is deployed successfully, it is difficult to distinguish what is due to the mechanical design, controller design, implementation, or other factors. Designers must instead rely on experience and intuition.

Many studies of natural dynamics focus on the concept of self-stability and the basins of attraction of passively stable limit-cycles [ringrose1997self, schwab2001basin, geyer2002natural, rummel2008stable] or open-loop stable limit-cycles [wu20133]. In this study, we advocate the importance of stepping away from thinking in terms of limit-cycles and their basins of attraction. We present a formulation grounded in viability theory which allows us to quantify the inherent robustness of the natural dynamics, prior to specifying the control policy parametrization or control objective.

I-A Natural Dynamics and Spring-Mass Models

Perhaps the clearest example of natural dynamics is Tad McGeer’s passive dynamic walker [mcgeer1990passive]: this purely mechanical system with no sensors or actuators (and hence no control) exhibits passively stable limit-cycles for downhill walking. This idea has been extended in several robots, adding a little actuation and control to allow walking on level ground [wisse2006design, bhounsule2012design] and to increase the basin of attraction of the passively stable limit-cycle. A key concept is to exploit the natural dynamics. The intuition behind this concept is that the control can be ‘lazy’: if a perturbation pushes the system out of the basin of attraction, the control should guide it back in. Once the state is inside the basin of attraction, the control can allow the system to naturally evolve to the attracting limit cycle.

Simulation studies of idealized walking models such as the rimless wheel [asano2012stability] and compass walker [kuo2002energetics] have provided more understanding of McGeer’s empirical results. These models also have passively stable limit cycles albeit with rather small basins of attraction.

For running, we turn to a different idealized model, the spring-mass model. This simple model was initially developed by the biomechanics community to study running [blickhan1989spring], where the spring abstracts the natural compliance of the muscle-tendon system in the leg. While the effective leg stiffness depends on many factors including muscle activation, it is modeled as a constant parameter, and thus the model has no control inputs. Thus, at the level of abstraction of the model, the natural dynamics seem passive even though the system may have active control embedded in it.

This simple model, also called a template, accurately predicts the overall behavior of many seemingly very different systems, called anchors [full1999templates]. Indeed, by proper parameter tuning, the spring-mass model can be used to accurately model diverse running systems, from humans [maus2015constructing] to cockroaches [jindrich2002dynamic], bipedal [rezazadeh2015spring] to hexapedal [altendorfer2001rhex] robots.

I-B Templates, Anchors, and Hierarchical Control

Spring-mass model templates are often used for understanding hierarchical control [full1999templates] since the template and anchor division offers a natural split in hierarchy. A high-level control policy can be designed based on a template in a low-dimensional space, while a low-level control policy based on the anchor is designed in the high-dimensional space. Thus, as long as the low-level controller enforces a template-like behavior on the system, the high-level controller design can be greatly simplified [holmes2006dynamics, stephens09biped, hawkes2018design].

In this hierarchical context, the term natural dynamics is always relative to the level of abstraction being considered. Indeed, to a high-level control policy, there is no distinction between which part of the system behavior is truly ‘passive’ and which has been influenced by the low-level controller (this split is equivalent to the split between agent and environment in reinforcement learning).

The template and anchor approach to hierarchical control has been used to develop various discrete-time high-level controllers: for example, the spring stiffness or landing angle of attack might be chosen once per step, but the continuous-time dynamics in between are left ‘passive’ [ghigliazza2005simply, arslan2012reactive, piovan2013two, cnops2015basin]. One result with this approach is that choosing an open-loop trajectory of landing angles of attack during flight can achieve deadbeat control without active control during stance [wu20133, palmer2014periodic].

While these results are impressive, they generally suffer from the curse of dimensionality: they are only tractable on the low-dimensional template models. Therefore, the high-level control relies on the overall system behaving as a simpler, lower-dimensional system. This is usually achieved through a combination of appropriate mechanical design, and a low-level controller that exposes a simpler dynamical behavior to the high-level controller.

There are two common approaches to low-level controller design. On the one hand, a low-level control policy can enforce the dynamical behavior of a specific template model [hutter2010slip, sentis2007synthesis, wensing2013high, poulakakis2009spring]. While this approach offers more rigorous guarantees on the behavior of the high-level system, it is also generally more difficult to implement in practice.

On the other hand, the low-level control policy can be designed to produce a lower-dimensional behavior without enforcing the specific template dynamics [pratt2001virtual, altendorfer2004stability, renjewski2015exciting, martin2017experimental]. This approach requires further tuning of the high-level control policy, since it explicitly allows for a mismatch between the high-level model and the actual system behavior.

Robustness is a key indicator of how accurate a model needs to be, regardless of the approach taken: a policy that is robust will suffer less from model inaccuracies. Our main contribution is a means to quantify the robustness of the natural dynamics, prior to designing the high-level control policy, or even specifying its objective. We first illustrate the quantification on template models in a rigorous manner. We then show an example application using gradient-free optimization to find robust parameters of a low-level controller, without enforcing a specific template model. We are thus able to quantify robustness without relying on low-dimensional template models.

I-C Computation of Viability

Our quantification relies on the concept of viability: a state is said to be viable if there exists a set of control actions that keeps it inside the viability kernel for all time [aubin2009viability]. In other words, a state that starts outside the viability kernel will fail within a finite time, regardless of the control actions applied.

There has been much interest recently in computing viable sets and their dual, back-reachable sets [maidens2013lagrangian], for safe control verification and design [liniger2017real, panagou2009viability, wieber2008viability, zaytsev2018boundaries], and more recently for safe learning of control [lakatos2017eigenmodes, smitsafe]. Our contribution complements prior work by using a viability formulation to quantify the robustness of the system design prior to control policy design.

Viability-based approaches share a common challenge: computing viability kernels relies on gridding the search space, making the general case intractable [liniger2017real, bansal2017hamilton].

For particular classes of systems, more efficient algorithms have been developed to find either inner or outer approximations of viable sets, which can generally be scaled to 6-10 dimensions [bansal2017hamilton]. Thus, it is often beneficial to use approximations that fit these classes and dimension restrictions. Computation of viable sets is then performed on the low-dimensional approximation, which can be tracked using a hierarchical control strategy [koolen2012capturability, fridovich2018planning].

This matches well with the existing template and anchor paradigm commonly used in legged robotics. We will show an example application, in which we optimize the parameters of the low-level control policy to exhibit robust natural dynamics to a high-level control policy.

I-D Notes on Terminology

We use terminology common to the reinforcement learning community, such as actions instead of control inputs and control policies instead of controllers. We will speak of control policies sampling an action, or the system sampling a state-action pair, to indicate the policy can be stochastic.

Much of the mathematics in the paper revolves around sets in different spaces. Capital letters such as S denote spaces (in this case the state space). Capital letters with a subscript such as S_F denote a set in the corresponding space, the meaning of the subscript being explained in the text (in this case the set of failure states).

I-E Structure

In Section II we cover the details of the two spring-mass models we examine, their dynamics, and a typical bifurcation diagram for the SLIP model.

In Section III we compute the viability kernel as well as the transition map in state-action space. We illustrate how this encompasses the bifurcation diagram, and why bifurcation diagrams are limiting once we introduce control.

In Section IV we introduce our definitions of robustness, and how to use this to evaluate two different designs of leg compliance prior to designing a control policy.

In Section V we show an example application, in which the quantification developed is used as the fitness function to perform gradient-free optimization of a simulated planar monopedal robot.

In Section VI we summarize the key contributions of the paper, open questions, and our outlook.

II Spring-Mass Models

We use two well-studied spring-mass models to illustrate our concepts: the spring-loaded inverted pendulum (SLIP) model and a nonlinear spring-mass (NSLIP) model as first studied by Rummel and Seyfarth [rummel2008stable] (see Fig. 1). Both models have hybrid dynamics, with the governing equations of motion switching between flight and stance phases.

During flight phase, the body follows a ballistic trajectory, whereas during the stance phase it follows a spring-mass motion, which depends on the modeled spring. The details of the equations of motion have been derived in [blickhan1989spring, rummel2008stable], and can be found in the appendix. For convenient comparison, we use the same parameters as in [rummel2008stable], which are similar to human averages. In this work, we consider only deterministic dynamics.

Figure 1: We focus on two spring-mass models: a) the spring-loaded inverted pendulum (SLIP) model with a linear prismatic spring, and b) a segmented leg model, with a linear torsional spring, which we will refer to as a nonlinear spring-mass (NSLIP) model. c) shows a qualitative trajectory over one cycle, starting and terminating with a flight apex event.

II-A Discrete Analysis via Poincaré Sections

The continuous motion of the point-mass body is fully described in planar Cartesian coordinates by the state vector [x, z, ẋ, ż]. We simplify analysis by only evaluating the state on a Poincaré section at flight apex, a common approach for cyclic motion. At flight apex ż = 0, and potential and kinetic energy are conveniently contained in the vertical position and forward velocity respectively. Thus, the continuous state vector can be reduced to [z, ẋ]. Taking advantage of the constant energy constraint, we can further reduce the system to a single state, the normalized apex height z̃, which defines our state space:

S = { z̃ | z̃ = E_p / (E_p + E_k) }

where E_p = m g z and E_k = ½ m ẋ² are potential and kinetic energy, respectively, and g is the gravitational constant.
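As a concrete sketch of this reduction (a minimal Python helper pair, assuming the parameters listed in the appendix and our reading of the normalization above):

import numpy as np

M, G = 80.0, 9.81   # body mass [kg] and gravitational constant [m/s^2] (appendix)
E = 1860.0          # total system energy [J], the level used in most figures

def normalize(z, xd):
    # fraction of the total energy stored as potential energy at apex
    return M * G * z / (M * G * z + 0.5 * M * xd ** 2)

def denormalize(s):
    # recover apex height and forward speed from the normalized height,
    # using the constant-energy constraint E = m*g*z + 0.5*m*xd^2
    return s * E / (M * G), np.sqrt(2.0 * E * (1.0 - s) / M)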

Starting from any state at apex s_k ∈ S, we can numerically integrate the continuous-time dynamics until the system either transitions to a second apex height or to a failure state. We thus obtain the Poincaré map, also called a transition map, for our discrete dynamics:

s_{k+1} = P(s_k, α)

where the landing angle of attack α is a model parameter of interest. We use this as our control action in Section III.

We will consider as failures all states in which the body hits the ground with z ≤ 0, as well as when the system reverses direction with ẋ ≤ 0. More formally, the set of failure states is

S_F = { s | z ≤ 0 or ẋ ≤ 0 }.
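The transition map and failure conditions can be evaluated by numerically integrating the hybrid dynamics given in the appendix. Below is a minimal sketch for the SLIP model using scipy, reusing the constants above; for simplicity, direction reversal is only checked at liftoff, and all names are our own:

from scipy.integrate import solve_ivp

K, L0 = 8200.0, 1.0    # prismatic spring coefficient [N/m] and rest length [m]

def flight(t, y):
    # y = [x, z, xd, zd]: ballistic flight
    return [y[2], y[3], 0.0, -G]

def stance(t, y):
    # stance dynamics in a frame whose origin was reset to the foot at touchdown
    x, z, xd, zd = y
    l = np.hypot(x, z)
    f = K * (L0 - l)  # axial force of the linear leg spring
    return [xd, zd, f * x / (l * M), f * z / (l * M) - G]

def apex_map(z_apex, xd_apex, alpha):
    """One iteration of the Poincare map; returns the next normalized apex
    height, or None on failure."""
    if z_apex < L0 * np.sin(alpha):
        return None  # infeasible: the foot would start underground
    # first flight phase, until the body height drops to l0*sin(alpha)
    td = lambda t, y: y[1] - L0 * np.sin(alpha)
    td.terminal, td.direction = True, -1
    sol = solve_ivp(flight, [0.0, 10.0], [0.0, z_apex, xd_apex, 0.0], events=td)
    x, z, xd, zd = sol.y[:, -1]
    # stance, until the leg returns to its rest length; failure if the body
    # hits the ground (z <= 0) or has reversed direction (xd <= 0) by liftoff
    lo = lambda t, y: np.hypot(y[0], y[1]) - L0
    lo.terminal, lo.direction = True, 1
    fall = lambda t, y: y[1]
    fall.terminal, fall.direction = True, -1
    sol = solve_ivp(stance, [0.0, 10.0], [-L0 * np.cos(alpha), z, xd, zd],
                    events=[lo, fall])
    if sol.t_events[1].size > 0 or sol.y[2, -1] <= 0.0:
        return None
    x, z, xd, zd = sol.y[:, -1]
    if zd <= 0.0:  # apex was already passed during stance
        return normalize(z, xd)
    # second flight phase, until the vertical velocity crosses zero
    ap = lambda t, y: y[3]
    ap.terminal, ap.direction = True, -1
    sol = solve_ivp(flight, [0.0, 10.0], [x, z, xd, zd], events=ap)
    return normalize(sol.y[1, -1], sol.y[2, -1])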

II-B Bifurcation Diagram of the SLIP Model

A bifurcation diagram allows the study of the existence and stability of fixed points and limit-cycles as a function of model parameters.

The bifurcation diagram of the SLIP model with respect to the angle of attack is shown in Fig. 2. Similar bifurcation diagrams for spring-mass models can be found in [merker2015stable], and bifurcation diagrams for spring stiffness can be found in [ghigliazza2005simply, rummel2008stable].

Figure 2: The bifurcation diagram of the passive SLIP model highlights the small range of parameters for which stable limit-cycles exist. The basins of attraction are bounded by infeasibility and unstable limit-cycles. Beyond these basins of attraction, however, is a lot of structure that can be exploited through control.

We only evaluate period-1 limit-cycles, that is when s_{k+1} = s_k, and do not consider orbits which require multiple iterations to return to the same state. Stable limit-cycles are marked with a solid red line and unstable limit-cycles with a dashed red line. The basins of attraction of the stable limit-cycles are highlighted by the shaded area.

These basins of attraction are bounded from below by an infeasibility constraint: below this line, the foot would begin underground. The unstable limit-cycles bound the basins of attraction from above: being perturbed onto an unstable limit-cycle will keep the system at that new state; beyond this threshold, it will diverge until the system fails.

Since either infeasibility or unstable limit-cycles bound the basins of attraction, many previous studies have been limited to identifying these bounds. The relevant range of parameters and states for studying basins of attraction tends to be narrow, as illustrated in Fig. 2. We will show in the next section that there is a lot of structure outside the basins of attraction of these passively stable limit-cycles. Once we allow parameters such as the angle of attack to be actively chosen as a control decision, the relevant bounds are no longer the bounds of the basins of attraction, but those of failure and viability.

III Natural Dynamics and Viable Control

We begin the section by introducing control, then evaluate the effect the natural dynamics have on the set of possible control policies. A key concept is the link between the viability kernel, a set within the state space, and the set of viable state-action pairs.

III-A Control Policies and State-Action Space

We will now allow the system to choose the landing angle of attack freely at each flight apex. This defines our action space A, where an action a ∈ A is a choice of landing angle of attack α. In our figures we only show the relevant range, excluding the range which contains only failures or infeasible state-action pairs.

A control policy is any function that maps a state to an action, π: S → A. As such, a policy lives in the combined state and action space, which we term the Q-space, Q = S × A (the term is chosen in reference to Q-learning in reinforcement learning).

III-B Transition Map

We compute high-resolution 800 by 800 grids of state-action pairs, as is commonly done for these types of problems [wu20133, piovan2015reachability, cnops2015basin, zhao2017robust, zaytsev2018boundaries]. We thus obtain a lookup table of the transition map P, visualized in the state-action space in Fig. 3 for the SLIP model and in Fig. 4 for the NSLIP model.
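A sketch of how such a lookup table can be tabulated with the apex-to-apex map above (the grid bounds are placeholders, not the values used in the paper; at 800 by 800 this is expensive and would be parallelized in practice):

S_grid = np.linspace(0.4, 1.0, 800)                # normalized apex heights
A_grid = np.deg2rad(np.linspace(20.0, 90.0, 800))  # landing angles of attack
P_table = np.full((len(S_grid), len(A_grid)), np.nan)  # NaN marks a failing pair
for i, s in enumerate(S_grid):
    for j, a in enumerate(A_grid):
        nxt = apex_map(*denormalize(s), a)
        if nxt is not None:
            P_table[i, j] = nxt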

Figure 3: The lookup table of the SLIP model’s transition map shows possible combinations of state (height at apex) and action (landing angle of attack), and their transition to either a second apex or a failure. State-action pairs in the gray region result in failure. State-actions in the warm and cool colored regions result in hopping higher and lower respectively, with the color indicating the change in state (vertical axis) at the next apex. Also marked are passively stable (solid red) and unstable (dashed red) limit-cycles, where the state does not change.

To highlight the limit-cycles, we use a color-map centered around zero change in state, s_{k+1} = s_k. The warm and cool colored regions correspond to state-action pairs that result in a higher or lower state, respectively. The gray regions are state-action pairs which result in a failure state s_{k+1} ∈ S_F. The black region is composed of infeasible points in which the foot would start underground, and as such is not part of the Q-space.

We call the gray region the set of failing state-action pairs Q_F. Its complement, the colored region, is the non-failing set of state-action pairs Q_N. More formally,

Q_F = { (s, a) ∈ Q | P(s, a) ∈ S_F },   Q_N = Q \ Q_F    (1)

We denote the projection of Q_N onto the state space as the set S_N. Throughout the paper, we always use orthogonal projections, that is,

S_N = Π_S(Q_N) = { s ∈ S | ∃ a ∈ A : (s, a) ∈ Q_N }    (2)

S_N is the set of controllable states, from which actions that avoid immediate failure can be selected.

The upper bound between Q_F and Q_N consists of state-action pairs that convert all kinetic energy into potential energy in one step, resulting in a state of z̃ = 1. In other words, these are the equivalent of 1-step capture points [koolen2012capturability]. The lower bound is a boundary to falling, meaning that the point-mass hits the ground without reaching a second flight apex.

III-C Viable Sets

A viability kernel is the set of all states for which there is at least one time-evolution of the system which remains in the set for all time [aubin2009viability]. Since all state-action pairs in Q_N result in at least a second step, all states in S_N have at least one failure-preventing action available. However, it is possible for a non-failing state-action pair to reach a state from which all solutions eventually reach a failed state, as was examined in [heim2018learning]. In other words, there can be states from which immediate failure can be avoided, but from which the system will fail within some finite time. Thus, the viability kernel, which we will call S_V, is a subset of S_N, and the set of viable state-action pairs Q_V is a subset of Q_N.

We can compute the discretized set of viable state-action pairs Q_V and its projection S_V iteratively, as in Algorithm 1. In this process, we begin with the estimates Q_V = Q_N and S_V = S_N. Then we alternate trimming both estimates: first, we check if any state-action pair in the estimated Q_V maps to a state outside of S_V and exclude these from Q_V. Then we update the estimate of S_V as the projection of the new estimate of Q_V, and repeat. If the projection does not change, each state in S_V has an action available that maps back into S_V itself, and the algorithm terminates.

procedure ViableSets(Q_N, S_N)
    Q_V ← Q_N
    S_V ← S_N
    while S_V changes do
        for all (s, a) ∈ Q_V do
            if P(s, a) ∉ S_V then
                remove (s, a) from Q_V
        S_V ← Π_S(Q_V)
    return Q_V, S_V
Algorithm 1 Compute Viable Sets
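On the gridded transition map, Algorithm 1 reduces to a few lines. The following sketch snaps each successor state to its nearest grid point, a discretization choice of ours, not necessarily the authors':

def viable_sets(P_table, S_grid):
    # boolean grid: True while the state-action pair is still considered viable
    Q_V = ~np.isnan(P_table)
    S_V = Q_V.any(axis=1)
    changed = True
    while changed:
        changed = False
        for i, j in zip(*np.nonzero(Q_V)):
            k = np.abs(S_grid - P_table[i, j]).argmin()  # snap successor to grid
            if not S_V[k]:
                Q_V[i, j] = False
                changed = True
        S_V = Q_V.any(axis=1)  # re-project onto the state space
    return Q_V, S_V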

For the models we examine, S_V is equal or almost equal to S_N, except in unusual corner cases.

We can now compare the resulting Q_V and S_V for the SLIP and the NSLIP models (Fig. 4).

Figure 4: Although the viability kernel remains the same for both models, the size of Q_V for the NSLIP is 36% larger. This allows for more flexibility and robustness in designing a control policy for the NSLIP model. For reference, the bounds of the SLIP model's Q_V are overlaid as gray lines, horizontal and vertical for the cold and warm colored regions respectively.

Although the set of viable states is the same in both models, the set of viable state-action pairs is much larger for the NSLIP model. This suggests unexplored benefits of nonlinear leg compliance.

III-D Family of Viable Control Policies

A control policy must sample from Q_N with non-zero probability; otherwise, it will always fail in a single step. All meaningful policies must sample from Q_V with non-zero probability, or the system will always fail in finite time. In order to avoid failure from every viable state for all time, a policy must sample exclusively from Q_V, which we call the viable policy design space. We call the set of all such policies the family of viable control policies. More formally, if the set Q_V is non-empty, we also have a non-empty set of viable policies Π_V, where

Π_V = { π | (s, π(s)) ∈ Q_V  ∀ s ∈ S_V }.

The shape of Q_V in the dimensions of S and A poses different constraints on the control policies that we can design. The projection of Q_V onto the dimensions of state space is the viability kernel S_V itself. The volume of Q_V in the dimensions of action space, on the other hand, allows more flexibility in designing a viable control policy, since more viable actions are available to choose from.

Imagine for example a set Q_V defined by a single line (a hypersurface, for arbitrary dimensional state-action spaces) covering all of S_V: a surjective function from S_V to A. While the viability kernel is maximal, there is exactly one deterministic control policy which remains viable. This can make the control policy not only difficult to design or learn, but also very sensitive to uncertainty, as we will discuss in the next section.

IV Robust Natural Dynamics

We define robustness as the ability of a system to avoid failure in the face of uncertainty. A key objective of this work is to evaluate the robustness inherent to the natural dynamics: we care about the robustness resulting from the system design, before specifying the policy parameterization or even the control objective (such as converging to a specific limit cycle).

To this end, we focus on uncertainty in action-space, in other words, the effect of noise on the control policy output. We will use this as a basis to also examine robustness to perturbations in state-space for the family of all robust controllers. We briefly discuss the link of action noise to state-estimation noise. We do not consider model uncertainty, and leave this to future work.

IV-A Computing Robust Sets

Noise in the action space causes the system to sample a state-action pair with a different action than the one chosen by the policy:

a_k = π(s_k)    (3)
s_{k+1} = P(s_k, a_k + ξ_k)    (4)

where ξ is some form of noise. A robust control policy needs to ensure that the chosen output never causes the system to fail despite this noise, for all time. More formally,

Π_R = { π | P(s_k, π(s_k) + ξ_k) ∉ S_F  ∀ k, ∀ ξ_k ∈ Ξ }    (5)

where Π_R is the family of all robust control policies. For simplicity, we will consider noise sampled from a symmetrical bounded set Ξ = [−ξ̄, ξ̄], where ξ̄ is some finite scalar.

When considering unbounded noise (such as Gaussian noise), similar arguments hold in a probabilistic sense: instead of being able to guarantee that state-action pairs allow the system to never fail, we can only guarantee that it will not fail within a finite-time horizon with a certain probability.

Figure 5: Robust sets for different amounts of noise are computed for the SLIP (top) and NSLIP (bottom). The NSLIP benefits from much larger robust sets for any amount of noise, which makes it easier to design or learn a robust control policy. Moreover, the sets of robust states are not only larger for the NSLIP, but remain relatively large even for rather imprecise control.

The effect of action noise reduces the space available for controller design in two ways. First, the output of the control policy must be sufficiently distant from failing state-action pairs, such that the added noise never causes an immediate failure. The second requirement is similar to that for viability: the system must always land in a state from which it can continue to sample robustly, for all time. More formally, we want that

(s, π(s) + ξ) ∈ Q_R  and  P(s, π(s) + ξ) ∈ S_R = Π_S(Q_R),  ∀ ξ ∈ Ξ    (6)

We call Q_R the robust control policy design set. Similar to the relation between Π_V and Q_V, policies in the set Π_R must sample exclusively from Q_R in order to avoid failure from any state s ∈ S_R. Such sets are shown in Fig. 5 for various noise bounds ξ̄. Each of these sets is computed with the iterative process in Algorithm 2. This is essentially the same as the algorithm for computing the viable set, while also considering the additional possible transitions caused by noise. Note that, if the system dynamics have certain properties, only the worst-case noise needs to be considered [bansal2017hamilton]. Even without these properties, a worst-case-only assumption is often sufficiently accurate in practice.

procedure RobustSets(Q_N, S_N, Ξ)
    Q_R ← Q_N
    S_R ← S_N
    while S_R changes do
        for all (s, a) ∈ Q_R do
            for all ξ ∈ Ξ do
                if (s, a + ξ) ∉ Q_R then
                    remove (s, a) from Q_R
                    break
                if P(s, a + ξ) ∉ S_R then
                    remove (s, a) from Q_R
                    break
        S_R ← Π_S(Q_R)
    return Q_R, S_R
Algorithm 2 Compute Robust Sets
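A sketch of Algorithm 2 on the same grid; the noise set becomes a window of grid cells around the chosen action. Clipping at the grid edges is a simplification of ours: a conservative implementation would treat noise that leaves the grid as failing.

def robust_sets(P_table, S_grid, A_grid, xi_bar):
    # xi_bar: bound of the symmetric action-noise set, in the units of A_grid
    r = int(np.ceil(xi_bar / (A_grid[1] - A_grid[0])))  # noise half-width in cells
    Q_R = ~np.isnan(P_table)
    S_R = Q_R.any(axis=1)
    changed = True
    while changed:
        changed = False
        for i, j in zip(*np.nonzero(Q_R)):
            lo, hi = max(0, j - r), min(len(A_grid), j + r + 1)
            for jj in range(lo, hi):  # every action reachable under the noise
                if np.isnan(P_table[i, jj]) or \
                   not S_R[np.abs(S_grid - P_table[i, jj]).argmin()]:
                    Q_R[i, j] = False
                    changed = True
                    break
        S_R = Q_R.any(axis=1)
    return Q_R, S_R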

Importantly, the computation of Q_R depends only on the set Q_N, and thus on the set of failure states S_F, the transition map P, and the noise set Ξ. It does not depend on the exact choice of policy π, but is valid for the family of all robust control policies Π_R. In other words, we can evaluate the robustness inherent to the natural dynamics before we design the control policy or define a control objective other than ‘avoid failure’.

IV-B Evaluating Robustness of Different Legs

We compare the robustness of the SLIP and NSLIP models for varying amounts of noise, as shown in Fig. 5.

With the SLIP model, Q_R and S_R become empty sets once the noise bound exceeds a certain threshold, whereas in the NSLIP model this upper threshold is almost twice as large.

For any given amount of noise, the size of the set Q_R is also much greater for the NSLIP than for the regular SLIP model. The larger size of Q_R means there is more flexibility to fulfill robustness requirements while also designing a control policy around other criteria.

Furthermore, action noise is one of the most common methods of introducing exploration in learning, for example with Gaussian policies [sutton1998reinforcement, kober2013reinforcement]. The amount of noise needs to be carefully balanced: more noise allows for more aggressive exploration, but it can also keep the agent from converging to the true optimum, as well as lead to unstable behaviors ending in failed states. This can be particularly troublesome for learning in hardware, requiring more samples as well as potentially damaging the robot. Robustness to action uncertainty allows for more aggressive and effective exploration during learning. This is particularly important for applying model-free learning directly in hardware.

IV-C Robustness to State Perturbations

The projection of the robust policy design set onto state space, S_R, is the set of robust states, from which any robust policy can always recover. Interestingly, for small noise bounds, S_R remains the same for both the SLIP and NSLIP models (see Fig. 6). For greater amounts of noise, it shrinks much more rapidly for the SLIP model.

Figure 6: The size of the set of robust states S_R remains equal for the SLIP and NSLIP models for small noise bounds. For greater amounts of noise, the set shrinks much more rapidly for the SLIP model.

The set is particularly useful for choosing the specific control objective. For example, if we expect perturbations in state-space to have a symmetrical distribution, we would want to stabilize a limit-cycle near the center of . On the other hand, if we expect a specific type of perturbation to occur more frequently, we can choose a limit cycle with a larger margin in that specific direction.

As a specific example, a well-studied state perturbation is a change of ground height between steps [daley2006running, sprowitz2013towards, wu20133]. This type of perturbation involves a change in total energy: the forward velocity at apex remains the same, though the effective height (and thus potential energy) changes. We can compute S_R at different energy levels to then pick out operating points that remain robustly controllable across energy levels, as shown in Fig. 7. Assuming a symmetric distribution of perturbations, the control objective should be chosen to maximize the distance from the edge of the viability kernel in each direction. For a given desired forward velocity, we can thus choose a total energy that centers the normalized height with respect to perturbations along the vertical axis (constant-energy perturbations) and along the forward-velocity isolines (ground-height changes).
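A sketch of this sweep, reusing the SLIP helpers above for illustration (the energy levels and the noise bound are placeholders, not the paper's values):

for E_level in [1500.0, 1860.0, 2200.0]:  # total energies [J]
    E = E_level  # rebind the module-level energy read by denormalize()
    P_E = np.full((len(S_grid), len(A_grid)), np.nan)
    for i, s in enumerate(S_grid):
        for j, a in enumerate(A_grid):
            nxt = apex_map(*denormalize(s), a)
            if nxt is not None:
                P_E[i, j] = nxt
    _, S_R = robust_sets(P_E, S_grid, A_grid, xi_bar=np.deg2rad(5.0))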

Figure 7: We show here S_R for different amounts of total energy for the NSLIP model, with the noise bound held fixed. For a change in ground height, the system state travels along the forward velocity isolines (dashed black). For reference, the author runs recreationally at a considerably lower speed, Eliud Kipchoge ran the Breaking2 marathon event at roughly 5.9 m/s, and Usain Bolt holds the 100 meter dash world record at an average of roughly 10.4 m/s. The simulations shown in other graphs are all for the fixed energy level of 1'860 Joules.

IV-D Robustness to State-Estimation Uncertainty

Sensory noise causes the control policy to sample an action based on a noisy estimate of the state:

a_k = π(s_k + δ_k)    (7)

where δ is the noise in state space. There is an equivalence between δ and ξ: the action used deviates from the one a control policy would choose under perfect conditions, whether the deviation is due to noise in action space or in state estimation. This equivalence can be directly calculated using eq. 4 and eq. 7:

ξ_k = π(s_k + δ_k) − π(s_k).

If the control policy is affine, the equivalence is trivial, and for bounded estimation noise the equivalent action noise is also bounded. Otherwise, we cannot guarantee that bounds are available. Since this equivalence depends on the specific control policy, we do not investigate it further here. Suffice it to say, increasing robustness to action uncertainty can only improve robustness to state-estimation uncertainty as well.
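For an affine policy π(s) = K s + b (an assumed form, for illustration), the equivalence can be written out in one line:

π(s_k + δ_k) = K (s_k + δ_k) + b = π(s_k) + K δ_k   ⟹   ξ_k = K δ_k,

so bounded estimation noise ‖δ_k‖ ≤ δ̄ yields an equivalent action noise bounded by ‖ξ_k‖ ≤ ‖K‖ δ̄.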

IV-E Model Comparison

Previous studies of spring-mass models by Rummel and Seyfarth and others [rummel2008stable, owaki2006enhancing, karssen2011running] have suggested that nonlinear effective leg compliance can improve stability. These studies focus on finding basins of attraction with a fixed parameter set. As such, they focus specifically on limit-cycle motion and only provide insight into robustness to state perturbations.

With their numerical studies, Rummel et al. show that, compared to a linear leg compliance, a nonlinear leg compliance has a broader range of parameters which exhibit passively stable limit-cycles. These limit-cycles also tend to have larger basins of attraction. However, at higher velocities, the model with nonlinear spring stiffness no longer exhibits passively stable limit cycles, whereas with a linear spring this property is retained. These results suggested that nonlinear compliance is only beneficial at lower running speeds [rummel2008stable].

Using our formulation, we can evaluate robustness to state perturbations not only for an open-loop system but for any robust control policy. Our results confirm that, even with a maximally robust control policy, the set of robust states shrinks at higher speeds (see Fig. 7), though not as drastically as the basins of attraction studied by Rummel et al.

V Optimizing Natural Dynamics for Robustness

As an example application, we use our quantification to optimize the robustness of a simulated planar monoped with a 2-segment leg and a hierarchical control structure, shown in Fig. 8. The kinematic tree of the simulated system matches a robot testbed we currently use in our lab, though we have adjusted the parameters to be consistent with the models in the previous section. The system consists of three links: a floating base free to move in the plane, but without rotation, and a two-link leg. Both hip and knee joints are actuated, resulting in an 8-dimensional state space and a 2-dimensional action space. Rigid impacts and ground-reaction forces are solved as described in [remy2011matlab, hutter2010slip].

Figure 8: The simulated system is based on a hardware testbed, which is rigidly attached to a boom. Thus the floating base is limited to two degrees of freedom. Two additional degrees of freedom, the hip and knee joints, are both actuated. Thus the system has 4 position coordinates, an 8-dimensional state space, and a 2-dimensional action space [τ_h, τ_k], where τ_h and τ_k are the hip and knee torques, respectively. The robot shown was designed by our colleague Felix Grimminger.

We use the volume of the robust set Q_R as the fitness function for a particle swarm optimization (PSO), a standard gradient-free optimization scheme. Thus, instead of requiring the low-level controller to enforce a specific template model, we improve its robustness in a general sense. The resulting natural dynamics allow a high-level control policy to be implemented more reliably.

V-A High-Level State-Action Space

The choice of the high-level state-action space is based on the spring-mass models and classic Raibert control [raibert1986legged], which share many similarities. The structure is shown in Fig. 9.

Figure 9: The high-level state-action space is composed of the height and forward velocity of the floating base at apex, (z, ẋ), the desired landing angle of attack α, and the thrust factor c_t. The natural dynamics considered are those relative to the high level. These include both the rigid-body dynamics of the simulated robot as well as the embedded low-level controller.

The state is defined on the Poincaré section at flight apex, as introduced in Section II. Since the system is not energy-conservative, both the height and forward velocity of the floating base at apex must be considered, resulting in the state vector s = [z, ẋ].

The action space is defined as a desired landing angle of attack α and a thrust factor c_t applied during stance, each constrained between fixed lower and upper bounds. This results in a 4-dimensional state-action space in the high level, which is amenable to direct computation of a sufficiently dense grid.
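A sketch of such a grid (all bounds are placeholders; note that 20 points per dimension gives exactly the 160'000-point low-resolution grid mentioned in Section V-C):

import numpy as np

Z  = np.linspace(0.6, 1.2, 20)                 # apex heights z [m]
XD = np.linspace(0.0, 3.0, 20)                 # forward velocities [m/s]
AL = np.deg2rad(np.linspace(45.0, 90.0, 20))   # landing angles of attack
CT = np.linspace(1.0, 1.3, 20)                 # thrust factors
# 20**4 = 160'000 high-level state-action pairs
Q_hl = np.stack(np.meshgrid(Z, XD, AL, CT, indexing='ij'), axis=-1)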

Although our choice of the state-action space is largely motivated by Raibert control, we make no restrictions on the high-level control policy and do not decouple the states and actions.

V-B Low-Level Controller

The low-level controller is a state-machine that switches between flight and stance.

During flight, a standard PD position controller tracks the desired landing angle of attack dictated by the high-level control policy. The resting length of the virtual leg, l_0, is set as a constant parameter less than the maximum leg length to avoid reaching singularities. Thus α uniquely determines the desired foot position during flight. Since there are two possible joint configurations for each desired foot position, the knee orientation is also set as a constant parameter in the computation of the inverse kinematics. Thus α also uniquely determines the desired joint angles. During the first flight phase, from apex till touchdown, α is freely chosen as the action. For the second flight phase, from liftoff till the next apex, α is reset to a default position. Thus the initial leg configuration at each apex is expected to be the same.

During stance, we do not enforce the dynamics of a spring-mass template model. Instead, compliant behavior is achieved via virtual model control (VMC) [pratt2001virtual, renjewski2015exciting]. Torques are computed to mimic a relatively arbitrary leg compliance:

τ = S (Jᵀ F + K Δq),   F = k Δl    (8)

where τ are the hip and knee torques, S is the actuator selection matrix, J is the contact Jacobian, k is the stiffness coefficient of a virtual linear spring between hip and foot, Δl is the deflection of the virtual leg from rest (so F is the axial spring force), K is a symmetric linear matrix, and Δq is the joint deflection of the leg from rest. The diagonal coefficients of K can be interpreted as virtual springs on the corresponding joints, while the off-diagonal coefficient serves as a mixing term. As long as K is positive-definite, this results in a nonlinear compliance with respect to the virtual leg deflection Δl. In similar fashion to classic Raibert control [raibert1986legged], additional thrust is triggered once the body reverses direction, by amplifying the joint torques by the thrust factor c_t, as dictated by the high-level control policy.
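A sketch of this stance-torque computation for the two-segment leg, under our reading of eq. (8); the kinematic conventions, names, and the omission of the actuator selection matrix (both joints are actuated) are our own simplifications:

import numpy as np

L1 = L2 = 0.5   # upper and lower leg lengths [m]
L0V = 0.85      # resting length of the virtual leg [m]

def vmc_torques(q, k, K_mat, q_rest, c_t=1.0):
    """q = [q_hip, q_knee]; returns [tau_hip, tau_knee].
    Assumes an absolute hip angle and a relative knee angle."""
    s0, c0 = np.sin(q[0]), np.cos(q[0])
    s01, c01 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    x, z = L1 * s0 + L2 * s01, -(L1 * c0 + L2 * c01)  # foot position w.r.t. hip
    J = np.array([[L1 * c0 + L2 * c01, L2 * c01],     # contact Jacobian d(x,z)/dq
                  [L1 * s0 + L2 * s01, L2 * s01]])
    l = np.hypot(x, z)
    F = k * (L0V - l) * np.array([x, z]) / l  # axial force of the virtual spring
    tau = J.T @ F + K_mat @ (q_rest - q)      # our reading of eq. (8); S omitted
    return c_t * tau                          # thrust factor amplifies the torques

Here K_mat would be assembled as np.array([[k1, k12], [k12, k2]]) from the stiffness coefficients introduced in Section V-C.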

We assume accurate tracking of α during the flight phase, which is achieved through proper tuning of the PD gains. This is important to ensure well-behaved high-level dynamics for two reasons: first, to ensure that each high-level state-action pair results in a unique state at touchdown; second, to ensure that the robot leg returns to the same resting configuration at each apex. In this manner, the leg masses can be lumped with the floating base to determine potential and kinetic energy, meaning that the high-level state fully describes the system energy. The transition map P thus provides a unique mapping for each high-level state-action pair, and the viable sets Q_V and S_V can be directly computed in the high-level state-action space.

V-C Optimization Setup

We use a standard PSO implementation based on [clerc2002particle]. The parameters optimized are the stiffness coefficients of the virtual leg in the low-level stance controller, [k, k_1, k_2, k_12], where k_1 and k_2 form the diagonal of the symmetric matrix K, and k_12 is the off-diagonal term.

As fitness function, we choose to maximize the hypervolume enclosed by the viable set in the high-level state-action space. For our systems, we have found that maximizing the hypervolume of Q_V and of Q_R generally leads to the same results for reasonable amounts of noise. Each dimension of the state is normalized by heuristically determined bounds on maximum height and forward velocity, and the dimensions of the action space are bounded by their corresponding constraints. The hypervolume is calculated by summing and then normalizing the grid points inside the set. Thus, a fitness of 1 means that for any state, all actions are viable; a fitness of 0 means that for any state, all actions are outside the viable set.
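Because every dimension is normalized, the hypervolume reduces to counting grid points, e.g.:

def hypervolume(Q_set):
    # normalized hypervolume of a boolean state-action set: the fraction of
    # grid points inside. 1.0 -> all actions viable from every state; 0.0 -> empty
    return Q_set.mean()

A PSO particle is then a candidate stiffness vector [k, k_1, k_2, k_12], and its fitness is the hypervolume of the viable (or robust) set computed from the monoped's high-level transition map.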

For the results shown, 25 particles were initialized at random. The convergence tolerance on the fitness variance was reached after 12 iterations, taking roughly 3.5 hours on a 28-core desktop. During the optimization, we used a low-resolution grid with 160'000 points to speed up computation. Note that a lower resolution will result in a more conservative estimate of the sets, but not in mislabeled points in the set. The simulation parameters used are:

Mechanical Parameters
gravitational constant : 9.81
body mass : 65
upper leg length : 0.5
upper leg mass : 10
upper leg inertia : 2
lower leg length : 0.5
lower leg mass : 5
lower leg inertia : 2
Low-Level Control Parameters
leg resting length : 0.85
saturation torque : 2000
Hip joint PD gains :
Knee joint PD gains :

V-D Optimization Results

We compare the robustness of a virtual leg compliance roughly matching that of the SLIP model against one with the stiffness coefficients resulting from the optimization. The viability kernels are visualized in Fig. 10. The intensity of the color-map indicates the portion of the action space which is viable for each point in state space. The red triangle marks an arbitrary operating point, and the action space for this state is shown in the image inset. In the action space, the action pair leading to limit-cycle motion is also marked by a red triangle. To illustrate the improved robustness, 50 actions are uniformly sampled around the operating point assuming a small noise bound (orange circles), and an additional 50 with noise between that bound and a larger one (blue circles).

As in the comparison between the SLIP and NSLIP models in the previous section, the viability kernel remains nearly identical for both systems. The volume of the set of viable state-action pairs, however, increases from 0.08 to 0.23, over 2.8 times. The noisy sampling of actions around the operating point shows the decreased sensitivity to action noise with the optimized nonlinear compliance.

Figure 10: Shown are the viability kernels in the high-level space for the initial monoped (left) and after optimizing the virtual compliance (right), along with the action space for the chosen operating point, shown in the image insets. A red triangle marks the state-action pair which leads to limit-cycle motion at the operating point, in both the state and action spaces. In the action space, the orange and blue circles mark actions randomly sampled around the operating point with a smaller and a larger noise bound, respectively. The states reached by these state-action pairs are marked with their respective colors in the state space, which shows the much lower sensitivity of the optimized monoped. The intensity of the color-map indicates, for each point in state space, the portion of the action space which is viable. In the action space (image insets), the color-map indicates the state that would be reached if that state-action pair were sampled.

In Fig. 10 we chose an arbitrary operating point for the sake of simplicity and fair comparison. In practice, an operating point can be chosen based on the robustness of that point in state-space. Conversely, instead of optimizing the overall robustness of the system, the fitness function can be weighted to bias robustness near a predetermined operating point.

VI Conclusion and Outlook

We have presented a formulation for computing viable and robust sets in state-action space which allows the inherent robustness of a system to be quantified, prior to specifying the control policy parameterization or objective. Different system designs can thus be compared quantitatively.

We have illustrated this formulation on the spring-mass model, a low-dimensional system commonly used to synthesize control strategies for running robots. Furthermore, we have shown an example application using our quantification to perform gradient-free optimization. The system optimized is a simulated planar monoped with a two-segment leg and a hierarchical control structure. The low-level controller parameters are optimized to improve the robustness of the natural dynamics, relative to the high-level state-action space.

An important advantage of this formulation is that the robustness of the natural dynamics can be optimized without enforcing the dynamics of a specific template model, which is often challenging and requires extensive tuning, accurate models, and good state estimation [sentis2007synthesis, herzog2016momentum, flayols2017experimental]. Instead, the inherent robustness allows control policies designed on simple model abstractions to be leveraged despite inaccuracies.

To the best of our knowledge, prior work in viability theory focuses on evaluating robustness of a specific control policy, or on synthesizing control policies directly, and computation is limited to viability kernels in state space.

The notable exception is the work of Zaytsev et al. [zaytsev2018boundaries], which also computes viable sets in state-action space. Aside from the minor difference in studying walking instead of running models, Zaytsev et al. focus on the connection between controllability and viability. This is used to qualify how robust a given control policy is, how appropriate different templates may be for a given control task and given robot, and to motivate the statement that planning two steps ahead is sufficient. While we use the same state-action space formulation, we take a different approach to quantification by evaluating bounded noise in action space, which is more suitable for our motivating question: how to design natural dynamics that are easy to exploit? Indeed, we show why this is the only type of uncertainty which can be considered for the family of all robust control policies, without setting any assumptions on the control policy structure or objective. As such, we find our methods to be highly complementary, and applicable at different stages of robot design.

One of the main challenges with viability-based approaches is tractability [liniger2017real, bansal2017hamilton]. While we have shown how, in principle, a hierarchical control scheme reduces dimensionality, this approach alone is rarely sufficient in dealing with the curse of dimensionality on real systems. There is much recent progress on different scalable approaches to computing viable and back-reachable sets (see Section I-C), and the specific choice will depend greatly on the properties of the system in question.

For running motions characterized by nonlinear, non-smooth hybrid dynamics, we believe that, in addition to dimensionality reduction through hierarchical control, heuristics such as computing only two steps ahead [zaytsev2018boundaries] are among the most promising tools for scaling this approach to real hardware.

We are also interested in using sampling-based approaches to make probabilistic estimates. There has been keen interest recently in applying machine learning techniques to tune control parameters directly in hardware [antonova2016sample, heijmink2017learning, kumar2018improving, vonrohr2018gait]. In these situations, safe exploration of the state-action space is particularly important. Active sampling to add samples close to the edge of the viable set would significantly increase sample-efficiency for estimating the sets, while at the same time allowing safe exploration, making this a logical next step.

There is also potential for improvement in the definition of failures, the starting point of any viability approach. In this paper, we have used a very general and intuitive definition of failure (falling and direction reversal); however, other definitions may be equivalent while offering earlier detection when computing viability kernels. Conservative definitions which lead to inner approximations may also be useful if they substantially speed up computation. It may also be possible to decouple the system dynamics, a common approach to simplifying control [raibert1986legged, heim2016designing, whitman2009simple], and identify different failure conditions for each decoupled subsystem. This divide-and-conquer approach would also allow substantially higher-dimensional systems to be tackled.

Appendix: SLIP and NSLIP models

The SLIP and NSLIP models are shown in Fig. 11. Integration between two apex events is split into three phases: a flight phase which terminates with a touchdown event, a stance phase which terminates with a liftoff event, and another flight phase which terminates with an apex event.

Figure 11: a) shows the parameters of the SLIP and NSLIP models. b) shows the states. The reference frame is reset to the foot position at each touchdown. c) shows a qualitative trajectory over a full cycle, with the relevant phases and events.

The flight phase equations of motion are

ẍ = 0,   z̈ = −g

where x and z are the body position and g is the gravitational constant. The stance phase equations of motion are

m ẍ = F_s cos(θ),   m z̈ = F_s sin(θ) − m g

where θ is the incident angle between the body and the foot (a rotation by π/2 serves to keep it consistent with the landing angle of attack) and F_s is the force acting on the body due to the spring. In the SLIP model,

F_s = k (l_0 − l)

where k is the spring coefficient, l_0 is the spring resting length, and l is the leg length. In the NSLIP model, the axial force follows from the knee torque via virtual work,

F_s = k_t (φ_0 − φ) / (∂l/∂φ)

where k_t is the torsional spring coefficient, φ_0 is the spring resting angle, and φ is the knee angle. The three events are

touchdown: z = l_0 sin(α)
liftoff: l = l_0 (equivalently φ = φ_0 for the NSLIP)
apex: ż = 0

At each touchdown, the reference frame is reset to the foot position, which allows the equations of motion to be written more compactly. In the simulation, we also keep track of the foot position in an auxiliary variable.

For convenient comparison, we use the same parameters as in [rummel2008stable], which are similar to human averages:

gravitational constant : 9.81
body mass : 80
prismatic spring resting length : 1
prismatic spring coefficient : 8200
torsional spring resting angle : 170
torsional spring coefficient : 704

For the SLIP and NSLIP simulations shown, except in Fig. 7, the system energy simulated is 1’860 Joules.

Acknowledgments

We thank everyone who gave feedback during the writing of this manuscript. In particular, we appreciate the frequent and insightful discussions with Matthew Millard, Brent Gillespie and Andrea del Prete, as well as Friedrich Solowjow’s advice on mathematical notation. We also appreciate the editors and reviewers for their constructive suggestions and quick turnaround time.

Steve Heim received the B.Sc. degree in mechanical engineering and M.Sc. in robotics, systems and control from ETH Zurich, Switzerland, in 2012 and 2015, respectively. He then spent two years with the Ishiguro lab at Tohoku University in Sendai, Japan. He is currently pursuing the Ph.D. degree with the Dynamic Locomotion Group at the Max Planck Institute for Intelligent Systems in Stuttgart, Germany. His research interests include nonlinear dynamics, control and learning, particularly in relation to legged locomotion.

Alexander Spröwitz received the Diplom degree in mechatronics from Ilmenau Technical University in Germany in 2005, and the Ph.D. degree in manufacturing systems and robotics from the Biorobotics Laboratory at the Swiss Federal Institute of Technology in Lausanne (EPFL), Switzerland in 2010. Since 2016 he is the Max Planck Research Group Leader of the Dynamic Locomotion Group, and IMPRS-IS faculty at the Max Planck Institute for Intelligent Systems in Stuttgart, Germany. His current research focuses on the mechanisms underlying legged locomotion. Dr. Spröwitz and his team design and experiment with legged robots and models to infer biomechanics and neuro-control principles of motion in animals.