Learning Active Spine Behaviors for Dynamic and Efficient Locomotion in Quadruped Robots

05/15/2019 ∙ by Shounak Bhattacharya, et al. ∙ indian institute of science 0

In this work, we provide a simulation framework to perform systematic studies on the effects of spinal joint compliance and actuation on bounding performance of a 16-DOF quadruped spined robot Stoch 2. Fast quadrupedal locomotion with active spine is an extremely hard problem, and involves a complex coordination between the various degrees of freedom. Therefore, past attempts at addressing this problem have not seen much success. Deep-Reinforcement Learning seems to be a promising approach, after its recent success in a variety of robot platforms, and the goal of this paper is to use this approach to realize the aforementioned behaviors. With this learning framework, the robot reached a bounding speed of 2.1 m/s with a maximum Froude number of 2. Simulation results also show that use of active spine, indeed, increased the stride length, improved the cost of transport, and also reduced the natural frequency to more realistic values.



There are no comments yet.


page 1

page 2

page 4

page 5

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Speed, for legged animals, is an important aspect in nature for survival. Animals like cheetah, hounds etc, use their spine to achieve remarkably high speed gaits. Studies on these animals show that the spine movement increases the effective stride length, provides auxiliary power to the legs and helps to harness energy by storing and releasing it. From the evolutionary point of view, it is natural to consider the spine as a propulsive engine of the vertebrate body. All of these observations point to the fact that spine is crucial for locomotion.

Fig. 1: Our simulated robot, Stoch 2, demonstrating active spine motion learned through Deep reinforcement learning in comparison with the Cheetah [1] with its natural bounding.

There have been quite a few quadruped robots having passive or active spine in their structure, namely, BobCat, Sevel, MIT Cheetah, INU [2, 3, 4, 5, 6]. There is a large body of work over planer spine models with one DOF revolute [7, 8, 9], and prismatic [10] joints. Apart from the one DOF model, the spines were modeled as point masses in these works. On the other hand, robots such as Sevel [3], INU [2] contain a two DOF spine. This feature allows the spine to be longer while maintaining its orientation during bounding. However, reduced order models were used when constructing the empirical model of the robot. Moreover, in [2], the legs were assumed to be massless, which result in large reality gaps. From a practical standpoint, we cannot use the approaches mentioned above, due to the fact that the spine models cannot be reduced or approximated, and the legs have significant masses (by upto ) in the real hardware. Apart from these, inaccuracy in modeling, imperfect ground contacts play a significant role while transferring on the real hardware.

To address the above issues and to develop an accurate model of the robot and environment, researchers began to use high fidelity physics engine based simulators [11, 12, 13] to simulate the robot and environment model. Works such as [4, 14, 6, 15, 16] have pursued this approach to bridge the gap between theory and experiment. Despite these tools, the results obtained with these simulators still required a noticeable amount of manual tuning and intervention to deploy on hardware. This is not practically feasible.

Fig. 2: Figure containing the (a) the quadruped robot, Stoch 2, (b) simulated robot model in PyBullet physics engine, and (c) skeleton of Stoch 2 depicting the front, rear and the spine module.

The principal challenge to be solved in using spine is that of control, namely, determining what actions should be applied over time in order to produce efficient and natural looking gaits. It is still an open question as to what type of control strategies can enhance locomotion performance with an active spine. Spine introduces redundancy, and developing a control strategy is often challenging and time consuming. Complexities arise due to nonlinear and coupled dynamics of legged systems, and from the existing trade-off between different performance criteria such as gait speed, energy efficiency, and stability. Therefore, most of the existing quadruped robots of today have completely avoided using spine by featuring a single rigid body with four legs with individually actuated hips and/or knees.

We have previously explored Deep Reinforcement learning (D-RL) based methods to realize efficient quadrupedal walking in Stoch [17], and our goal in this paper is to investigate the efficacy of D-RL for active spine behaviors. In other words, we want to revisit the problem of spine bounding with this new approach, which has seen a lot of success in recent years [14, 17, 18, 19]. Therefore, in this work, we provide insights into how RL can be successfully applied to such problems. To the best of our knowledge, this is one of the earliest work which aims to learn complex control policy for a spined quadruped robot. We also evaluate numerous performance metrics to validate the advantage of spine. Also, we show that the final control policy produces actions that are both robust and efficient.

We have arranged the paper in the following manner. Section II describes the design of our simulated robot model and simulation framework used for the training. Section III describes the D-RL framework used to learn the control policy. Finally, in section IV, all the results obtained through various simulations are discussed thoroughly.

Ii Simulation model design

Stoch 2 is a quadrupedal robot designed and developed in-house at the Indian Institute of Science (IISc), Bangalore, India. It is the second generation robot in the Stoch series. A detailed description of the previous model can be found in [17, 20]. In this section, we first describe the overview of Stoch 2 robot and the essential hardware details, and then describe the simulation framework used for training. The scope of this work is limited to the simulation results portraying quadrupedal spine locomotion, whereas the hardware experiments are a part of our future work.

The robot is designed as three modules: two body modules and one spine module. The body modules are connected via spine, as shown in Fig. 2a. The overall size and form factor of the robot is similar to Stoch [20]. Each body module is composed of two legs. A single leg contains three degrees of freedom. Each of them corresponds to the flexor and extension movement of hip, knee and abduction movements. However, the simulation model uses only hip and knee motion while keeping the abduction locked in position. Each leg comprises of a five bar linkage mechanism, where two of the links remain actuated. This enables the leg to follow a given trajectory in a plane. The central spine is designed as a serial DOF mechanism. Each of the spine joint is actuated. Overall, the robot simulation model consists of actuated degrees of freedom, including four legs and the spine. The key specifications of the simulation model are summarized in Table I.

Parameter Value
spine length mm
total leg length mm
min./max. hip joint angle /
min./max. knee joint angle /
min./max. spine front joint angle /
min./max. spine back joint angle /
TABLE I: Table showing the details of the simulated robot in PyBullet.
Fig. 3: Detailed kinematic model for the front left leg is shown here.
Fig. 4: Figure showing the tiles of the simulated model showing the learned bound gait with active spine. The top tiles show the skeleton and the bottom tiles show the robot in PyBullet simulator.

We used Pybullet [21] simulator, built on top of Bullet3 physics engine. A three-dimensional computer-aided-design (CAD) model is developed using SolidWorks [22] to capture the kinematics and inertial properties of the robot. This model is transferred to Pybullet by using a Universal Robot Description Format [23] (URDF) exporter. In addition, actual mass of all the links, actuator force limits, joint limits and kinematic-loop constraints of the flexural joints are measured and manually updated in the URDF file for a more realistic simulation.

Iii Reinforcement Learning based Controller

In this section, we will outline the deep reinforcement learning framework used for learning spine based locomotion behaviours.

Iii-a Background

In reinforcement learning setting, the locomotion problem is formulated as a Markov Decision Process (MDP). An MDP is characterized by a tuple

where is the set of robot states referred to as state space, and

is the set of feasible actions referred to as the action space. The transition probability function

models the evolution of states based on actions, and is the scalar value at every transition step. is called the discount factor defined in the range .

In reinforcement learning, the fundamental idea is to discover a policy, denoted as , that maximizes the expected cumulative reward over time. A parameterized policy with the parameters is the probability density of given . The optimal parameters of the policy yield the maximum sum of the cumulative rewards given by


Policy gradients [24] is one of the popular methods to solve this optimization problem which takes gradient ascent steps in the direction of increasing cumulative reward. We discuss more details about the algorithm used in Section III-D.

Iii-B State and Action Space

Iii-B1 State Space

Similar to the work in [17], the state is represented by angles, velocities, torques of the active joints (legs and spine), and body orientation (in quaternions). The combined representation yields a 34-D state space.

Iii-B2 Action Space

The action space consists of the continuous-valued control signal for each active joint. For each leg, the agent learns the legs’ end-point positions in polar coordinates represented as where . This particular choice of action space ensures that the five-bar leg mechanism never encounters a singularity. We use a custom inverse kinematics solver to compute the joint angles from polar coordinates. As seen from Fig 3, the five bar linkage is divided into two 2R Serial linkage and solved for each branch. The details of the equations for a 2R Serial linkage can be found here [20]. Safety limits are also included in the inverse kinematic solver to avoid singular positions. However, the agent directly learns the joint angle for the spine motors. The two motors in the spine are coupled with the relation where represents the joint angle. During the bound gait, we learn separate end-point trajectories for front and rear leg pairs. Note that both the legs in front and back module executes the same end-point trajectory during bound. The polar coordinates for the four legs and joint angle for the spine collectively provide a dimensional action space. The polar coordinates and the spine angle are restricted to a bounding box, thereby indirectly imposing angle limits on the joint angles.

Iii-C Reward Function

We designed a reward function that gives more positive reinforcement as the robot’s base speed attains a desired velocity, and simultaneously penalizes high energy consumption. The agent receives a scalar reward after each action step according to the reward function


Here is the desired velocity along the -axis, and is manually adjusted for various values of . is the energy spent by the actuators for the current step, and , are the weights corresponding to each term ( and respectively in our experiments). is computed as


where are the motor torques, and are the motor velocities of the motor respectively.

Iii-D Network and Learning Algorithm

We are employing Proximal Policy Optimization (PPO) [25] for learning optimal polices that yield action values in continuous space [17], [14]

. The actor and critic network in the learning algorithm consists of two fully connected layers with the first and second layers consisting of 128 and 64 nodes respectively. Activation units in the actor and critic networks are ReLU

ReLU , and ReLU ReLU

linear respectively. We used the open source implementation of PPO by Tensorflow Agents

[26] that creates the symbolic representation of the computation graph. The implementation is highly parallelized and performs full-batch gradient ascent updates, using Adam [27] optimizer, on the batch of data collected from multiple environment instances. The pybullet simulator is configured to spawn multiple agents for faster collection of samples and in our case 30 agents were used in parallel threads. The proposed approach yields walking gaits that are efficient, fast and also robust in simulation. The learning algorithm runs for a maximum of million steps and the observed training time is hours on a Intel Core i7 @3.5Ghz cores and 32 GB RAM machine.

Having obtained an optimal policy for bounding, we will now discuss the simulation results and also justify how the use of active spine is beneficial to locomotion.

Iv Results

In this section, we will provide simulation results and also make comparisons between spine and rigid models. We trained multiple bounding gaits with target speeds ranging from m/s and m/s for both the models. The maximum speed obtained was m/s with an active spine, and m/s without spine. Figure 4 shows the gait tiles for the target speed m/s. Froude number111Froude number is the ratio of square of the maximum velocity over the gravity and leg length, i.e., , is the velocity, is the leg length. observed was at body lengths per second. Figure 5 shows the end point trajectories and spine angles for the same gait. We will analyze the gaits obtained based on traits like cost of transport, stride lengths, power, torque profiles etc. Video results are provided in the following link: https://youtu.be/INp4aa-8z2E.

Fig. 5: Joint angle of spine and robot’s end point trajectory are shown here.
Fig. 6: Forward velocity and the vertical displacement of the center of mass data obtained for bound gait are shown here. The CoT for the rigid model is almost two times higher for the same speed.

Iv-a Cost of Transport (CoT)

Fig. 7: Cost of transport (CoT) vs. forward velocity for the bound gait is shown here. The CoT for the rigid model is almost two times higher than that for the spine model.
Fig. 8: Comparing stride length between an active spine and a rigid spine bound, with a forward velocity of m/s. The spine bound shows a much higher stride length.

In our experiments, we use Cost of Transport (CoT) as a metric of measurement for the gait performance. CoT is nothing but the mechanical work normalized by the weight and the distance traveled. To determine CoT, we take the integral of the positive work done by the actuators [7]:

where is the torque of the motors. is speed of the motor. is the average distance covered by the robot along X-axis. is the mass of the robot. is the gravity.

We pick trials and then average the CoT computed over these trials for consistency. Figure 6 shows the velocity profiles and center of mass height of the spine and rigid models. Fig. 7 shows the CoT comparison between the rigid and spine models. It can be verified that CoT of the spine model is always less than that of the rigid model for the velocities shown. Moreover, the CoT for rigid models is almost twice as much as that for the spine models. This observation is concurrent with the results obtained in [28, 9, 7, 8].

Iv-B Stride Length Comparison

Stride length provides a measure of how far the robot has walked during each step. It is the distance between two successive placements of the same foot. To measure the stride length, heel to heel distance is determined. It can be verified from Figure 8 that there is a increase in stride length due to active spine. Moreover, spine helps in reducing the bounding frequency [9], thereby allowing the gaits to be more realistic.

Iv-C Torque, Power Profile and Gait Diagram

Fig. 9: Figure showing the torque profile for active and rigid spine over a complete gait cycle.

In figure 9, we compare the torque profile for one gait cycle of the bound gait with both active and rigid models, which are at nearly the same speeds. It can be observed that the peak torque values have depreciated for the active spine by nearly . Apart from this, it can be verified that the torque profile correlates well with the stance and swing phase of the robot. The average torque experienced by the front leg motors are higher, while the impulses experienced by the back legs were higher.

V Conclusion

We showed that spine is a key driving factor for the improvement of speed and COT in quadrupeds. Unlike existing approaches, we used Deep-Reinforcement Learning to realize spine behaviors. It is worth noting that the training was from scratch, and did not require prior understanding of the mechanisms involved in using the spine. We have made comparisons with rigid models, and showed that spine improves cost of transport, power, torque, and stride lengths significantly. Future work will involve implementation of these policies on hardware.