How does the structure embedded in learning policy affect learning quadruped locomotion?

08/29/2020
by Kuangen Zhang, et al.

Reinforcement learning (RL) is a popular data-driven method that has demonstrated great success in robotics. Previous works usually focus on learning an end-to-end (direct) policy that outputs joint torques directly. While the direct policy seems convenient, the resulting performance may not meet expectations. To improve its performance, more sophisticated reward functions or more structured policies can be used. This paper focuses on the latter because a structured policy is more intuitive and can inherit insights from previous model-based controllers. It is unsurprising that structure, such as a better choice of action space or constraints on the motion trajectory, may benefit the training process and the final performance of the policy at the cost of generality, but the quantitative effect is still unclear. To analyze the effect of structure quantitatively, this paper investigates three policies with different levels of structure in learning quadruped locomotion: a direct policy, a structured policy, and a highly structured policy. The structured policy is trained to learn a task-space impedance controller, and the highly structured policy learns a controller tailored for trot running, which we adopt from previous work. To evaluate the trained policies, we design a simulation experiment to track different desired velocities under force disturbances. Simulation results show that the structured policy and the highly structured policy require 1/3 and 3/4 fewer training steps, respectively, than the direct policy to achieve a similar level of cumulative reward, and seem more robust and efficient than the direct policy. We highlight that the structure embedded in the policies significantly affects the overall performance of learning a complicated task when complex dynamics are involved, such as legged locomotion.
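To make the idea of a task-space impedance controller concrete, the following is a minimal sketch of the textbook form of such a controller for a single leg: a virtual spring-damper force at the foot is mapped to joint torques through the transpose of the leg Jacobian. It is not taken from the paper. The function name, the argument names, the scalar gains `kp` and `kd`, and the assumption that the policy supplies the desired foot position are all hypothetical choices made for illustration.

```python
import numpy as np


def task_space_impedance_torques(jacobian, foot_pos, foot_pos_des,
                                 foot_vel, foot_vel_des, kp, kd):
    """Return joint torques for one leg from a task-space spring-damper law.

    Hypothetical illustration (not the paper's implementation):
        force = kp * (x_des - x) + kd * (v_des - v)
        tau   = J^T @ force
    where J is the foot Jacobian (task_dim x n_joints).
    """
    jacobian = np.asarray(jacobian, dtype=float)
    pos_err = np.asarray(foot_pos_des, dtype=float) - np.asarray(foot_pos, dtype=float)
    vel_err = np.asarray(foot_vel_des, dtype=float) - np.asarray(foot_vel, dtype=float)
    # Virtual force that pulls the foot toward its desired position and velocity.
    force = kp * pos_err + kd * vel_err
    # Map the task-space force to joint torques via the Jacobian transpose.
    return jacobian.T @ force


if __name__ == "__main__":
    # Assumed toy values: identity Jacobian, foot 0.1 m short of its target in x.
    tau = task_space_impedance_torques(
        jacobian=np.eye(3),
        foot_pos=[0.0, 0.0, 0.0],
        foot_pos_des=[0.1, 0.0, 0.0],
        foot_vel=[0.0, 0.0, 0.0],
        foot_vel_des=[0.0, 0.0, 0.0],
        kp=100.0,
        kd=1.0,
    )
    print(tau)
```

In a setup like this, a learned policy would choose task-space targets (and possibly gains) rather than raw joint torques, which is one way an action space can carry structure from model-based control.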

research
11/09/2020

Learning Task Space Actions for Bipedal Locomotion

Recent work has demonstrated the success of reinforcement learning (RL) ...
research
03/10/2021

RMP2: A Structured Composable Policy Class for Robot Learning

We consider the problem of learning motion policies for acceleration-bas...
research
08/23/2019

A Comparison of Action Spaces for Learning Manipulation Tasks

Designing reinforcement learning (RL) problems that can produce delicate...
research
12/05/2020

RLOC: Terrain-Aware Legged Locomotion using Reinforcement Learning and Optimal Control

We present a unified model-based and data-driven approach for quadrupeda...
research
09/28/2018

Using Deep Reinforcement Learning to Learn High-Level Policies on the ATRIAS Biped

Learning controllers for bipedal robots is a challenging problem, often ...
research
05/03/2023

Enhancing Efficiency of Quadrupedal Locomotion over Challenging Terrains with Extensible Feet

Recent advancements in legged locomotion research have made legged robot...
research
03/11/2021

Robust High-speed Running for Quadruped Robots via Deep Reinforcement Learning

Deep reinforcement learning has emerged as a popular and powerful way to...
