Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning

09/24/2021
by Nikita Rudin, et al.

In this work, we present and study a training set-up that achieves fast policy generation for real-world robotic tasks by using massive parallelism on a single workstation GPU. We analyze and discuss the impact of different training algorithm components in the massively parallel regime on the final policy performance and training times. In addition, we present a novel game-inspired curriculum that is well suited for training with thousands of simulated robots in parallel. We evaluate the approach by training the quadrupedal robot ANYmal to walk on challenging terrain. The parallel approach allows training policies for flat terrain in under four minutes, and in twenty minutes for uneven terrain. This represents a speedup of multiple orders of magnitude compared to previous work. Finally, we transfer the policies to the real robot to validate the approach. We open-source our training code to help accelerate further research in the field of learned legged locomotion.

research
08/16/2022

A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning

Deep reinforcement learning is a promising approach to learning policies...
research
12/26/2018

Learning to Walk via Deep Reinforcement Learning

Deep reinforcement learning suggests the promise of fully automated lear...
research
07/27/2022

PI-ARS: Accelerating Evolution-Learned Visual-Locomotion with Predictive Information Representations

Evolution Strategy (ES) algorithms have shown promising results in train...
research
02/20/2020

Learning to Walk in the Real World with Minimal Human Effort

Reliable and stable locomotion has been one of the most fundamental chal...
research
12/13/2021

Teaching a Robot to Walk Using Reinforcement Learning

Classical control techniques such as PID and LQR have been used effectiv...
research
12/06/2022

Walk These Ways: Tuning Robot Control for Generalization with Multiplicity of Behavior

Learned locomotion policies can rapidly adapt to diverse environments si...
research
02/02/2022

Accelerated Quality-Diversity for Robotics through Massive Parallelism

Quality-Diversity (QD) algorithms are a well-known approach to generate ...
