PI-ARS: Accelerating Evolution-Learned Visual-Locomotion with Predictive Information Representations

by Kuang-Huei Lee, et al.

Evolution Strategy (ES) algorithms have shown promising results in training complex robotic control policies due to their massive parallelism, simple implementation, effective parameter-space exploration, and fast training time. However, a key limitation of ES is its scalability to large-capacity models, including modern neural network architectures. In this work, we develop Predictive Information Augmented Random Search (PI-ARS) to mitigate this limitation by leveraging recent advancements in representation learning to reduce the parameter search space for ES. Namely, PI-ARS combines a gradient-based representation learning technique, Predictive Information (PI), with a gradient-free ES algorithm, Augmented Random Search (ARS), to train policies that can process complex robot sensory inputs and handle highly nonlinear robot dynamics. We evaluate PI-ARS on a set of challenging visual-locomotion tasks where a quadruped robot needs to walk on uneven stepping stones, quincuncial piles, and moving platforms, as well as to complete an indoor navigation task. Across all tasks, PI-ARS demonstrates significantly better learning efficiency and performance compared to the ARS baseline. We further validate our algorithm by demonstrating that the learned policies can successfully transfer to a real quadruped robot, for example, achieving a 100% success rate on the real-world stepping stone environment, dramatically improving prior results achieving 40% success.
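
To make the division of labor described above concrete, the following is a minimal sketch, not the authors' implementation, of a basic ARS update applied only to a small policy head that sits on top of a frozen encoder. Everything named here is an illustrative assumption: `frozen_encoder` is a fixed random projection standing in for a PI-trained encoder, `reward_fn` is a toy regression surrogate rather than a robot rollout, and `ars_step` with its hyperparameters is a hypothetical helper. The point is only that the gradient-free search runs over a handful of head parameters instead of the whole network.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: a fixed random projection stands in for an encoder that would
# be trained separately with a gradient-based representation objective.
OBS_DIM, LATENT_DIM = 32, 4
W_enc = rng.standard_normal((OBS_DIM, LATENT_DIM)) / np.sqrt(OBS_DIM)


def frozen_encoder(obs):
    """Map raw observations to a compact latent; never touched by ARS."""
    return np.tanh(obs @ W_enc)


# ASSUMPTION: toy data and a toy reward replace environment rollouts.
observations = rng.standard_normal((64, OBS_DIM))
latents = frozen_encoder(observations)
target_head = np.array([1.0, -2.0, 0.5, 1.5])
target_actions = latents @ target_head


def reward_fn(theta):
    """Negative mean squared error between head outputs and target actions."""
    return -np.mean((latents @ theta - target_actions) ** 2)


def ars_step(theta, reward_fn, rng, num_dirs=8, top_b=4,
             step_size=0.02, noise_std=0.1):
    """One basic ARS update (no observation normalization)."""
    # Sample random search directions in the head's parameter space.
    deltas = rng.standard_normal((num_dirs, theta.size))
    # Evaluate the reward at antithetic perturbations theta +/- noise * delta.
    r_plus = np.array([reward_fn(theta + noise_std * d) for d in deltas])
    r_minus = np.array([reward_fn(theta - noise_std * d) for d in deltas])
    # Keep the top_b directions ranked by the better of their two rewards.
    top = np.argsort(np.maximum(r_plus, r_minus))[-top_b:]
    # Scale the step by the standard deviation of the rewards actually used.
    sigma_r = np.concatenate([r_plus[top], r_minus[top]]).std() + 1e-8
    update = ((r_plus[top] - r_minus[top])[:, None] * deltas[top]).sum(axis=0)
    return theta + step_size / (top_b * sigma_r) * update


# ARS searches over LATENT_DIM head parameters only.
theta = np.zeros(LATENT_DIM)
initial_reward = reward_fn(theta)
for _ in range(200):
    theta = ars_step(theta, reward_fn, rng)
final_reward = reward_fn(theta)
print(f"reward: {initial_reward:.3f} -> {final_reward:.3f}")
```

In this sketch the search space has 4 dimensions, whereas perturbing the encoder weights as well would add 128 more. That reduction is the intuition the abstract gives for pairing a learned representation with ES, although the real method operates on visual inputs and full locomotion rollouts.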




Teaching a Robot to Walk Using Reinforcement Learning

Classical control techniques such as PID and LQR have been used effectiv...

Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning

In this work, we present and study a training set-up that achieves fast ...

Advanced Skills by Learning Locomotion and Local Navigation End-to-End

The common approach for local navigation on challenging environments wit...

Neural Graph Evolution: Towards Efficient Automatic Robot Design

Despite the recent successes in robotic locomotion control, the design o...

Simple random search provides a competitive approach to reinforcement learning

A common belief in model-free reinforcement learning is that methods bas...

Learning Stabilizing Control Policies for a Tensegrity Hopper with Augmented Random Search

In this paper, we consider tensegrity hopper - a novel tensegrity-based ...

ES Is More Than Just a Traditional Finite-Difference Approximator

An evolution strategy (ES) variant recently attracted significant attent...
