C3PO: Learning to Achieve Arbitrary Goals via Massively Entropic Pretraining

11/07/2022
by Alexis Jacq, et al.

Given a particular embodiment, we propose a novel method (C3PO) that learns policies able to achieve any position and pose. Such a policy would allow for easier control and would be reusable as a key building block for downstream tasks. The method is twofold. First, we introduce a novel exploration algorithm that optimizes for uniform coverage and is able to discover a set of achievable states, and we investigate its ability to attain both high coverage and hard-to-discover states. Second, we leverage this set of achievable states as training data for a universal goal-achievement policy, a goal-based variant of Soft Actor-Critic (SAC). We demonstrate the trained policy's performance in achieving a large number of novel states. Finally, we showcase the influence of massive unsupervised training of a goal-achievement policy with state-of-the-art pose-based control of the Hopper, Walker, Halfcheetah, Humanoid and Ant embodiments.
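The two stages described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the grid-based entropy estimate, the distance-based reward, and the names `coverage_entropy`, `sample_goal`, `goal_reward`, `bin_size` and `tolerance` are all assumptions introduced here for illustration only. The sketch shows one simple way to measure how uniformly a set of visited states covers the space, and how discovered states could then be reused as goals for a goal-conditioned policy.

```python
import math
import random


def coverage_entropy(states, bin_size=0.5):
    """Shannon entropy (in nats) of visited states binned on a regular grid.

    Assumption: a histogram over grid cells stands in for the coverage
    objective; higher entropy means more uniform coverage of visited cells.
    """
    counts = {}
    for s in states:
        cell = tuple(math.floor(x / bin_size) for x in s)
        counts[cell] = counts.get(cell, 0) + 1
    total = len(states)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def sample_goal(achieved_states, rng):
    """Pick a training goal uniformly from the states found during exploration."""
    return rng.choice(achieved_states)


def goal_reward(state, goal, tolerance=0.1):
    """Hypothetical goal-reaching reward: zero within `tolerance`, else minus the distance."""
    dist = math.dist(state, goal)
    return 0.0 if dist <= tolerance else -dist


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 2-D "states" standing in for positions and poses found by exploration.
    achieved = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(1000)]
    print("coverage entropy:", coverage_entropy(achieved))
    goal = sample_goal(achieved, rng)
    print("reward at origin for sampled goal:", goal_reward((0.0, 0.0), goal))
```

In an actual training loop, a goal-conditioned agent such as a SAC variant would receive the sampled goal alongside its observation and be trained on a reward of this general kind. The exact objective and reward used by C3PO are given in the full paper.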


