EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-choice Dynamics Model

10/02/2022
by Yifu Yuan, et al.

Unsupervised reinforcement learning (URL) is a promising paradigm for learning useful behaviors in a task-agnostic environment without extrinsic rewards, so that an agent can adapt quickly to various downstream tasks. Previous works focused on model-free pre-training and did not study transition dynamics modeling, which leaves substantial room for improving sample efficiency in downstream tasks. To this end, we propose an Efficient Unsupervised Reinforcement Learning Framework with Multi-choice Dynamics model (EUCLID), which introduces a novel model-fused paradigm that jointly pre-trains the dynamics model and the unsupervised exploration policy, thus making better use of environmental samples and improving sample efficiency in downstream tasks. However, constructing a generalizable model that captures the local dynamics under different behaviors remains challenging. We therefore introduce the multi-choice dynamics model, which covers the local dynamics under different behaviors concurrently: it uses different heads to learn the state transitions under different behaviors during unsupervised pre-training and selects the most appropriate head for prediction in the downstream task. Experimental results in the manipulation and locomotion domains demonstrate that EUCLID achieves state-of-the-art performance with high sample efficiency, basically solving the state-based URLB benchmark and reaching a mean normalized score of 104.0±1.2% in downstream tasks with 100k fine-tuning steps, which is equivalent to DDPG's performance at 2M interactive steps, i.e., with 20x more data.
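The multi-head idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class name `MultiChoiceDynamics`, the layer sizes, the tanh nonlinearity, the residual prediction, and above all the selection rule (pick the head with the lowest one-step prediction error on a batch of transitions) are assumptions made for illustration only. The abstract says a head is selected but does not say by what criterion.

```python
import math
import random


class MultiChoiceDynamics:
    """Toy multi-head dynamics model: one shared layer, one linear head per behavior.

    Layer sizes, the tanh nonlinearity, and residual prediction are illustrative
    assumptions, not details taken from the paper.
    """

    def __init__(self, state_dim, action_dim, hidden_dim, num_heads, seed=0):
        rng = random.Random(seed)
        in_dim = state_dim + action_dim
        # Shared weights: hidden_dim rows, each of length in_dim.
        self.shared = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)]
                       for _ in range(hidden_dim)]
        # Per-head weights: num_heads x state_dim x hidden_dim.
        self.heads = [[[rng.gauss(0.0, 0.5) for _ in range(hidden_dim)]
                       for _ in range(state_dim)]
                      for _ in range(num_heads)]

    def predict(self, state, action, head):
        """Predict the next state with the chosen head."""
        x = list(state) + list(action)
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x)))
                  for row in self.shared]
        # Each head predicts a change in state from the shared features.
        delta = [sum(w * h for w, h in zip(row, hidden))
                 for row in self.heads[head]]
        return [s + d for s, d in zip(state, delta)]

    def head_error(self, transitions, head):
        """Mean squared one-step prediction error of one head."""
        total, count = 0.0, 0
        for state, action, next_state in transitions:
            pred = self.predict(state, action, head)
            total += sum((p - n) ** 2 for p, n in zip(pred, next_state))
            count += len(next_state)
        return total / count

    def select_head(self, transitions):
        """Return the index of the lowest-error head (assumed criterion)."""
        errors = [self.head_error(transitions, k) for k in range(len(self.heads))]
        return min(range(len(errors)), key=errors.__getitem__)


# Usage: transitions generated by head 2 should lead to head 2 being selected.
model = MultiChoiceDynamics(state_dim=3, action_dim=2, hidden_dim=8, num_heads=4)
rng = random.Random(1)
batch = []
for _ in range(16):
    s = [rng.uniform(-1, 1) for _ in range(3)]
    a = [rng.uniform(-1, 1) for _ in range(2)]
    batch.append((s, a, model.predict(s, a, head=2)))
selected = model.select_head(batch)
print(selected)
```

In this toy setup the weights are random rather than trained; during pre-training each head would be fitted to transitions collected under one behavior, and the shared layer would be fitted across all of them.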

research
09/14/2021

Different Strokes for Different Folks: Investigating Appropriate Further Pre-training Approaches for Diverse Dialogue Tasks

Loading models pre-trained on the large-scale corpus in the general doma...
research
03/25/2022

Reinforcement Learning with Action-Free Pre-Training from Videos

Recent unsupervised pre-training methods have shown to be effective on l...
research
08/10/2023

RLSAC: Reinforcement Learning enhanced Sample Consensus for End-to-End Robust Estimation

Robust estimation is a crucial and still challenging task, which involve...
research
06/17/2022

CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer

Transformer has achieved great successes in learning vision and language...
research
12/26/2022

Toward Efficient Automated Feature Engineering

Automated Feature Engineering (AFE) refers to automatically generate and...
research
10/13/2022

A Mixture of Surprises for Unsupervised Reinforcement Learning

Unsupervised reinforcement learning aims at learning a generalist policy...
research
03/08/2021

Behavior From the Void: Unsupervised Active Pre-Training

We introduce a new unsupervised pre-training method for reinforcement le...
