DeepAI AI Chat
Log In Sign Up

URLB: Unsupervised Reinforcement Learning Benchmark

by   Michael (Misha) Laskin, et al.
NYU college
berkeley college

Deep Reinforcement Learning (RL) has emerged as a powerful paradigm to solve a range of complex yet specific control tasks. Yet training generalist agents that can quickly adapt to new tasks remains an outstanding challenge. Recent advances in unsupervised RL have shown that pre-training RL agents with self-supervised intrinsic rewards can result in efficient adaptation. However, these algorithms have been hard to compare and develop due to the lack of a unified benchmark. To this end, we introduce the Unsupervised Reinforcement Learning Benchmark (URLB). URLB consists of two phases: reward-free pre-training and downstream task adaptation with extrinsic rewards. Building on the DeepMind Control Suite, we provide twelve continuous control tasks from three domains for evaluation and open-source code for eight leading unsupervised RL methods. We find that the implemented baselines make progress but are not able to solve URLB and propose directions for future research.


page 19

page 20


Unsupervised Model-based Pre-training for Data-efficient Control from Pixels

Controlling artificial agents from visual sensory data is an arduous tas...

Evolving Rewards to Automate Reinforcement Learning

Many continuous control tasks have easily formulated objectives, yet usi...

Behavior From the Void: Unsupervised Active Pre-Training

We introduce a new unsupervised pre-training method for reinforcement le...

Learning General World Models in a Handful of Reward-Free Deployments

Building generally capable agents is a grand challenge for deep reinforc...

Light-weight probing of unsupervised representations for Reinforcement Learning

Unsupervised visual representation learning offers the opportunity to le...

Parrot: Data-Driven Behavioral Priors for Reinforcement Learning

Reinforcement learning provides a general framework for flexible decisio...

Human-Timescale Adaptation in an Open-Ended Task Space

Foundation models have shown impressive adaptation and scalability in su...