URLB: Unsupervised Reinforcement Learning Benchmark

10/28/2021
by Michael (Misha) Laskin, et al.
NYU college
berkeley college

Deep Reinforcement Learning (RL) has emerged as a powerful paradigm to solve a range of complex yet specific control tasks. Yet training generalist agents that can quickly adapt to new tasks remains an outstanding challenge. Recent advances in unsupervised RL have shown that pre-training RL agents with self-supervised intrinsic rewards can result in efficient adaptation. However, these algorithms have been hard to compare and develop due to the lack of a unified benchmark. To this end, we introduce the Unsupervised Reinforcement Learning Benchmark (URLB). URLB consists of two phases: reward-free pre-training and downstream task adaptation with extrinsic rewards. Building on the DeepMind Control Suite, we provide twelve continuous control tasks from three domains for evaluation and open-source code for eight leading unsupervised RL methods. We find that the implemented baselines make progress but are not able to solve URLB and propose directions for future research.
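
The two-phase protocol described in the abstract (reward-free pre-training, then adaptation with extrinsic rewards) can be illustrated with a toy sketch. This is not URLB's code and does not use the DeepMind Control Suite: `ChainEnv`, `run_phase`, and `two_phase` are hypothetical names, the agent is plain tabular Q-learning, and the count-based novelty bonus is only a stand-in for a self-supervised intrinsic reward.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: states 0..n-1, actions 0 (left) / 1 (right)."""

    def __init__(self, n_states=8):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        delta = 1 if action == 1 else -1
        self.state = min(self.n_states - 1, max(0, self.state + delta))
        # Extrinsic (task) reward: 1 only in the right-most state.
        extrinsic = 1.0 if self.state == self.n_states - 1 else 0.0
        return self.state, extrinsic


def run_phase(env, q, steps, reward_fn, rng,
              eps=0.2, alpha=0.5, gamma=0.9, horizon=20):
    """Run epsilon-greedy Q-learning, training on reward_fn(next_state, extrinsic)."""
    total_extrinsic = 0.0
    s = env.reset()
    for t in range(steps):
        if t % horizon == 0:  # start a new fixed-length episode
            s = env.reset()
        if rng.random() < eps:
            a = rng.randrange(2)
        else:
            a = max((0, 1), key=lambda x: q[s][x])
        s2, r_ext = env.step(a)
        r = reward_fn(s2, r_ext)
        q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
        total_extrinsic += r_ext
        s = s2
    return total_extrinsic


def two_phase(pretrain_steps=2000, finetune_steps=500, seed=0):
    rng = random.Random(seed)
    env = ChainEnv()
    q = [[0.0, 0.0] for _ in range(env.n_states)]
    counts = [0] * env.n_states

    # Phase 1: reward-free pre-training. The extrinsic reward is ignored;
    # the agent is trained on a novelty bonus (assumed stand-in).
    def intrinsic(s, r_ext):
        counts[s] += 1
        return counts[s] ** -0.5

    run_phase(env, q, pretrain_steps, intrinsic, rng)

    # Phase 2: downstream adaptation. The same Q-table is fine-tuned
    # on the extrinsic task reward.
    def extrinsic(s, r_ext):
        return r_ext

    finetune_return = run_phase(env, q, finetune_steps, extrinsic, rng)
    return counts, finetune_return
```

Calling `two_phase()` returns the per-state visit counts from pre-training and the extrinsic reward collected during fine-tuning. A benchmark in this style would compare that fine-tuning return across different pre-training methods under a fixed step budget.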

09/24/2022

Unsupervised Model-based Pre-training for Data-efficient Control from Pixels

Controlling artificial agents from visual sensory data is an arduous tas...
05/18/2019

Evolving Rewards to Automate Reinforcement Learning

Many continuous control tasks have easily formulated objectives, yet usi...
03/08/2021

Behavior From the Void: Unsupervised Active Pre-Training

We introduce a new unsupervised pre-training method for reinforcement le...
10/23/2022

Learning General World Models in a Handful of Reward-Free Deployments

Building generally capable agents is a grand challenge for deep reinforc...
08/25/2022

Light-weight probing of unsupervised representations for Reinforcement Learning

Unsupervised visual representation learning offers the opportunity to le...
11/19/2020

Parrot: Data-Driven Behavioral Priors for Reinforcement Learning

Reinforcement learning provides a general framework for flexible decisio...
01/18/2023

Human-Timescale Adaptation in an Open-Ended Task Space

Foundation models have shown impressive adaptation and scalability in su...