A Mixture of Surprises for Unsupervised Reinforcement Learning

10/13/2022
by Andrew Zhao, et al.

Unsupervised reinforcement learning aims to learn a generalist policy in a reward-free manner for fast adaptation to downstream tasks. Most existing methods provide an intrinsic reward based on surprise: maximizing or minimizing surprise drives the agent to either explore or gain control over its environment. However, both strategies rely on a strong assumption: the entropy of the environment's dynamics is either high or low. This assumption may not always hold in real-world scenarios, where the entropy of the environment's dynamics may be unknown, so choosing between the two objectives is a dilemma. We propose a novel yet simple mixture of policies to address this concern, allowing us to optimize an objective that simultaneously maximizes and minimizes surprise. Concretely, we train one mixture component whose objective is to maximize the surprise and another whose objective is to minimize it. Hence, our method makes no assumption about the entropy of the environment's dynamics. We call our method a Mixture Of SurpriseS (MOSS) for unsupervised reinforcement learning. Experimental results show that our simple method achieves state-of-the-art performance on the URLB benchmark, outperforming previous purely surprise-maximization-based objectives. Our code is available at: https://github.com/LeapLabTHU/MOSS.
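The abstract describes the core mechanism at a high level: two mixture components share the same surprise signal but optimize it with opposite signs. The snippet below is a minimal sketch of that idea, not the authors' implementation; it assumes a particle-based k-NN entropy estimate (`knn_surprise`) as the surprise proxy, per-episode sampling of the mixture component, and randomly generated placeholder states, all of which are illustrative choices rather than details taken from the paper.

```python
# Minimal sketch of a mixture-of-surprises intrinsic reward.
# Assumptions (not from the paper): surprise is approximated by a
# particle-based k-NN distance over encoded states, and the mixture
# component (maximize vs. minimize) is resampled once per episode.
import numpy as np

def knn_surprise(states, k=3):
    """Per-state surprise proxy: log distance to the k-th nearest neighbor
    within the batch (larger distance ~ less visited ~ more surprising)."""
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    kth = np.sort(dists, axis=1)[:, k]       # k-th neighbor (index 0 is self)
    return np.log(kth + 1.0)                 # +1 keeps the log bounded below

def intrinsic_reward(states, maximize_mode):
    """Signed surprise: +surprise for the exploring component,
    -surprise for the controlling component."""
    s = knn_surprise(states)
    return s if maximize_mode else -s

# Per-episode mode sampling for the two-component mixture policy.
rng = np.random.default_rng(0)
for episode in range(4):
    maximize_mode = bool(rng.integers(0, 2))   # pick a mixture component
    states = rng.normal(size=(256, 8))         # stand-in for encoded states
    r_int = intrinsic_reward(states, maximize_mode)
    print(episode, "maximize" if maximize_mode else "minimize",
          float(r_int.mean()))
```

In a real agent, the placeholder states would be replaced by encoded observations and r_int would be added to the policy optimizer's reward; the only point of the sketch is that flipping the sign of a shared surprise estimate yields the two components of the mixture.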


Related research:

- Planning with Exploration: Addressing Dynamics Bottleneck in Model-based Reinforcement Learning (10/24/2020). Model-based reinforcement learning is a framework in which an agent lear...
- Sequential Bayesian experimental designs via reinforcement learning (02/14/2022). Bayesian experimental design (BED) has been used as a method for conduct...
- EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-choice Dynamics Model (10/02/2022). Unsupervised reinforcement learning (URL) poses a promising paradigm to ...
- Safe Model-based Reinforcement Learning with Robust Cross-Entropy Method (10/15/2020). This paper studies the safe reinforcement learning (RL) problem without ...
- Wasserstein Unsupervised Reinforcement Learning (10/15/2021). Unsupervised reinforcement learning aims to train agents to learn a hand...
- Keep it Simple: Unsupervised Simplification of Multi-Paragraph Text (07/07/2021). This work presents Keep it Simple (KiS), a new approach to unsupervised ...
- Fast Graph Learning with Unique Optimal Solutions (02/17/2021). Graph Representation Learning (GRL) has been advancing at an unprecedent...
