Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching

Learning meaningful behaviors in the absence of reward is a difficult problem in reinforcement learning. A desirable and challenging unsupervised objective is to learn a set of diverse skills that provide a thorough coverage of the state space while being directed, i.e., reliably reaching distinct regions of the environment. In this paper, we build on the mutual information framework for skill discovery and introduce UPSIDE, which addresses the coverage-directedness trade-off in the following ways: 1) We design policies with a decoupled structure of a directed skill, trained to reach a specific region, followed by a diffusing part that induces a local coverage. 2) We optimize policies by maximizing their number under the constraint that each of them reaches distinct regions of the environment (i.e., they are sufficiently discriminable) and prove that this serves as a lower bound to the original mutual information objective. 3) Finally, we compose the learned directed skills into a growing tree that adaptively covers the environment. We illustrate in several navigation and control environments how the skills learned by UPSIDE solve sparse-reward downstream tasks better than existing baselines.

READ FULL TEXT

page 9

page 22

page 23

page 25

research
05/08/2023

Behavior Contrastive Learning for Unsupervised Skill Discovery

In reinforcement learning, unsupervised skill discovery aims to learn di...
research
02/01/2022

CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery

We introduce Contrastive Intrinsic Control (CIC), an algorithm for unsup...
research
10/28/2021

Wasserstein Distance Maximizing Intrinsic Control

This paper deals with the problem of learning a skill-conditioned policy...
research
10/15/2021

Wasserstein Unsupervised Reinforcement Learning

Unsupervised reinforcement learning aims to train agents to learn a hand...
research
02/10/2020

Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills

Acquiring abilities in the absence of a task-oriented reward function is...
research
02/16/2022

Open-Ended Reinforcement Learning with Neural Reward Functions

Inspired by the great success of unsupervised learning in Computer Visio...
research
02/24/2018

One Big Net For Everything

I apply recent work on "learning to think" (2015) and on PowerPlay (2011...

Please sign up or login with your details

Forgot password? Click here to reset