DisTop: Discovering a Topological representation to learn diverse and rewarding skills

06/06/2021
by Arthur Aubret, et al.

The optimal way for a deep reinforcement learning (DRL) agent to explore is to learn a set of skills that achieves a uniform distribution of states. Following this, we introduce DisTop, a new model that simultaneously learns diverse skills and focuses on improving rewarding skills. DisTop progressively builds a discrete topology of the environment using an unsupervised contrastive loss, a growing network and a goal-conditioned policy. Using this topology, a state-independent hierarchical policy can select where the agent has to keep discovering skills in the state space. In turn, the newly visited states allow the learnt representation to improve, and the learning loop continues. Our experiments emphasize that DisTop is agnostic to the ground state representation and that the agent can discover the topology of its environment whether the states are high-dimensional binary data, images, or proprioceptive inputs. We demonstrate that this paradigm is competitive with state-of-the-art algorithms on MuJoCo benchmarks, both on single-task dense rewards and on diverse skill discovery. By combining these two aspects, we show that DisTop achieves state-of-the-art performance in comparison with hierarchical reinforcement learning (HRL) when rewards are sparse. We believe DisTop opens new perspectives by showing that bottom-up skill discovery combined with representation learning can unlock the exploration challenge in DRL.


research
05/08/2023

Behavior Contrastive Learning for Unsupervised Skill Discovery

In reinforcement learning, unsupervised skill discovery aims to learn di...
research
12/14/2020

Relative Variational Intrinsic Control

In the absence of external rewards, agents can still learn useful behavi...
research
03/21/2022

Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL

In reinforcement learning, the graph Laplacian has proved to be a valuab...
research
07/18/2021

Unsupervised Skill-Discovery and Skill-Learning in Minecraft

Pre-training Reinforcement Learning agents in a task-agnostic manner has...
research
05/21/2023

Unsupervised Discovery of Continuous Skills on a Sphere

Recently, methods for learning diverse skills to generate various behavi...
research
11/10/2018

Diversity-Driven Extensible Hierarchical Reinforcement Learning

Hierarchical reinforcement learning (HRL) has recently shown promising a...
research
01/30/2019

InfoBot: Transfer and Exploration via the Information Bottleneck

A central challenge in reinforcement learning is discovering effective p...
