Variational Option Discovery Algorithms

07/26/2018
by Joshua Achiam, et al.

We explore methods for option discovery based on variational inference and make two algorithmic contributions. First: we highlight a tight connection between variational option discovery methods and variational autoencoders, and introduce Variational Autoencoding Learning of Options by Reinforcement (VALOR), a new method derived from the connection. In VALOR, the policy encodes contexts from a noise distribution into trajectories, and the decoder recovers the contexts from the complete trajectories. Second: we propose a curriculum learning approach where the number of contexts seen by the agent increases whenever the agent's performance is strong enough (as measured by the decoder) on the current set of contexts. We show that this simple trick stabilizes training for VALOR and prior variational option discovery methods, allowing a single agent to learn many more modes of behavior than it could with a fixed context distribution. Finally, we investigate other topics related to variational option discovery, including fundamental limitations of the general approach and the applicability of learned options to downstream tasks.
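The curriculum described above can be illustrated with a minimal sketch. It assumes the decoder reports, for each sampled trajectory, the probability it assigns to the true context, and that the number of contexts grows once the average log-probability clears a threshold. The function name `next_num_contexts`, the threshold of 0.86, the growth factor of 1.5, and the cap `k_max` are illustrative assumptions, not values stated in this abstract.

```python
import math


def next_num_contexts(k, decoder_probs, threshold=0.86, growth=1.5, k_max=64):
    """Return the number of contexts to use in the next training phase.

    k             -- current number of contexts
    decoder_probs -- decoder probabilities of the true context, one per trajectory
    threshold     -- assumed performance bar (hypothetical default)
    growth        -- assumed multiplicative growth factor (hypothetical default)
    k_max         -- assumed upper limit on the number of contexts (hypothetical)
    """
    # Average log-probability the decoder assigns to the correct context.
    mean_log_prob = sum(math.log(p) for p in decoder_probs) / len(decoder_probs)

    # Grow the context set only when the decoder is performing well enough.
    if mean_log_prob >= math.log(threshold):
        return min(int(growth * k + 1), k_max)
    return k


# Strong decoder performance: the context set grows.
print(next_num_contexts(2, [0.9, 0.95]))
# Weak decoder performance: the context set stays the same size.
print(next_num_contexts(2, [0.5, 0.6]))
```

Starting from a small number of contexts and expanding only after the decoder can reliably tell them apart keeps the learning signal informative early in training, which is the stabilizing effect the abstract attributes to the curriculum.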


