Diverse Exploration via InfoMax Options

10/06/2020
by Yuji Kanagawa et al.

In this paper, we study the problem of autonomously discovering temporally abstracted actions, or options, for exploration in reinforcement learning. For learning diverse options suitable for exploration, we introduce the infomax termination objective defined as the mutual information between options and their corresponding state transitions. We derive a scalable optimization scheme for maximizing this objective via the termination condition of options, yielding the InfoMax Option Critic (IMOC) algorithm. Through illustrative experiments, we empirically show that IMOC learns diverse options and utilizes them for exploration. Moreover, we show that IMOC scales well to continuous control tasks.
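To make the infomax termination objective concrete, the toy sketch below computes the empirical mutual information between discrete option labels and their resulting state transitions. This is only an illustration of why diverse options score higher under such an objective; it is not the paper's estimator, which maximizes a variational bound on this quantity through the options' termination conditions.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Empirical mutual information I(O; T) in nats between option
    labels and transition outcomes, from (option, transition) samples."""
    n = len(pairs)
    joint = Counter(pairs)                 # counts of (option, transition)
    p_o = Counter(o for o, _ in pairs)     # marginal counts of options
    p_t = Counter(t for _, t in pairs)     # marginal counts of transitions
    mi = 0.0
    for (o, t), c in joint.items():
        # p(o,t) * log( p(o,t) / (p(o) p(t)) ), with counts normalized by n
        mi += (c / n) * math.log(c * n / (p_o[o] * p_t[t]))
    return mi

# Diverse options: each option induces a distinct state transition.
diverse = [(0, "left"), (1, "right")] * 50
# Redundant options: both options induce the same transition.
redundant = [(0, "left"), (1, "left")] * 50

print(mutual_information(diverse))    # log 2, about 0.693 nats
print(mutual_information(redundant))  # 0.0 nats
```

Options whose transitions are mutually distinguishable maximize this quantity, which is exactly the sense in which the objective encourages diverse, exploration-friendly options.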


Related research:

- The Termination Critic (02/26/2019)
- Variational Intrinsic Control (11/22/2016)
- Discovering hierarchies using Imitation Learning from hierarchy aware policies (12/01/2018)
- Deep Laplacian-based Options for Temporally-Extended Exploration (01/26/2023)
- Soft Options Critic (05/23/2019)
- Learning User Preferences to Incentivize Exploration in the Sharing Economy (11/17/2017)
- Learning with Options that Terminate Off-Policy (11/10/2017)
