Discovery of Options via Meta-Learned Subgoals

02/12/2021
by Vivek Veeriah et al.

Temporal abstractions in the form of options have been shown to help reinforcement learning (RL) agents learn faster. However, despite prior work on this topic, discovering options through interaction with an environment remains a challenge. In this paper, we introduce a novel meta-gradient approach for discovering useful options in multi-task RL environments. Our approach is based on a manager-worker decomposition of the RL agent, in which a manager maximises rewards from the environment by learning a task-dependent policy over both primitive actions and a set of task-independent discovered options. The option-reward and termination functions that define a subgoal for each option are parameterised as neural networks and trained via meta-gradients to maximise their usefulness. Empirical analysis on gridworld and DeepMind Lab tasks shows that: (1) our approach can discover meaningful and diverse temporally-extended options in multi-task RL domains, (2) the discovered options are frequently used by the agent while learning to solve the training tasks, and (3) the discovered options help a randomly initialised manager learn faster in completely new tasks.
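As a rough illustration of the meta-gradient idea, here is a toy sketch in plain Python with no environment or RL. It is not the authors' implementation: a scalar worker parameter `w` stands in for an option policy, and a scalar `eta` stands in for the neural-network option-reward function that defines the subgoal. The function names (`inner_update`, `meta_gradient`, `train`), the quadratic option reward, and the quadratic extrinsic objective are all hypothetical assumptions, and the termination function is not modelled. The worker takes one gradient step on the option reward, and `eta` is updated by differentiating the extrinsic objective through that step with the chain rule.

```python
# Toy meta-gradient sketch (hypothetical; NOT the paper's implementation).
# Option reward (subgoal "reach eta"):      r_eta(w) = -(w - eta)**2
# Inner (worker) step:                       w' = w + alpha * d r_eta / d w
# Outer (extrinsic) objective after update:  J(w') = -(w' - goal)**2
# Meta-gradient:                             dJ/d eta = dJ/dw' * dw'/d eta


def inner_update(w: float, eta: float, alpha: float) -> float:
    """One gradient-ascent step of the worker on the parameterised option reward."""
    grad_w = -2.0 * (w - eta)  # d r_eta / d w
    return w + alpha * grad_w


def meta_gradient(w: float, eta: float, goal: float, alpha: float) -> float:
    """d J(w') / d eta, obtained by differentiating through one inner update."""
    w_new = inner_update(w, eta, alpha)
    dj_dw_new = -2.0 * (w_new - goal)  # d J / d w'
    dw_new_deta = 2.0 * alpha          # d w' / d eta, since w' = (1 - 2*alpha)*w + 2*alpha*eta
    return dj_dw_new * dw_new_deta


def train(goal: float = 3.0, alpha: float = 0.1, beta: float = 0.5, steps: int = 200):
    """Alternate outer (meta) steps on eta with inner steps on w; return final (w, eta)."""
    w, eta = 0.0, 0.0
    for _ in range(steps):
        eta += beta * meta_gradient(w, eta, goal, alpha)  # move the subgoal to help the outer objective
        w = inner_update(w, eta, alpha)                   # worker pursues the current subgoal
    return w, eta


if __name__ == "__main__":
    w, eta = train()
    print(f"worker w = {w:.6f}, subgoal eta = {eta:.6f}")
```

With these settings the subgoal `eta` is pulled toward the extrinsic goal and the worker follows it, so both end near `3.0`. In the full multi-task RL setting described in the abstract, both objectives would be returns estimated from experience, so the gradients would come from automatic differentiation over sampled trajectories rather than being written out by hand.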


