Learning with Options that Terminate Off-Policy

11/10/2017
by   Anna Harutyunyan, et al.

A temporally abstract action, or option, is specified by a policy and a termination condition: the policy guides the option's behavior, and the termination condition roughly determines its length. Learning with longer options, like learning with multi-step returns, is generally known to be more efficient. However, if the option set is not ideal for the task and cannot express the primitive optimal policy exactly, shorter options offer more flexibility and can yield a better solution. The termination condition thus puts learning efficiency at odds with solution quality. We propose to resolve this dilemma by decoupling the behavior and target terminations, just as is done with policies in off-policy learning. To this end, we give a new algorithm, Q(β), that learns the solution with respect to any termination condition, regardless of how the options actually terminate. We derive Q(β) by casting learning with options into a common framework with well-studied multi-step off-policy learning. We validate the algorithm empirically and show that it bears out its motivating claims.
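To make the abstract's definition concrete, here is a minimal Python sketch of an option as a policy paired with a termination condition. It is an illustration only and is not the Q(β) algorithm: the names `Option`, `run_option`, `chain_step`, and `mean_duration`, the toy chain environment, and the constant termination probability are all assumptions introduced here. It shows only that a smaller termination probability produces longer options.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Option:
    """Hypothetical container: a policy plus a termination condition."""
    policy: Callable[[int], int]         # state -> primitive action
    termination: Callable[[int], float]  # state -> probability of terminating


def run_option(option, state, step, rng, max_steps=1000):
    """Follow the option's policy until its termination condition fires.

    Returns the final state and the number of primitive steps taken.
    """
    for t in range(1, max_steps + 1):
        state = step(state, option.policy(state))
        if rng.random() < option.termination(state):
            return state, t
    return state, max_steps


def chain_step(state, action):
    """Toy deterministic environment (an assumption): states are integers."""
    return state + action


def mean_duration(beta, episodes=2000, seed=0):
    """Average option length when termination has constant probability beta."""
    rng = random.Random(seed)
    option = Option(policy=lambda s: 1, termination=lambda s: beta)
    total = sum(run_option(option, 0, chain_step, rng)[1] for _ in range(episodes))
    return total / episodes
```

With a constant termination probability, the option length is geometrically distributed, so the average length is close to 1/beta: calling `mean_duration(0.5)` gives roughly 2 steps, while `mean_duration(0.1)` gives roughly 10.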

research
02/26/2019

The Termination Critic

In this work, we consider the problem of autonomously discovering behavi...
research
12/01/2018

Discovering hierarchies using Imitation Learning from hierarchy aware policies

Learning options that allow agents to exhibit temporally higher order be...
research
07/30/2020

Data-efficient Hindsight Off-policy Option Learning

Solutions to most complex tasks can be decomposed into simpler, intermed...
research
10/21/2019

Multi-agent Hierarchical Reinforcement Learning with Dynamic Termination

In a multi-agent system, an agent's optimal policy will typically depend...
research
10/06/2020

Diverse Exploration via InfoMax Options

In this paper, we study the problem of autonomously discovering temporal...
research
10/03/2022

Interpretable Option Discovery using Deep Q-Learning and Variational Autoencoders

Deep Reinforcement Learning (RL) is unquestionably a robust framework to...
research
11/22/2016

Variational Intrinsic Control

In this paper we introduce a new unsupervised reinforcement learning met...
