The Termination Critic

02/26/2019
by Anna Harutyunyan, et al.

In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents. We propose an algorithm that focuses on the termination condition rather than, as is common, on the policy. The termination condition is usually trained to optimize a control objective: an option ought to terminate if another option has better value. We offer a different, information-theoretic perspective, and propose that terminations should instead focus on the compressibility of the option's encoding, which is arguably a key reason for using abstractions. To achieve this algorithmically, we leverage the classical options framework and learn the option transition model as a "critic" for the termination condition. Using this model, we derive gradients that optimize the desired criteria. We show that the resulting options are non-trivial, intuitively meaningful, and useful for learning and planning.
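As a rough illustration of how a termination condition shapes an option's transition model, the sketch below computes the distribution over the state in which an option ends, and then its entropy as one possible proxy for compressibility. The abstract does not state the exact objective, so the entropy measure is an assumption, as are the five-state chain, the deterministic "move right" option policy, and the names `final_state_distribution`, `entropy`, `next_state`, `diffuse`, and `peaked`. It is a toy sketch, not the authors' algorithm.

```python
import math


def final_state_distribution(next_state, beta, start, tol=1e-12, max_steps=10_000):
    """Distribution over the state in which the option terminates.

    Assumptions (hypothetical toy setup, not taken from the paper):
    next_state[s] is the deterministic successor of s under the option policy,
    and beta[s] is the probability of terminating upon arriving in s.
    """
    final = [0.0] * len(beta)
    state, alive = start, 1.0  # alive = probability the option is still running
    for _ in range(max_steps):
        if alive < tol:
            break
        state = next_state[state]
        final[state] += alive * beta[state]  # mass that terminates here
        alive *= 1.0 - beta[state]           # mass that continues
    return final


def entropy(p):
    """Shannon entropy in bits, skipping zero-probability entries."""
    return -sum(q * math.log2(q) for q in p if q > 0.0)


# Five-state chain; the option policy always moves right, and state 4 is absorbing.
next_state = [1, 2, 3, 4, 4]

diffuse = [0.5, 0.5, 0.5, 0.5, 0.5]  # may terminate anywhere
peaked = [0.0, 0.0, 0.0, 0.0, 1.0]   # terminates only in state 4

h_diffuse = entropy(final_state_distribution(next_state, diffuse, start=0))
h_peaked = entropy(final_state_distribution(next_state, peaked, start=0))
print(h_diffuse, h_peaked)
```

In this toy chain the diffuse termination spreads the final state over several states, while the peaked termination always ends in the same state. Under the entropy reading assumed here, the peaked option's outcome is cheaper to encode.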


research
09/16/2016

The Option-Critic Architecture

Temporal abstraction is key to scaling up learning and planning in reinf...
research
10/06/2020

Diverse Exploration via InfoMax Options

In this paper, we study the problem of autonomously discovering temporal...
research
11/10/2017

Learning with Options that Terminate Off-Policy

A temporally abstract action, or an option, is specified by a policy and...
research
10/03/2022

Interpretable Option Discovery using Deep Q-Learning and Variational Autoencoders

Deep Reinforcement Learning (RL) is unquestionably a robust framework to...
research
12/01/2018

Discovering hierarchies using Imitation Learning from hierarchy aware policies

Learning options that allow agents to exhibit temporally higher order be...
research
01/01/2020

Options of Interest: Temporal Abstraction with Interest Functions

Temporal abstraction refers to the ability of an agent to use behaviours...
research
11/29/2022

Branch-Well-Structured Transition Systems and Extensions

We propose a relaxation to the definition of a well-structured transitio...
