Reward-Respecting Subtasks for Model-Based Reinforcement Learning

02/07/2022
by Richard S. Sutton et al.

To achieve the ambitious goals of artificial intelligence, reinforcement learning must include planning with a model of the world that is abstract in state and time. Deep learning has made progress in state abstraction, but, although the theory of time abstraction has been extensively developed based on the options framework, in practice options have rarely been used in planning. One reason for this is that the space of possible options is immense, and the methods previously proposed for option discovery do not take into account how the option models will be used in planning. Options are typically discovered by posing subsidiary tasks such as reaching a bottleneck state, or maximizing a sensory signal other than the reward. Each subtask is solved to produce an option, and then a model of the option is learned and made available to the planning process. The subtasks proposed in most previous work ignore the reward on the original problem, whereas we propose subtasks that use the original reward plus a bonus based on a feature of the state at the time the option stops. We show that options and option models obtained from such reward-respecting subtasks are much more likely to be useful in planning and can be learned online and off-policy using existing learning algorithms. Reward-respecting subtasks strongly constrain the space of options and thereby also provide a partial solution to the problem of option discovery. Finally, we show how the algorithms for learning values, policies, options, and models can be unified using general value functions.
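The core idea — a subtask whose reward is the environment's own reward plus a bonus awarded for the feature of the state where the option stops — can be illustrated with a minimal tabular sketch. The toy chain environment, the feature, the bonus weight, and all learning parameters below are illustrative assumptions, not details from the paper; the sketch simply learns an option policy (including a "stop" action) by Q-learning, where stopping yields the feature-based bonus while every other step earns the original per-step reward.

```python
import random

random.seed(0)

N = 6            # chain of states 0..5 (hypothetical toy environment)
GOAL = 5         # state whose stopping feature is active
STEP_REWARD = -0.1   # the "original" per-step reward, respected by the subtask
BONUS = 1.0          # assumed bonus weight on the stopping-state feature

ACTIONS = ["left", "right", "stop"]

def step(s, a):
    """Deterministic chain dynamics."""
    if a == "left":
        return max(0, s - 1)
    if a == "right":
        return min(N - 1, s + 1)
    return s

def stop_bonus(s):
    """Bonus based on a feature of the state at stopping time."""
    return BONUS if s == GOAL else 0.0

def learn_option(episodes=2000, alpha=0.1, gamma=0.95, eps=0.2):
    """Tabular Q-learning on the reward-respecting subtask."""
    Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    for _ in range(episodes):
        s = random.randrange(N)
        for _ in range(50):
            a = (random.choice(ACTIONS) if random.random() < eps
                 else max(ACTIONS, key=lambda b: Q[(s, b)]))
            if a == "stop":
                # Terminating: the target is the stopping-feature bonus.
                Q[(s, a)] += alpha * (stop_bonus(s) - Q[(s, a)])
                break
            s2 = step(s, a)
            # Non-terminating steps still earn the original reward.
            target = STEP_REWARD + gamma * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

Q = learn_option()
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N)}
```

Because the per-step cost is small relative to the discounted bonus, the learned option walks right toward the bonus-bearing state and stops there; making `STEP_REWARD` more negative would make the subtask refuse the detour, which is exactly the sense in which the subtask "respects" the original reward.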


