Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping

06/13/2012
by Richard S. Sutton, et al.

We consider the problem of efficiently learning optimal control policies and value functions over large state spaces in an online setting in which estimates must be available after each interaction with the world. This paper develops an explicitly model-based approach that extends the Dyna architecture to linear function approximation. Dyna-style planning proceeds by generating imaginary experience from the world model and then applying model-free reinforcement learning algorithms to the imagined state transitions. Our main result is a proof that linear Dyna-style planning converges, under natural conditions, to a unique solution independent of the generating distribution. In the policy evaluation setting, we prove that the limit point is the least-squares (LSTD) solution. An implication of our results is that prioritized sweeping can be soundly extended to the linear approximation case, backing up to preceding features rather than to preceding states. We introduce two versions of prioritized sweeping with linear Dyna and briefly illustrate their performance empirically on the Mountain Car and Boyan Chain problems.
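
The planning loop described in the abstract can be illustrated with a minimal sketch for the policy evaluation case. This is not taken from the paper's text: the names `F`, `b`, `theta`, and `linear_dyna_plan`, the three-state chain, the step size, and the round-robin choice of unit basis vectors as the generating distribution are all assumptions made for illustration. The model is assumed to predict the expected next feature vector as `F @ phi` and the expected reward as `b . phi`, and planning applies TD(0) to those imagined transitions.

```python
def dot(x, y):
    """Inner product of two equal-length lists."""
    return sum(a * c for a, c in zip(x, y))


def mat_vec(F, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, x) for row in F]


def linear_dyna_plan(F, b, gamma, alpha=0.5, sweeps=200):
    """Plan with TD(0) on transitions imagined from a linear model.

    Assumed (hypothetical) model form: next features ~ F @ phi,
    reward ~ b . phi. Feature vectors are generated round-robin
    from the unit basis, which is only one possible choice.
    """
    d = len(b)
    theta = [0.0] * d
    for _ in range(sweeps):
        for i in range(d):
            # Imagined starting feature vector: the i-th unit basis vector.
            phi = [1.0 if j == i else 0.0 for j in range(d)]
            # Imagined reward and next feature vector from the model.
            r = dot(b, phi)
            phi_next = mat_vec(F, phi)
            # Model-free TD(0) update applied to the imagined transition.
            delta = r + gamma * dot(theta, phi_next) - dot(theta, phi)
            theta = [t + alpha * delta * p for t, p in zip(theta, phi)]
    return theta


# Toy model (an assumption): a deterministic chain 0 -> 1 -> 2 -> end
# with one-hot features and reward 1 on every step.
F = [[0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
b = [1.0, 1.0, 1.0]
theta = linear_dyna_plan(F, b, gamma=0.9)
print(theta)
```

With these updates, a fixed point must satisfy theta = b + gamma * F^T theta, which for this toy chain gives the values 2.71, 1.9, and 1.0. One-hot features make the example tabular; the same loop applies unchanged to a model over general feature vectors.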

research
06/20/2020

Model-Free Robust Reinforcement Learning with Linear Function Approximation

This paper addresses the problem of model-free reinforcement learning fo...
research
06/05/2018

The Effect of Planning Shape on Dyna-style Planning in High-dimensional State Spaces

Dyna is an architecture for reinforcement learning agents that interleav...
research
02/11/2022

Regularized Q-learning

Q-learning is widely used algorithm in reinforcement learning community....
research
10/06/2020

Reinforcement Learning in Deep Structured Teams: Initial Results with Finite and Infinite Valued Features

In this paper, we consider Markov chain and linear quadratic models for ...
research
04/02/2019

Planning with Expectation Models

Distribution and sample models are two popular model choices in model-ba...
research
11/18/2019

Gamma-Nets: Generalizing Value Estimation over Timescale

We present Γ-nets, a method for generalizing value function estimation o...
research
07/19/2020

Beyond Prioritized Replay: Sampling States in Model-Based RL via Simulated Priorities

Model-based reinforcement learning (MBRL) can significantly improve samp...
