The Effect of Planning Shape on Dyna-style Planning in High-dimensional State Spaces

06/05/2018
by   G. Zacharias Holland, et al.

Dyna is an architecture for reinforcement learning agents that interleaves planning, acting, and learning in an online setting. This architecture aims to make fuller use of limited experience to achieve better performance with fewer environmental interactions. Dyna has been well studied in problems with a tabular representation of states, and has also been extended to some settings with larger state spaces that require function approximation. However, little work has studied Dyna in environments with high-dimensional state spaces like images. In Dyna, the environment model is typically used to generate one-step transitions from selected start states. We applied one-step Dyna to several games from the Arcade Learning Environment and found that the model-based updates offered surprisingly little benefit, even with a perfect model. However, when the model was used to generate longer trajectories of simulated experience, performance improved dramatically. This observation also holds when using a model that is learned from experience; even though the learned model is flawed, it can still be used to accelerate learning.
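
As a rough illustration of the distinction between one-step planning and longer simulated trajectories, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical 10-state chain with a perfect tabular model in place of Arcade Learning Environment games, and uses plain Q-learning updates. All names (model, plan, dyna, n_rollouts, rollout_len) are illustrative. The point is only that a fixed planning budget can be spent as many short rollouts or as fewer, longer ones.

```python
import random
from collections import defaultdict

N_STATES = 10      # hypothetical chain: states 0..9, reward on reaching state 9
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def model(state, action):
    """Perfect deterministic model of the toy chain (an assumption of this sketch)."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_update(q, s, a, r, s2, done, alpha=0.5, gamma=0.95):
    """One tabular Q-learning update."""
    target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])


def epsilon_greedy(q, s, rng, eps=0.1):
    """Pick a random action with probability eps, else a greedy one (ties broken randomly)."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    best = max(q[(s, b)] for b in ACTIONS)
    return rng.choice([b for b in ACTIONS if q[(s, b)] == best])


def plan(q, start_states, n_rollouts, rollout_len, rng):
    """Run n_rollouts simulated rollouts of at most rollout_len model steps each.

    rollout_len == 1 corresponds to one-step Dyna; larger values generate
    longer trajectories from each start state. Returns the number of updates.
    """
    updates = 0
    for _ in range(n_rollouts):
        s = rng.choice(start_states)
        for _ in range(rollout_len):
            a = epsilon_greedy(q, s, rng)
            s2, r, done = model(s, a)
            q_update(q, s, a, r, s2, done)
            updates += 1
            if done:  # a rollout ends early at a terminal state
                break
            s = s2
    return updates


def dyna(n_rollouts, rollout_len, real_steps=200, seed=0):
    """Interleave acting, learning, and planning; start states are previously visited states."""
    rng = random.Random(seed)
    q = defaultdict(float)
    visited = [0]
    s = 0
    for _ in range(real_steps):
        a = epsilon_greedy(q, s, rng)
        s2, r, done = model(s, a)  # here the real environment equals the model
        q_update(q, s, a, r, s2, done)
        plan(q, visited, n_rollouts, rollout_len, rng)
        s = 0 if done else s2
        if s not in visited:
            visited.append(s)
    return q


# Same planning budget of 10 model steps per real step, two different shapes.
q_one_step = dyna(n_rollouts=10, rollout_len=1)
q_long = dyna(n_rollouts=1, rollout_len=10)
```

In this sketch both calls spend at most ten model steps per real step; only the shape of that budget differs. Swapping the perfect model for a learned one would require only a different `model` function, with the caveat that errors can compound over longer rollouts.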
