Maximum Entropy Model Rollouts: Fast Model Based Policy Optimization without Compounding Errors

06/08/2020
by Chi Zhang, et al.

Model usage is the central challenge of model-based reinforcement learning. Although dynamics models based on deep neural networks generalize well for single-step prediction, this ability is over-exploited when they are used to predict long-horizon trajectories, because prediction errors compound. In this work, we propose a Dyna-style model-based reinforcement learning algorithm, which we call Maximum Entropy Model Rollouts (MEMR). To eliminate compounding errors, we use our model only to generate single-step rollouts. Furthermore, we propose to generate diverse model rollouts by non-uniformly sampling environment states such that the entropy of the model rollouts is maximized. To accomplish this objective, we use a prioritized experience replay. We show mathematically that the entropy of the model rollouts is maximally increased when the sampling criterion is the negative likelihood under the historical model rollout distribution. Our preliminary experiments on challenging locomotion benchmarks show that our approach achieves the same sample efficiency as the best model-based algorithms, matches the asymptotic performance of the best model-free algorithms, and significantly reduces the computational requirements of other model-based methods.
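A minimal Python sketch of the two ideas above, under our own assumptions: start states are drawn from the environment buffer with a priority based on how unlikely they are under past model rollouts, and the model is then stepped exactly once from each sampled state. The diagonal-Gaussian density over past rollout states, the softmax over negative log-likelihood used to turn priorities into probabilities, and the names `sampling_probabilities` and `single_step_rollouts` are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np


def fit_gaussian(rollout_states):
    # Assumption: model the historical rollout distribution as a diagonal Gaussian.
    mean = rollout_states.mean(axis=0)
    var = rollout_states.var(axis=0) + 1e-6  # small floor avoids division by zero
    return mean, var


def gaussian_log_likelihood(states, mean, var):
    # Per-state log-density under the diagonal Gaussian, summed over dimensions.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (states - mean) ** 2 / var, axis=1)


def sampling_probabilities(env_states, rollout_states, temperature=1.0):
    # Priority = negative log-likelihood under past rollouts: rarely visited
    # states get higher priority. The softmax is our own (hypothetical) choice
    # for turning priorities into a valid sampling distribution.
    mean, var = fit_gaussian(rollout_states)
    nll = -gaussian_log_likelihood(env_states, mean, var)
    logits = nll / temperature
    logits -= logits.max()  # numerical stability before exponentiating
    probs = np.exp(logits)
    return probs / probs.sum()


def single_step_rollouts(env_states, rollout_states, policy, model, batch_size, rng):
    # Sample start states non-uniformly, then query the model for ONE step only,
    # so model error is never fed back into a further prediction.
    probs = sampling_probabilities(env_states, rollout_states)
    idx = rng.choice(len(env_states), size=batch_size, p=probs)
    s = env_states[idx]
    a = policy(s)
    s_next, r = model(s, a)
    return s, a, r, s_next


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rollout_states = rng.normal(size=(500, 3))         # past model-rollout states
    env_states = rng.normal(scale=2.0, size=(1000, 3))  # real environment states
    policy = lambda s: -0.1 * s                         # hypothetical policy
    model = lambda s, a: (s + a, -np.sum(s ** 2, axis=1))  # hypothetical learned model
    s, a, r, s_next = single_step_rollouts(env_states, rollout_states, policy, model, 256, rng)
    print(s.shape, r.shape, s_next.shape)
```

In a full Dyna-style loop, the newly generated `s_next` states would be appended to `rollout_states` before the next round, so the density estimate tracks the growing rollout distribution and sampling keeps shifting toward under-covered regions.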
