Beyond Prioritized Replay: Sampling States in Model-Based RL via Simulated Priorities

by Jincheng Mei et al.
University of Alberta

Model-based reinforcement learning (MBRL) can significantly improve sample efficiency, particularly when the states from which hypothetical transitions are sampled are chosen carefully. Such prioritization has been shown empirically to be useful both for experience replay (ER) and for Dyna-style planning. However, there is as yet little theoretical understanding in RL of such prioritization strategies and why they help. In this work, we revisit prioritized ER and, in an ideal setting, show an equivalence to minimizing a cubic loss, providing theoretical insight into why it improves upon uniform sampling. This ideal setting, however, cannot be realized in practice, due to insufficient coverage of the sample space and outdated priorities of training samples. This motivates our model-based approach, which does not suffer from these limitations. Our key idea is to actively search for high-priority states using gradient ascent. Under certain conditions, we prove that the distribution of hypothetical experiences generated from these states provides a diverse set of states, sampled approximately in proportion to their true priorities. Our experiments on both benchmark and application-oriented domains show that our approach outperforms both the model-free prioritized ER method and several closely related model-based baselines.
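The core idea above, actively searching for high-priority states by gradient ascent on a priority function, can be illustrated with a minimal sketch. The priority function and its peak location here are hypothetical stand-ins; in the paper's setting the priority would be something like the (squared) TD error of a state under the learned model and value function.

```python
def priority(s):
    """Toy priority surrogate with a single peak at s = 3 (hypothetical).

    In prioritized replay, priority is typically the magnitude of the
    TD error, e.g. |r + gamma * V(s') - V(s)|; we use a smooth Gaussian
    bump so that gradient ascent is well behaved.
    """
    import math
    return math.exp(-0.5 * (s - 3.0) ** 2)

def grad_priority(s, eps=1e-5):
    # Central-difference estimate of d priority / d s.
    return (priority(s + eps) - priority(s - eps)) / (2 * eps)

def search_high_priority_state(s0, lr=0.5, steps=200):
    """Gradient ascent from an initial state toward a local priority maximum.

    States found this way would then seed hypothetical rollouts in the
    learned model, rather than being drawn from a replay buffer with
    possibly stale priorities.
    """
    s = s0
    for _ in range(steps):
        s = s + lr * grad_priority(s)
    return s

s_star = search_high_priority_state(s0=0.0)
print(s_star)  # converges near the priority peak at s = 3
```

Starting several ascents from different initial states (e.g., sampled from the replay buffer) would yield a diverse set of high-priority seed states, which is the behavior the paper's analysis characterizes.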


