Understanding the effect of varying amounts of replay per step

02/20/2023
by Animesh Kumar Paul, et al.

Model-based reinforcement learning uses models to plan: an agent's predictions and policies can be improved with more computation, without additional data from the environment, thereby improving sample efficiency. However, learning accurate models is hard. A natural question, then, is whether we can obtain similar benefits with model-free methods. Experience replay is an essential component of many model-free algorithms, enabling sample-efficient and stable learning by storing past experiences for reuse in gradient computation. Prior work has drawn connections between models and experience replay by planning with the latter, which amounts to increasing the number of times a mini-batch is sampled and used for updates at each step (the amount of replay per step). We exploit this connection through a systematic study of the effect of varying the amount of replay per step in a well-known model-free algorithm, Deep Q-Network (DQN), in the Mountain Car environment. We show empirically that increasing replay improves DQN's sample efficiency, reduces the variation in its performance, and makes it more robust to changes in hyperparameters. Altogether, this takes a step toward better algorithms for deployment.
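To make the quantity under study concrete, here is a minimal sketch of a DQN loop on Mountain Car in which a `replay_per_step` knob controls how many mini-batch updates are performed per environment step. It assumes gymnasium and PyTorch; the network sizes, hyperparameters, and the particular `replay_per_step` value are illustrative assumptions, not the paper's configuration.

```python
# Minimal DQN sketch: `replay_per_step` sets the amount of replay per env step.
# All hyperparameters below are illustrative, not the paper's exact settings.
import random
from collections import deque

import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn

env = gym.make("MountainCar-v0")
n_obs, n_act = env.observation_space.shape[0], env.action_space.n

q_net = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_act))
target_net = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_act))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

buffer = deque(maxlen=50_000)
batch_size, gamma, epsilon = 64, 0.99, 0.1
replay_per_step = 4  # the quantity varied in the study: updates per env step

obs, _ = env.reset()
for step in range(100_000):
    # Epsilon-greedy action selection.
    if random.random() < epsilon:
        action = env.action_space.sample()
    else:
        with torch.no_grad():
            action = q_net(torch.as_tensor(obs, dtype=torch.float32)).argmax().item()

    next_obs, reward, terminated, truncated, _ = env.step(action)
    buffer.append((obs, action, reward, next_obs, float(terminated)))
    obs = next_obs if not (terminated or truncated) else env.reset()[0]

    # Perform `replay_per_step` gradient updates for this single env step.
    if len(buffer) >= batch_size:
        for _ in range(replay_per_step):
            batch = random.sample(buffer, batch_size)
            s, a, r, s2, done = (
                torch.as_tensor(np.array(x), dtype=torch.float32)
                for x in zip(*batch)
            )
            q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
            with torch.no_grad():
                target = r + gamma * (1 - done) * target_net(s2).max(1).values
            loss = nn.functional.mse_loss(q, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    # Periodically sync the target network.
    if step % 1_000 == 0:
        target_net.load_state_dict(q_net.state_dict())
```

Setting `replay_per_step = 1` recovers the standard one-update-per-step DQN; larger values reuse stored experience more aggressively at the cost of extra computation, which is precisely the trade-off the study examines.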


Related research

When to use parametric models in reinforcement learning? (06/12/2019)
We examine the question of when and how parametric models are most usefu...

Organizing Experience: A Deeper Look at Replay Mechanisms for Sample-based Planning in Continuous State Domains (06/12/2018)
Model-based strategies for control are critical to obtain sample efficie...

Learning Expected Emphatic Traces for Deep RL (07/12/2021)
Off-policy sampling and experience replay are key for improving sample e...

XCS Classifier System with Experience Replay (02/13/2020)
XCS constitutes the most deeply investigated classifier system today. It...

Model-Free Generative Replay for Lifelong Reinforcement Learning: Application to Starcraft-2 (08/09/2022)
One approach to meet the challenges of deep lifelong reinforcement learn...

Beyond Prioritized Replay: Sampling States in Model-Based RL via Simulated Priorities (07/19/2020)
Model-based reinforcement learning (MBRL) can significantly improve samp...

Self-Correcting Models for Model-Based Reinforcement Learning (12/19/2016)
When an agent cannot represent a perfectly accurate model of its environ...
