Adaptive Rollout Length for Model-Based RL Using Model-Free Deep RL

06/06/2022
by Abhinav Bhatia, et al.

Model-based reinforcement learning promises to learn an optimal policy from fewer interactions with the environment than model-free reinforcement learning, by learning an intermediate model of the environment that is used to predict future interactions. When predicting a sequence of interactions, the rollout length, which limits the prediction horizon, is a critical hyperparameter, because the accuracy of the predictions diminishes in regions that are further away from real experience. As a result, a longer rollout length leads to an overall worse policy in the long run. The hyperparameter thus provides a trade-off between the quality of the learned policy and the efficiency of learning. In this work, we frame the problem of tuning the rollout length as a meta-level sequential decision-making problem that optimizes the final policy learned by model-based reinforcement learning given a fixed budget of environment interactions. The hyperparameter is adapted dynamically based on feedback from the learning process, such as the accuracy of the model and the remaining budget of interactions. We use model-free deep reinforcement learning to solve the meta-level decision problem and demonstrate that our approach outperforms common heuristic baselines on two well-known reinforcement learning environments.
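The meta-level framing described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `MetaState`, `run_meta_episode`, `choose_rollout_length`, `train_step`, and `steps_per_decision` are all hypothetical, and the two-field state is an assumption based only on the feedback signals the abstract names (model accuracy and remaining budget). In the paper's setting the controller would be a policy trained with model-free deep RL; here it is an arbitrary callable.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class MetaState:
    """Hypothetical meta-level observation (assumed, not taken from the paper)."""
    model_error: float   # current error of the learned environment model
    budget_left: float   # fraction of the interaction budget still unused


def run_meta_episode(
    choose_rollout_length: Callable[[MetaState], int],
    train_step: Callable[[int], Tuple[float, float]],
    total_budget: int,
    steps_per_decision: int,
) -> Tuple[float, List[int]]:
    """Run one meta-level episode under a fixed interaction budget.

    `train_step(k)` is a hypothetical callback: it runs model-based RL for
    `steps_per_decision` real environment steps using rollout length `k`
    and returns (model_error, policy_return).
    """
    state = MetaState(model_error=1.0, budget_left=1.0)
    used = 0
    chosen: List[int] = []
    policy_return = 0.0
    while used < total_budget:
        # Meta-level action: pick the rollout length for the next phase.
        k = choose_rollout_length(state)
        chosen.append(k)
        model_error, policy_return = train_step(k)
        used += steps_per_decision
        # Feedback from the learning process becomes the next meta-state.
        state = MetaState(model_error, max(0.0, 1.0 - used / total_budget))
    # The meta-level objective is the quality of the final policy.
    return policy_return, chosen
```

Called with a heuristic controller such as `lambda s: 1 if s.model_error > 0.5 else 5`, the loop would use short rollouts while the model is inaccurate and longer ones afterwards. A learned controller would replace that rule with a policy optimized for the returned final-policy quality.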


