Algorithmic Framework for Model-based Reinforcement Learning with Theoretical Guarantees

07/10/2018
by   Huazhe Xu, et al.

While model-based reinforcement learning has empirically been shown to significantly reduce the sample complexity that hinders model-free RL, the theoretical understanding of such methods has been rather limited. In this paper, we introduce a novel algorithmic framework for designing and analyzing model-based RL algorithms with theoretical guarantees, together with a practical algorithm, Optimistic Lower Bounds Optimization (OLBO). In particular, we derive a theoretical guarantee of monotone improvement for model-based RL within our framework. At each iteration, we build a lower bound on the expected reward from the estimated dynamical model and sample trajectories, and then maximize that bound jointly over the policy and the model. Assuming the optimization in each iteration succeeds, the expected reward is guaranteed to improve. The framework also incorporates an optimism-driven perspective and reveals an intrinsic measure of the model prediction error. Preliminary simulations demonstrate that our approach outperforms the standard baselines on continuous control benchmark tasks.
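
The monotone-improvement claim in the abstract can be illustrated with a short derivation. This is only a sketch under stated assumptions, not the paper's exact statement: the symbols $V^{\pi,M}$ (expected reward of policy $\pi$ under dynamics model $M$), $M^\star$ (the true dynamics), and $D_{\pi_k}(M,\pi)$ (a discrepancy term estimated from trajectories of the current policy $\pi_k$) are notation introduced here for illustration. The sketch also assumes that $M^\star$ lies in the model class and ignores any constraint keeping $\pi$ close to $\pi_k$.

```latex
% Assumed properties of the (hypothetical) discrepancy term D:
%   (A1) lower bound:  for all policies \pi and models M,
%        V^{\pi, M^\star} \ge V^{\pi, M} - D_{\pi_k}(M, \pi)
%   (A2) no penalty on the true model:
%        D_{\pi_k}(M^\star, \pi) = 0  for all \pi
%
% Update rule: maximize the lower bound jointly over policy and model.
\[
  (\pi_{k+1}, M_{k+1}) \in \arg\max_{\pi,\, M}
  \; \bigl[ V^{\pi, M} - D_{\pi_k}(M, \pi) \bigr]
\]
%
% Monotone improvement then follows in three steps:
\begin{align*}
  V^{\pi_{k+1}, M^\star}
    &\ge V^{\pi_{k+1}, M_{k+1}} - D_{\pi_k}(M_{k+1}, \pi_{k+1})
      && \text{by (A1)} \\
    &\ge V^{\pi_k, M^\star} - D_{\pi_k}(M^\star, \pi_k)
      && \text{since } (\pi_{k+1}, M_{k+1}) \text{ maximizes the bound} \\
    &= V^{\pi_k, M^\star}
      && \text{by (A2)}
\end{align*}
```

The second step is where the joint maximization matters: because the pair $(\pi_k, M^\star)$ is itself a feasible candidate, the maximizer can do no worse than it. The step holds only if the optimization actually succeeds, which matches the abstract's caveat.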


