Bootstrapping the Expressivity with Model-based Planning

10/14/2019
by Kefan Dong, et al.

We compare model-free reinforcement learning with model-based approaches through the lens of the expressive power of neural networks for policies, Q-functions, and dynamics. We show, theoretically and empirically, that even for a one-dimensional continuous state space, there are many MDPs whose optimal Q-functions and policies are much more complex than their dynamics. We hypothesize that many real-world MDPs share this property. For these MDPs, model-based planning is a favorable algorithm: the policies it produces can approximate the optimal policy significantly better than a neural network parameterization can, whereas both model-free and model-based policy optimization rely on a policy parameterization. Motivated by the theory, we apply a simple multi-step model-based bootstrapping planner (BOOTS) to bootstrap a weak Q-function into a stronger policy. Empirical results show that applying BOOTS on top of model-based or model-free policy optimization algorithms at test time improves performance on MuJoCo benchmark tasks.
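The abstract does not spell out how the planner works, so the following is only a rough sketch of one way a multi-step bootstrapping planner could turn a weak Q-function into a stronger policy. It assumes a small finite action set, a deterministic dynamics model, and exhaustive enumeration of action sequences. The names `boots_action`, `toy_dynamics`, `toy_reward`, and `zero_q` are hypothetical and do not come from the paper.

```python
from itertools import product


def boots_action(state, actions, dynamics, reward, q_fn, horizon, gamma=0.99):
    """Return the first action of the best `horizon`-step plan.

    Each plan is scored by rolling it out through the (assumed known)
    dynamics model, summing discounted rewards, and bootstrapping the
    tail with the Q-function at the final state.
    """
    best_value, best_action = float("-inf"), None
    for plan in product(actions, repeat=horizon):
        s, value, discount = state, 0.0, 1.0
        for a in plan:
            value += discount * reward(s, a)
            s = dynamics(s, a)
            discount *= gamma
        # Bootstrap: estimate the remaining return with the Q-function.
        value += discount * max(q_fn(s, a) for a in actions)
        if value > best_value:
            best_value, best_action = value, plan[0]
    return best_action


# Toy illustration (an assumption, not an MDP from the paper):
# integer states on a line, reward 1 for stepping onto state 3.
def toy_dynamics(s, a):
    return s + a


def toy_reward(s, a):
    return 1.0 if s + a == 3 else 0.0


def zero_q(s, a):
    # A deliberately uninformative ("weak") Q-function.
    return 0.0


actions = [-1, 1]
one_step = boots_action(0, actions, toy_dynamics, toy_reward, zero_q, horizon=1)
three_step = boots_action(0, actions, toy_dynamics, toy_reward, zero_q, horizon=3)
```

In this toy setting the uninformative Q-function cannot distinguish the two actions after a single step, while a three-step lookahead through the model finds the rewarding state and picks the action that moves toward it. Exhaustive enumeration grows exponentially with the horizon, so a practical planner for continuous control would need a different search or optimization procedure.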


research
06/29/2022

Visual Foresight With a Local Dynamics Model

Model-free policy learning has been shown to be capable of learning mani...
research
09/14/2018

Model-Based Reinforcement Learning via Meta-Policy Optimization

Model-based reinforcement learning approaches carry the promise of being...
research
05/23/2016

Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks

We present an algorithm for model-based reinforcement learning that comb...
research
11/30/2020

Model-based controlled learning of MDP policies with an application to lost-sales inventory control

Recent literature established that neural networks can represent good MD...
research
05/28/2018

Dual Policy Iteration

Recently, a novel class of Approximate Policy Iteration (API) algorithms...
research
03/23/2022

Sample-efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs

Recent advances in deep learning have enabled optimization of deep react...
research
12/03/2019

Adaptive Online Planning for Continual Lifelong Learning

We study learning control in an online lifelong learning scenario, where...
