Variational Model-based Policy Optimization

06/09/2020
by   Yinlam Chow, et al.

Model-based reinforcement learning (RL) algorithms allow us to combine model-generated data with data collected from interaction with the real system in order to alleviate the data-efficiency problem in RL. However, designing such algorithms is often challenging because the bias in simulated data may outweigh the benefit of easy data generation. A potential solution to this challenge is to jointly learn and improve the model and the policy using a universal objective function. In this paper, we leverage the connection between RL and probabilistic inference, and formulate such an objective function as a variational lower bound of a log-likelihood. This allows us to use expectation maximization (EM): we iteratively fix a baseline policy and learn a variational distribution, consisting of a model and a policy (E-step), and then improve the baseline policy given the learned variational distribution (M-step). We propose model-based and model-free policy iteration (actor-critic) style algorithms for the E-step and show how the variational distribution learned by them can be used to optimize the M-step in a fully model-based fashion. Our experiments on a number of continuous control tasks show that, despite being more complex, our model-based (E-step) algorithm, called variational model-based policy optimization (VMBPO), is more sample-efficient and more robust to hyper-parameter tuning than its model-free (E-step) counterpart. Using the same control tasks, we also compare VMBPO with several state-of-the-art model-based and model-free RL algorithms and demonstrate its sample efficiency and performance.
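
To make the EM structure described in the abstract concrete, the following is a sketch of the generic variational lower bound used in the RL-as-inference literature. It is not necessarily the exact objective of the paper. All symbols are assumptions introduced here for illustration: \tau is a trajectory, \mathcal{O} a binary optimality variable, p_\pi(\tau) the trajectory distribution under the baseline policy \pi and the true dynamics, q(\tau) the variational trajectory distribution (induced by a variational model and a variational policy), and \eta > 0 a temperature.

```latex
% Assumption (not from the abstract): the optimality likelihood is
% exponential in the cumulative reward along the trajectory.
\[
  p(\mathcal{O}=1 \mid \tau) \;\propto\; \exp\Big(\eta \sum_{t} r(s_t, a_t)\Big)
\]

% Jensen's inequality gives a lower bound on the log-likelihood of optimality.
\begin{align}
  \log p_{\pi}(\mathcal{O}=1)
    &= \log \int p_{\pi}(\tau)\, p(\mathcal{O}=1 \mid \tau)\, d\tau \\
    &\ge \mathbb{E}_{q(\tau)}\big[\log p(\mathcal{O}=1 \mid \tau)\big]
       - \mathrm{KL}\big(q(\tau)\,\|\,p_{\pi}(\tau)\big)
     \;=:\; \mathcal{J}(q, \pi)
\end{align}

% EM alternates between the two arguments of the bound.
\begin{align}
  \text{E-step:}\quad q_{k+1} &= \arg\max_{q}\; \mathcal{J}(q, \pi_k) \\
  \text{M-step:}\quad \pi_{k+1} &= \arg\max_{\pi}\; \mathcal{J}(q_{k+1}, \pi)
     = \arg\min_{\pi}\; \mathrm{KL}\big(q_{k+1}(\tau)\,\|\,p_{\pi}(\tau)\big)
\end{align}
```

Under these assumptions, the E-step fits the variational model and policy with the baseline policy held fixed, and the M-step moves the baseline policy toward the learned variational distribution, which matches the alternation the abstract describes.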

