Exploring Model-based Planning with Policy Networks

06/20/2019
by Tingwu Wang, et al.

Model-based reinforcement learning (MBRL) with model-predictive control or online planning has shown great potential for locomotion control tasks in terms of both sample efficiency and asymptotic performance. Despite their initial successes, existing planning methods search over candidate sequences generated randomly in the action space, which is inefficient in complex, high-dimensional environments. In this paper, we propose a novel MBRL algorithm, model-based policy planning (POPLIN), that combines policy networks with online planning. More specifically, we formulate action planning at each time-step as an optimization problem using neural networks. We experiment both with optimization w.r.t. the action sequences initialized from the policy network and with online optimization directly w.r.t. the parameters of the policy network. We show that POPLIN obtains state-of-the-art performance in the MuJoCo benchmarking environments, being about 3x more sample efficient than state-of-the-art algorithms such as PETS, TD3 and SAC. To explain the effectiveness of our algorithm, we show that the optimization surface in parameter space is smoother than in action space. Furthermore, we find that the distilled policy network can be applied effectively without the expensive model-predictive control at test time in some environments, such as Cheetah. Code is released at https://github.com/WilsonWangTHU/POPLIN.
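To make the idea of planning around a policy network more concrete, the following is a minimal sketch and not the released POPLIN code. It assumes a cross-entropy-method (CEM) search over additive action noise centered on the policy's proposed actions, which is one possible reading of "action sequences initialized from the policy network". The dynamics model, reward, and linear policy (`toy_dynamics`, `toy_reward`, `toy_policy`) are hypothetical stand-ins for the learned components, and all hyperparameters are illustrative.

```python
import numpy as np

def toy_dynamics(state, action):
    # Hypothetical stand-in for a learned dynamics model: a point pushed by the action.
    return state + 0.1 * action

def toy_reward(state, action):
    # Hypothetical reward: stay near the origin while using small actions.
    return -np.sum(state ** 2, axis=-1) - 0.01 * np.sum(action ** 2, axis=-1)

def toy_policy(state, weights):
    # Hypothetical linear "policy network" with tanh squashing.
    return np.tanh(state @ weights)

def rollout_returns(state, noise, weights):
    """Simulated return per candidate; noise has shape (n_candidates, horizon, action_dim)."""
    n_candidates, horizon, _ = noise.shape
    states = np.repeat(state[None, :], n_candidates, axis=0)
    total = np.zeros(n_candidates)
    for t in range(horizon):
        # Each candidate perturbs the policy's proposal instead of sampling a raw action.
        actions = np.clip(toy_policy(states, weights) + noise[:, t], -1.0, 1.0)
        total += toy_reward(states, actions)
        states = toy_dynamics(states, actions)
    return total

def plan_action(state, weights, horizon=10, n_candidates=200,
                n_elites=20, n_iters=5, seed=0):
    """CEM over action noise around the policy; returns the first planned action."""
    rng = np.random.default_rng(seed)
    action_dim = weights.shape[1]
    mean = np.zeros((horizon, action_dim))       # zero noise = follow the policy
    std = 0.5 * np.ones((horizon, action_dim))
    for _ in range(n_iters):
        noise = mean + std * rng.standard_normal((n_candidates, horizon, action_dim))
        returns = rollout_returns(state, noise, weights)
        elites = noise[np.argsort(returns)[-n_elites:]]  # keep the best candidates
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return np.clip(toy_policy(state, weights) + mean[0], -1.0, 1.0)

if __name__ == "__main__":
    start = np.array([1.0, -1.0])
    policy_weights = -np.eye(2)  # hypothetical policy that pushes toward the origin
    print(plan_action(start, policy_weights))
```

Because the search distribution starts at zero noise, the first iteration samples candidates around the policy's own rollout rather than uniformly in the action space. The variant that optimizes the policy parameters directly would, under the same assumptions, sample perturbations of `weights` instead of perturbations of the actions.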


research
04/19/2020

Model-Predictive Control via Cross-Entropy and Gradient-Based Optimization

Recent works in high-dimensional model-predictive control and model-base...
research
10/19/2020

Dream and Search to Control: Latent Space Planning for Continuous Control

Learning and planning with latent space dynamics has been shown to be us...
research
06/04/2020

Model-Based Generalization Under Parameter Uncertainty Using Path Integral Control

This work addresses the problem of robot interaction in complex environm...
research
12/14/2021

CEM-GD: Cross-Entropy Method with Gradient Descent Planner for Model-Based Reinforcement Learning

Current state-of-the-art model-based reinforcement learning algorithms u...
research
04/02/2019

Planning with Expectation Models

Distribution and sample models are two popular model choices in model-ba...
research
11/28/2019

Hierarchical model-based policy optimization: from actions to action sequences and back

We develop a normative framework for hierarchical model-based policy opt...
research
09/17/2019

Visualizing Movement Control Optimization Landscapes

A large body of animation research focuses on optimization of movement c...
