Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning

by Anusha Nagabandi, et al.

Model-free deep reinforcement learning algorithms have been shown to be capable of learning a wide range of robotic skills, but typically require a very large number of samples to achieve good performance. Model-based algorithms can, in principle, provide much more efficient learning, but have proven difficult to extend to expressive, high-capacity models such as deep neural networks. In this work, we demonstrate that medium-sized neural network models can in fact be combined with model predictive control (MPC) to achieve excellent sample complexity in a model-based reinforcement learning algorithm, producing stable and plausible gaits to accomplish various complex locomotion tasks. We also propose using deep neural network dynamics models to initialize a model-free learner, in order to combine the sample efficiency of model-based approaches with the high task-specific performance of model-free methods. We empirically demonstrate on MuJoCo locomotion tasks that our pure model-based approach, trained on just random action data, can follow arbitrary trajectories with excellent sample efficiency, and that our hybrid algorithm can accelerate model-free learning on high-speed benchmark tasks, achieving sample efficiency gains of 3-5x on swimmer, cheetah, hopper, and ant agents. Videos can be found at
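The model-based controller described above plans through a learned dynamics model with MPC. Below is a minimal sketch of such a loop, assuming a random-shooting planner (sample many random action sequences, score them with the model, execute only the first action of the best one, then replan) and a model that predicts the change in state rather than the next state directly. The names `predict_next_state`, `random_shooting_mpc`, and `run_closed_loop`, the toy point-mass task, its reward, the horizon, and the candidate count are all hypothetical illustrations, not the authors' implementation; in practice `toy_delta_model` would be a trained neural network.

```python
import random
from typing import Callable, List, Optional

State = List[float]
Action = List[float]
# Hypothetical stand-in for a learned network f_theta(s, a) that predicts the
# *change* in state, so the next-state estimate is s + f_theta(s, a).
DeltaModel = Callable[[State, Action], State]
RewardFn = Callable[[State], float]


def predict_next_state(model: DeltaModel, state: State, action: Action) -> State:
    """One model step: s_{t+1} is approximated by s_t + f_theta(s_t, a_t)."""
    delta = model(state, action)
    return [s + d for s, d in zip(state, delta)]


def random_shooting_mpc(
    state: State,
    model: DeltaModel,
    reward: RewardFn,
    action_dim: int,
    action_low: float = -1.0,
    action_high: float = 1.0,
    horizon: int = 10,
    num_candidates: int = 1000,
    rng: Optional[random.Random] = None,
) -> Action:
    """Score random action sequences with the model; return the first
    action of the highest-scoring sequence."""
    rng = rng if rng is not None else random.Random()
    best_return = float("-inf")
    best_first_action: Action = [0.0] * action_dim
    for _ in range(num_candidates):
        # Uniformly random open-loop action sequence of length `horizon`.
        seq = [
            [rng.uniform(action_low, action_high) for _ in range(action_dim)]
            for _ in range(horizon)
        ]
        # Roll the sequence out through the model, not the real system.
        s, total = state, 0.0
        for a in seq:
            s = predict_next_state(model, s, a)
            total += reward(s)
        if total > best_return:
            best_return, best_first_action = total, seq[0]
    return best_first_action


# ---- Toy example (hypothetical): 1-D point mass, state = [pos, vel] ----
DT = 0.1
TARGET = 1.0


def toy_delta_model(state: State, action: Action) -> State:
    # Stand-in for a trained network; here it is exact Euler dynamics.
    pos, vel = state
    return [vel * DT, action[0] * DT]


def toy_reward(state: State) -> float:
    # Higher reward the closer the position is to TARGET.
    return -abs(TARGET - state[0])


def run_closed_loop(steps: int, seed: int = 0) -> List[State]:
    """Replan at every step and execute only the first planned action."""
    rng = random.Random(seed)
    state: State = [0.0, 0.0]
    trajectory = [state]
    for _ in range(steps):
        action = random_shooting_mpc(
            state, toy_delta_model, toy_reward, action_dim=1, rng=rng
        )
        # The "real" system here is the same toy dynamics; a real agent would
        # step the environment at this point instead of the model.
        state = predict_next_state(toy_delta_model, state, action)
        trajectory.append(state)
    return trajectory
```

Executing only the first action and replanning from the newly observed state at every step is what lets an MPC controller tolerate an imperfect learned model: prediction errors deep in the horizon are never acted on directly, and each step starts again from the true state.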

Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion

Integrating model-free and model-based approaches in reinforcement learn...

Evaluating model-based planning and planner amortization for continuous control

There is a widespread intuition that model-based control methods should ...

SAGE: Generating Symbolic Goals for Myopic Models in Deep Reinforcement Learning

Model-based reinforcement learning algorithms are typically more sample ...

Perimeter Control Using Deep Reinforcement Learning: A Model-free Approach towards Homogeneous Flow Rate Optimization

Perimeter control maintains high traffic efficiency within protected reg...

Optimising Stochastic Routing for Taxi Fleets with Model Enhanced Reinforcement Learning

The future of mobility-as-a-Service (MaaS) should embrace an integrated s...

Maximum Entropy Model Rollouts: Fast Model Based Policy Optimization without Compounding Errors

Model usage is the central challenge of model-based reinforcement learni...

Deep Episodic Value Iteration for Model-based Meta-Reinforcement Learning

We present a new deep meta reinforcement learner, which we call Deep Epi...