Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion

07/04/2018
by Jacob Buckman, et al.

Integrating model-free and model-based approaches in reinforcement learning has the potential to achieve the high performance of model-free algorithms with low sample complexity. However, this is difficult because an imperfect dynamics model can degrade the performance of the learning algorithm, and in sufficiently complex environments, the dynamics model will almost always be imperfect. As a result, a key challenge is to combine model-based approaches with model-free learning in such a way that errors in the model do not degrade performance. We propose stochastic ensemble value expansion (STEVE), a novel model-based technique that addresses this issue. By dynamically interpolating between model rollouts of various horizon lengths for each individual example, STEVE ensures that the model is only utilized when doing so does not introduce significant errors. Our approach outperforms model-free baselines on challenging continuous control benchmarks with an order-of-magnitude increase in sample efficiency, and in contrast to previous model-based approaches, performance does not degrade in complex environments.
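The core idea can be sketched in a few lines: an ensemble of learned dynamics, reward, and Q-function models produces candidate TD targets at several rollout horizons, and STEVE combines them with weights proportional to the inverse variance of each horizon's ensemble estimates, so horizons where the models disagree contribute little. The function name, array shapes, and variance floor below are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def steve_target(horizon_targets):
    """Combine candidate TD targets from multiple rollout horizons.

    horizon_targets: array of shape (H, M) -- for each of H rollout
    horizons, M target estimates produced by an ensemble of learned
    dynamics/reward/Q models (hypothetical sketch, not the official code).
    Returns a single scalar value target.
    """
    means = horizon_targets.mean(axis=1)             # mean target per horizon
    variances = horizon_targets.var(axis=1) + 1e-8   # ensemble disagreement (floored)
    weights = 1.0 / variances                        # inverse-variance weighting
    weights /= weights.sum()                         # normalize weights to sum to 1
    return float(np.dot(weights, means))
```

A horizon whose ensemble members agree (low variance) dominates the combined target, while a horizon where the learned model is unreliable is down-weighted toward the short-horizon, model-free estimate; this is what lets the method fall back gracefully when the dynamics model is imperfect.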


Related research:

- Efficient and Robust Reinforcement Learning with Uncertainty-based Value Expansion (12/10/2019)
- On the model-based stochastic value gradient for continuous reinforcement learning (08/28/2020)
- Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning (02/28/2018)
- Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning (08/08/2017)
- Learning to Combat Compounding-Error in Model-Based Reinforcement Learning (12/24/2019)
- Combining Model-Free Q-Ensembles and Model-Based Approaches for Informed Exploration (06/12/2018)
- Probabilistic Programming Bots in Intuitive Physics Game Play (04/05/2021)
