
Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation

by Yao Yao, et al.

Model-based deep reinforcement learning has achieved success in various domains that require high sample efficiency, such as Go and robotics. However, several issues remain: planning efficient exploration to learn more accurate dynamics models, evaluating the uncertainty of the learned models, and making more rational use of those models. To mitigate these issues, we present MEEE, a model-ensemble method that consists of optimistic exploration and weighted exploitation. During exploration, unlike prior methods that directly select the action maximizing the expected cumulative return, our agent first generates a set of action candidates and then selects the action that best accounts for both expected return and the novelty of future observations. During exploitation, imagined transition tuples are assigned different discounted weights according to their model uncertainty, which prevents model prediction errors from propagating into agent training. Experiments on several challenging continuous control benchmark tasks demonstrate that our approach outperforms other model-free and model-based state-of-the-art methods, especially in sample complexity.
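
The two ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how novelty or model uncertainty is measured, so the sketch assumes ensemble disagreement (variance of next-state predictions across models) as the novelty signal and a sigmoid-shaped weight that decays with uncertainty. The function names and the parameters `novelty_coef` and `temperature` are hypothetical.

```python
import numpy as np


def ensemble_disagreement(next_state_preds):
    """Assumed novelty proxy: variance across ensemble members.

    next_state_preds has shape (n_models, n_candidates, state_dim).
    Returns one score per candidate, shape (n_candidates,).
    """
    return next_state_preds.var(axis=0).mean(axis=-1)


def select_optimistic_action(candidates, q_values, next_state_preds,
                             novelty_coef=1.0):
    """Pick the candidate that maximizes return estimate plus novelty bonus.

    candidates: (n_candidates, action_dim) array of proposed actions.
    q_values: (n_candidates,) expected-return estimates for each candidate.
    novelty_coef: hypothetical trade-off between return and novelty.
    """
    scores = q_values + novelty_coef * ensemble_disagreement(next_state_preds)
    return candidates[int(np.argmax(scores))]


def uncertainty_weights(uncertainty, temperature=1.0):
    """Assumed weighting: sigmoid(-u * T) + 0.5, which lies in (0.5, 1.0].

    Imagined transitions with higher model uncertainty get smaller weights,
    so they contribute less to the training loss.
    """
    return 1.0 / (1.0 + np.exp(uncertainty * temperature)) + 0.5


# Two candidate actions, two ensemble members, one state dimension.
candidates = np.array([[0.0], [1.0]])
q_values = np.array([1.0, 0.9])
next_state_preds = np.array([
    [[0.0], [0.0]],   # model 0 predictions for candidates 0 and 1
    [[0.0], [1.0]],   # model 1 disagrees with model 0 on candidate 1
])

greedy = select_optimistic_action(candidates, q_values, next_state_preds,
                                  novelty_coef=0.0)
optimistic = select_optimistic_action(candidates, q_values, next_state_preds,
                                      novelty_coef=1.0)
weights = uncertainty_weights(np.array([0.0, 10.0]))
```

With `novelty_coef=0.0` the selection reduces to the purely greedy choice, while a positive coefficient lets a slightly lower-return candidate win when the models disagree about its outcome. In a training loop the weights would multiply the per-transition loss on model-generated data.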




SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning

Model-free deep reinforcement learning (RL) has been successful in a ran...

Policy Optimization with Model-based Explorations

Model-free reinforcement learning methods such as the Proximal Policy Op...

Model-Ensemble Trust-Region Policy Optimization

Model-free reinforcement learning (RL) methods are succeeding in a growi...

GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms

In continuous action domains, standard deep reinforcement learning algor...

High-Accuracy Model-Based Reinforcement Learning, a Survey

Deep reinforcement learning has shown remarkable success in the past few...

Gaussian Processes for Sample Efficient Reinforcement Learning with RMAX-like Exploration

We present an implementation of model-based online reinforcement learnin...

SEERL: Sample Efficient Ensemble Reinforcement Learning

Ensemble learning is a very prevalent method employed in machine learnin...