Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation

07/05/2021
by Yao Yao, et al.

Model-based deep reinforcement learning has achieved success in various domains that require high sample efficiency, such as Go and robotics. However, several issues remain, such as planning efficient exploration to learn more accurate dynamics models, evaluating the uncertainty of the learned models, and making more rational use of those models. To mitigate these issues, we present MEEE, a model-ensemble method that consists of optimistic exploration and weighted exploitation. During exploration, unlike prior methods that directly select the action maximizing the expected cumulative return, our agent first generates a set of action candidates and then selects the one that best accounts for both expected return and future observation novelty. During exploitation, imagined transition tuples are assigned different discounted weights according to their model uncertainty, which prevents model prediction errors from propagating into agent training. Experiments on several challenging continuous control benchmark tasks demonstrate that our approach outperforms other state-of-the-art model-free and model-based methods, especially in sample complexity.
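The two ideas in the abstract, novelty-aware action selection and uncertainty-weighted use of imagined transitions, can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names, the additive scoring rule, the use of ensemble standard deviation as the uncertainty measure, and the exponential weighting with a `temperature` parameter are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
import numpy as np


def select_optimistic_action(candidates, q_values, novelty, bonus_scale=1.0):
    """Pick the candidate with the highest return-plus-novelty score.

    Assumption: the two terms are combined additively, with `bonus_scale`
    (hypothetical) trading off expected return against novelty.
    """
    scores = np.asarray(q_values, dtype=float) + bonus_scale * np.asarray(novelty, dtype=float)
    best = int(np.argmax(scores))
    return candidates[best], best


def ensemble_disagreement(predictions):
    """Per-transition uncertainty from an ensemble of dynamics models.

    `predictions` has shape (n_models, batch, state_dim). Assumption:
    uncertainty is the standard deviation across models, averaged over
    state dimensions.
    """
    return np.asarray(predictions, dtype=float).std(axis=0).mean(axis=-1)


def uncertainty_weights(uncertainty, temperature=1.0):
    """Map uncertainty to weights in (0, 1]; higher uncertainty, lower weight.

    Assumption: an exponential decay, chosen only for simplicity.
    """
    return np.exp(-np.asarray(uncertainty, dtype=float) / temperature)


def weighted_td_loss(td_errors, weights):
    """Mean squared TD error, with each imagined transition down-weighted."""
    td_errors = np.asarray(td_errors, dtype=float)
    return float(np.mean(np.asarray(weights, dtype=float) * td_errors ** 2))


if __name__ == "__main__":
    # Three candidate actions with toy return and novelty estimates.
    candidates = np.array([[0.0], [1.0], [2.0]])
    action, index = select_optimistic_action(
        candidates, q_values=[1.0, 0.9, 0.5], novelty=[0.0, 0.5, 0.1]
    )
    print("chosen candidate index:", index)

    # Two models, two imagined transitions, one state dimension.
    predictions = np.array([[[0.0], [1.0]], [[0.0], [3.0]]])
    weights = uncertainty_weights(ensemble_disagreement(predictions))
    print("transition weights:", weights)
    print("weighted loss:", weighted_td_loss([1.0, 1.0], weights))
```

In this toy setup the second candidate wins once the novelty bonus is included, even though the first has the highest return estimate alone, and the imagined transition on which the two models disagree contributes less to the loss than the one on which they agree.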

research
07/09/2020

SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning

Model-free deep reinforcement learning (RL) has been successful in a ran...
research
11/18/2018

Policy Optimization with Model-based Explorations

Model-free reinforcement learning methods such as the Proximal Policy Op...
research
02/28/2018

Model-Ensemble Trust-Region Policy Optimization

Model-free reinforcement learning (RL) methods are succeeding in a growi...
research
04/26/2023

FLEX: an Adaptive Exploration Algorithm for Nonlinear Systems

Model-based reinforcement learning is a powerful tool, but collecting da...
research
03/20/2023

Deceptive Reinforcement Learning in Model-Free Domains

This paper investigates deceptive reinforcement learning for privacy pre...
research
01/31/2012

Gaussian Processes for Sample Efficient Reinforcement Learning with RMAX-like Exploration

We present an implementation of model-based online reinforcement learnin...
research
01/15/2020

SEERL: Sample Efficient Ensemble Reinforcement Learning

Ensemble learning is a very prevalent method employed in machine learnin...
