MEPG: A Minimalist Ensemble Policy Gradient Framework for Deep Reinforcement Learning

09/22/2021
by   Qiang He, et al.
0

Ensemble reinforcement learning (RL) aims to mitigate instability in Q-learning and to learn a robust policy, which introduces multiple value and policy functions. In this paper, we consider finding a novel but simple ensemble Deep RL algorithm to solve the resource consumption issue. Specifically, we consider integrating multiple models into a single model. To this end, we propose the Minimalist Ensemble Policy Gradient framework (MEPG), which introduces minimalist ensemble consistent Bellman update. And we find one value network is sufficient in our framework. Moreover, we theoretically show that the policy evaluation phase in the MEPG is mathematically equivalent to a deep Gaussian Process. To verify the effectiveness of the MEPG framework, we conduct experiments on the gym simulator, which show that the MEPG framework matches or outperforms the state-of-the-art ensemble methods and model-free methods without additional computational resource costs.

READ FULL TEXT
research
06/01/2017

Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

Off-policy model-free deep reinforcement learning methods using previous...
research
11/27/2017

Divide-and-Conquer Reinforcement Learning

Standard model-free deep reinforcement learning (RL) algorithms sample a...
research
03/02/2023

Multi-Start Team Orienteering Problem for UAS Mission Re-Planning with Data-Efficient Deep Reinforcement Learning

In this paper, we study the Multi-Start Team Orienteering Problem (MSTOP...
research
06/24/2020

DISK: Learning local features with policy gradient

Local feature frameworks are difficult to learn in an end-to-end fashion...
research
03/07/2018

Accelerated Methods for Deep Reinforcement Learning

Deep reinforcement learning (RL) has achieved many recent successes, yet...
research
04/24/2019

Towards Combining On-Off-Policy Methods for Real-World Applications

In this paper, we point out a fundamental property of the objective in r...
research
02/07/2020

Bayesian Residual Policy Optimization: Scalable Bayesian Reinforcement Learning with Clairvoyant Experts

Informed and robust decision making in the face of uncertainty is critic...

Please sign up or login with your details

Forgot password? Click here to reset