Probabilistic Mixture-of-Experts for Efficient Deep Reinforcement Learning

04/19/2021
by Jie Ren, et al.

Deep reinforcement learning (DRL) has recently solved a variety of problems, typically with a unimodal policy representation. However, for tasks with non-unique optima, learning distinguishable skills can be essential for further improving learning efficiency and performance, which calls for a multimodal policy represented as a mixture-of-experts (MOE). To the best of our knowledge, current general-purpose DRL algorithms do not use this method as a policy function approximator because of the difficulty of differentiating through it during policy learning. In this work, we propose a probabilistic mixture-of-experts (PMOE), implemented with a Gaussian mixture model (GMM), for multimodal policies, together with a novel gradient estimator for the non-differentiability problem. The method can be applied in generic off-policy and on-policy DRL algorithms that use stochastic policies, e.g., Soft Actor-Critic (SAC) and Proximal Policy Optimisation (PPO). Experimental results on six MuJoCo tasks demonstrate the advantage of our method over unimodal policies, two different MOE methods, and an option-framework method, based on the above two types of DRL algorithms. Other gradient estimators for the GMM, such as the reparameterisation trick (Gumbel-Softmax) and the score-ratio trick, are also compared with our method. We further empirically demonstrate the distinguishable primitives learned with PMOE and show the benefits of our method in terms of exploration.
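To make the differentiability issue concrete, the following is a minimal sketch of sampling from a one-dimensional GMM policy, assuming two experts with fixed mixing weights. It is not the paper's gradient estimator; all names (`gmm_log_prob`, `sample_action`, `weights`, `means`, `stds`) and all numeric values are hypothetical and chosen only for illustration. The Gaussian draw can be reparameterised, but the discrete choice of expert cannot, which is the step a dedicated gradient estimator has to handle.

```python
import math
import random


def gmm_log_prob(action, weights, means, stds):
    """Log-density of a scalar action under a Gaussian mixture (log-sum-exp form)."""
    log_terms = []
    for w, mu, sigma in zip(weights, means, stds):
        log_gauss = (
            -0.5 * ((action - mu) / sigma) ** 2
            - math.log(sigma)
            - 0.5 * math.log(2.0 * math.pi)
        )
        log_terms.append(math.log(w) + log_gauss)
    m = max(log_terms)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def sample_action(weights, means, stds, rng):
    """Draw (expert index, action) from the mixture."""
    # Step 1: pick an expert. This categorical draw is discrete, so the
    # sampled action is not a differentiable function of the weights.
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Step 2: reparameterised Gaussian draw, a smooth function of the
    # chosen expert's mean and standard deviation.
    eps = rng.gauss(0.0, 1.0)
    return k, means[k] + stds[k] * eps


# Hypothetical two-expert policy with well-separated modes.
rng = random.Random(0)
weights = [0.5, 0.5]
means = [-1.0, 1.0]
stds = [0.2, 0.2]
samples = [sample_action(weights, means, stds, rng) for _ in range(1000)]
```

In this toy setting each sampled action clusters around the mean of the expert that produced it, and the mixture density is high at both modes but low between them, which is the multimodal behaviour a single Gaussian policy cannot express.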

