Towards Simplicity in Deep Reinforcement Learning: Streamlined Off-Policy Learning

10/05/2019
by Che Wang, et al.

The field of Deep Reinforcement Learning (DRL) has recently seen a surge in the popularity of maximum entropy reinforcement learning algorithms. Their popularity stems from the intuitive interpretation of the maximum entropy objective and their superior sample efficiency on standard benchmarks. In this paper, we seek to understand the primary contribution of the entropy term to the performance of maximum entropy algorithms. For the MuJoCo benchmark, we demonstrate that the entropy term in Soft Actor-Critic (SAC) principally addresses the bounded nature of the action spaces. With this insight, we propose a simple normalization scheme that allows a streamlined algorithm without entropy maximization to match the performance of SAC. Our experimental results demonstrate a need to revisit the benefits of entropy regularization in DRL. We also propose a simple non-uniform sampling method for selecting transitions from the replay buffer during training. We further show that the streamlined algorithm, combined with this non-uniform sampling scheme, outperforms SAC and achieves state-of-the-art performance on challenging continuous control tasks.
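The abstract does not spell out the normalization scheme, so the following Python sketch is only one plausible reading, under stated assumptions: the policy network emits unbounded pre-activation values that are squashed into the bounded action range with tanh, and the normalization rescales those values whenever their mean absolute value exceeds 1. The function names `normalize_preactivations` and `squash` are hypothetical and chosen for illustration. The sketch shows why such a step could matter for bounded action spaces: without it, large pre-activations saturate tanh and pin actions at the boundary.

```python
import math


def normalize_preactivations(mu):
    """Rescale pre-squash outputs so their mean absolute value is at most 1.

    Assumption for illustration: this is one plausible form of a "simple
    normalization scheme"; the abstract itself gives no formula.
    """
    g = sum(abs(m) for m in mu) / len(mu)  # mean absolute pre-activation
    if g > 1.0:
        return [m / g for m in mu]  # shrink only when outputs are too large
    return list(mu)  # small outputs are left unchanged


def squash(mu):
    """Map unbounded values into the bounded action range (-1, 1)."""
    return [math.tanh(m) for m in mu]


raw = [8.0, -4.0, 0.0]  # hypothetical network outputs
normalized = normalize_preactivations(raw)
actions = squash(normalized)
saturated = squash(raw)  # what the actions would be without normalization
```

With these example values, the first unnormalized action sits almost exactly on the boundary of the action range, while the normalized one stays further inside it, which keeps the tanh gradient from vanishing.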

