Boosting Soft Actor-Critic: Emphasizing Recent Experience without Forgetting the Past

06/10/2019
by Che Wang, et al.

Soft Actor-Critic (SAC) is an off-policy actor-critic deep reinforcement learning (DRL) algorithm based on maximum entropy reinforcement learning. By combining off-policy updates with an actor-critic formulation, SAC achieves state-of-the-art performance on a range of continuous-action benchmark tasks, outperforming prior on-policy and off-policy methods. When performing parameter updates, SAC samples data uniformly from past experience. We propose Emphasizing Recent Experience (ERE), a simple but powerful off-policy sampling technique that emphasizes recently observed data without forgetting the past. ERE samples more aggressively from recent experience and also orders the updates so that updates from old data do not overwrite updates from new data. We compare vanilla SAC with SAC+ERE and show that ERE is more sample efficient than vanilla SAC on continuous-action MuJoCo tasks. We also consider combining SAC with Prioritized Experience Replay (PER), a scheme originally proposed for deep Q-learning that prioritizes data based on temporal-difference (TD) error. We show that SAC+PER can marginally improve the sample efficiency of SAC, but much less so than SAC+ERE. Finally, we propose an algorithm that integrates ERE and PER, and show that this hybrid algorithm gives the best results on some of the MuJoCo tasks.
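
The abstract describes the idea behind ERE but not the exact sampling schedule. As a rough illustration only, the sketch below assumes one possible schedule: over a run of consecutive updates, each mini-batch is drawn uniformly from a window of the most recent transitions, and that window shrinks geometrically from nearly the whole buffer down to a minimum size. The names `ere_window_sizes` and `ere_batches`, the decay rate `eta`, the floor `c_min`, and the scaling constant 1000 are all assumptions made for this example, not values taken from the text above.

```python
import random


def ere_window_sizes(buffer_len, num_updates, eta=0.996, c_min=5000):
    """Window size (number of most recent transitions) for each update.

    Assumed schedule: size_k = max(buffer_len * eta**(k * 1000 / num_updates), c_min).
    Early updates see almost the whole buffer; later updates see only recent data.
    """
    floor = min(c_min, buffer_len)  # never ask for more data than the buffer holds
    return [
        max(int(buffer_len * eta ** (k * 1000 / num_updates)), floor)
        for k in range(1, num_updates + 1)
    ]


def ere_batches(buffer_len, num_updates, batch_size, eta=0.996, c_min=5000, seed=0):
    """Yield one list of buffer indices per update; index buffer_len - 1 is newest."""
    rng = random.Random(seed)
    for c_k in ere_window_sizes(buffer_len, num_updates, eta, c_min):
        start = buffer_len - c_k  # oldest index still inside the current window
        yield [rng.randrange(start, buffer_len) for _ in range(batch_size)]


if __name__ == "__main__":
    sizes = ere_window_sizes(buffer_len=100_000, num_updates=1000)
    print("first window:", sizes[0], "last window:", sizes[-1])
```

Because the window shrinks as the updates proceed, the last updates in each run are computed from the newest data, which is one way to keep updates from old data from overwriting updates from new data. Uniform sampling over the whole buffer, as in vanilla SAC, corresponds to `eta = 1`.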

research
09/24/2021

Improved Soft Actor-Critic: Mixing Prioritized Off-Policy Samples with On-Policy Experience

Soft Actor-Critic (SAC) is an off-policy actor-critic reinforcement lear...
research
10/05/2019

Towards Simplicity in Deep Reinforcement Learning: Streamlined Off-Policy Learning

The field of Deep Reinforcement Learning (DRL) has recently seen a surge...
research
10/20/2020

Survivable Hyper-Redundant Robotic Arm with Bayesian Policy Morphing

In this paper we present a Bayesian reinforcement learning framework tha...
research
07/16/2018

Remember and Forget for Experience Replay

Experience replay (ER) is crucial for attaining high data-efficiency in ...
research
11/18/2020

Weighted Entropy Modification for Soft Actor-Critic

We generalize the existing principle of the maximum Shannon entropy in r...
research
06/05/2023

Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic

Learning high-quality Q-value functions plays a key role in the success ...
research
04/10/2023

Deep Reinforcement Learning with Importance Weighted A3C for QoE enhancement in Video Delivery Services

Adaptive bitrate (ABR) algorithms are used to adapt the video bitrate ba...
