Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods

05/08/2022
by Qing Li, et al.

Actor-critic Reinforcement Learning (RL) algorithms have achieved impressive performance in continuous control tasks. However, they still suffer from two nontrivial obstacles: low sample efficiency and overestimation bias. To address both, we propose Simultaneous Double Q-learning with Conservative Advantage Learning (SDQ-CAL). SDQ-CAL improves Double Q-learning for off-policy actor-critic RL by modifying the Bellman optimality operator with Advantage Learning. Specifically, SDQ-CAL improves sample efficiency by modifying the reward so that the optimal actions are easier to distinguish from the others in collected experience. It also mitigates overestimation by updating a pair of critics simultaneously based on double estimators. Extensive experiments show that our algorithm yields less biased value estimates and achieves state-of-the-art performance on a range of continuous control benchmark tasks. We release the source code of our method at: <https://github.com/LQNew/SDQ-CAL>.
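The abstract names two ingredients: a reward modified by Advantage Learning, and a pair of critics updated simultaneously from double estimators. The sketch below illustrates both in their generic textbook form on scalar values. It is not the authors' update rule, which the abstract does not spell out. The function names, the `alpha` coefficient, and the choice to let each critic bootstrap from the other are assumptions made for illustration only.

```python
# Illustrative sketch only: generic Advantage Learning reward shaping plus
# cross-estimator (double) targets. Not the SDQ-CAL update from the paper.


def advantage_learning_reward(reward, q_sa, v_s, alpha=0.5):
    """Subtract a scaled action gap from the reward.

    q_sa is a critic's estimate Q(s, a) for the action actually taken.
    v_s is an estimate of the state value, e.g. Q(s, pi(s)) in actor-critic.
    alpha (assumed hyperparameter) scales the penalty on suboptimal actions.
    """
    return reward - alpha * (v_s - q_sa)


def double_q_targets(reward, done, gamma, q1_next, q2_next):
    """Build one target per critic, each bootstrapping from the other critic.

    q1_next and q2_next are the two critics' estimates at the next state.
    Both targets are computed from the same transition, so both critics
    can be updated in the same step.
    """
    not_done = 0.0 if done else 1.0
    target_q1 = reward + gamma * not_done * q2_next
    target_q2 = reward + gamma * not_done * q1_next
    return target_q1, target_q2


if __name__ == "__main__":
    shaped = advantage_learning_reward(reward=1.0, q_sa=2.0, v_s=3.0, alpha=0.5)
    print(shaped)
    print(double_q_targets(shaped, done=False, gamma=0.9, q1_next=10.0, q2_next=20.0))
```

With `alpha = 0`, the shaped reward reduces to the original reward, so the shaping term can be read as an adjustable penalty on actions whose value falls below the state value.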

research
07/12/2021

Cautious Actor-Critic

The oscillating performance of off-policy learning and persisting errors...
research
07/23/2019

Variance Reduction in Actor Critic Methods (ACM)

After presenting Actor Critic Methods (ACM), we show ACM are control var...
research
06/06/2021

Efficient Continuous Control with Double Actors and Regularized Critics

How to obtain good value estimation is one of the key problems in Reinfo...
research
12/26/2018

Deconfounding Reinforcement Learning in Observational Settings

We propose a general formulation for addressing reinforcement learning (...
research
09/09/2020

DyNODE: Neural Ordinary Differential Equations for Dynamics Modeling in Continuous Control

We present a novel approach (DyNODE) that captures the underlying dynami...
research
10/05/2021

Dropout Q-Functions for Doubly Efficient Reinforcement Learning

Randomized ensemble double Q-learning (REDQ) has recently achieved state...
research
09/21/2022

Revisiting Discrete Soft Actor-Critic

We study the adaption of soft actor-critic (SAC) from continuous action ...
