Value Activation for Bias Alleviation: Generalized-activated Deep Double Deterministic Policy Gradients

12/21/2021
by Jiafei Lyu, et al.

It is vital to accurately estimate the value function in Deep Reinforcement Learning (DRL) so that the agent can execute proper actions instead of suboptimal ones. However, existing actor-critic methods suffer, to varying degrees, from underestimation bias or overestimation bias, which negatively affect their performance. In this paper, we reveal a simple but effective principle: proper value correction benefits bias alleviation. Building on this principle, we propose the generalized-activated weighting operator, which uses any non-decreasing function, namely an activation function, as weights for better value estimation. In particular, we integrate the generalized-activated weighting operator into value estimation and introduce a novel algorithm, Generalized-activated Deep Double Deterministic Policy Gradients (GD3). We theoretically show that GD3 is capable of alleviating the potential estimation bias. Interestingly, we find that simple activation functions lead to satisfying performance with no additional tricks and can contribute to faster convergence. Experimental results on numerous challenging continuous control tasks show that GD3 with task-specific activation outperforms the common baseline methods. We also find that fine-tuning the polynomial activation function achieves superior results on most of the tasks.
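The abstract describes the generalized-activated weighting operator only in words: a weighted average of value estimates whose weights come from a non-decreasing activation function. The snippet below is a minimal sketch under stated assumptions, not the authors' implementation. A finite list of Q-values for sampled actions stands in for the full action space, Q-values are shifted by their minimum before the activation is applied (an assumption made here so that a polynomial activation stays non-negative and monotone), and the helper names `generalized_activated_value`, `polynomial_activation`, and `exponential_activation` are hypothetical. With the exponential activation, the normalized weights are exactly softmax weights over the Q-values.

```python
import math
from typing import Callable, Sequence

# An activation g maps a (shifted) Q-value to a non-negative weight.
Activation = Callable[[float], float]


def polynomial_activation(k: float) -> Activation:
    # Hypothetical helper: g(x) = x**k, non-decreasing on x >= 0 for k >= 0.
    return lambda x: x ** k


def exponential_activation(beta: float) -> Activation:
    # Hypothetical helper: g(x) = exp(beta * x), non-decreasing for beta >= 0.
    # After normalization this yields softmax weights over the Q-values.
    return lambda x: math.exp(beta * x)


def generalized_activated_value(q_values: Sequence[float], g: Activation) -> float:
    """Weighted average of Q-values, using g(Q) as the (unnormalized) weights.

    Sketch only: a finite set of sampled-action Q-values replaces the
    action space, and values are shifted by their minimum (an assumption)
    so that polynomial activations receive non-negative inputs.
    """
    if not q_values:
        raise ValueError("need at least one Q-value")
    q_min = min(q_values)
    weights = [g(q - q_min) for q in q_values]
    total = sum(weights)
    if total == 0.0:
        # Every weight vanished (e.g. identical Q-values under x**k with k > 0):
        # fall back to the plain mean, which equals each of the identical values.
        return sum(q_values) / len(q_values)
    return sum(w * q for w, q in zip(weights, q_values)) / total
```

For example, with Q-values `[1.0, 2.0, 3.0]`, `polynomial_activation(0)` gives the plain mean `2.0`, `polynomial_activation(1)` gives `8/3`, and a large `beta` in `exponential_activation` pushes the estimate toward the maximum `3.0`. Because the weights are non-negative and normalized, the result always lies between the smallest and largest Q-value, so the choice of activation moves the estimate between these two extremes.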

10/19/2020

Softmax Deep Double Deterministic Policy Gradients

A widely-used actor-critic reinforcement learning algorithm for continuo...
06/06/2021

Efficient Continuous Control with Double Actors and Regularized Critics

How to obtain good value estimation is one of the key problems in Reinfo...
09/24/2021

Parameter-Free Deterministic Reduction of the Estimation Bias in Continuous Control

Approximation of the value functions in value-based deep reinforcement l...
09/22/2021

Estimation Error Correction in Deep Reinforcement Learning for Deterministic Actor-Critic Methods

In value-based deep reinforcement learning methods, approximation of val...
09/08/2021

ADER:Adapting between Exploration and Robustness for Actor-Critic Methods

Combining off-policy reinforcement learning methods with function approx...
09/29/2021

On the Estimation Bias in Double Q-Learning

Double Q-learning is a classical method for reducing overestimation bias...
