Softmax Deep Double Deterministic Policy Gradients

10/19/2020
by Ling Pan, et al.

A widely used actor-critic reinforcement learning algorithm for continuous control, Deep Deterministic Policy Gradients (DDPG), suffers from the overestimation problem, which can degrade performance. Although the state-of-the-art Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm mitigates overestimation, it can introduce a large underestimation bias. In this paper, we propose to use the Boltzmann softmax operator for value function estimation in continuous control. We first theoretically analyze the softmax operator in continuous action space. Then, we uncover an important property of the softmax operator in actor-critic algorithms: it helps to smooth the optimization landscape, which sheds new light on the benefits of the operator. We also design two new algorithms, Softmax Deep Deterministic Policy Gradients (SD2) and Softmax Deep Double Deterministic Policy Gradients (SD3), by building the softmax operator upon single and double estimators, which can effectively reduce both overestimation and underestimation bias. We conduct extensive experiments on challenging continuous control tasks, and the results show that SD3 outperforms state-of-the-art methods.
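
For intuition about the Boltzmann softmax operator mentioned above, here is a minimal sketch in plain Python. It assumes a finite set of Q-values for sampled actions, whereas the paper analyzes the operator over a continuous action space, so this is an illustration rather than the authors' implementation. The names `boltzmann_softmax`, `q_samples`, and `beta` are hypothetical.

```python
import math

def boltzmann_softmax(q_values, beta):
    """Softmax-weighted average of q_values with inverse temperature beta.

    Assumption: q_values is a finite list of Q(s, a_i) for sampled actions.
    """
    if not q_values:
        raise ValueError("q_values must be non-empty")
    # Subtract the max before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total

q_samples = [1.0, 2.0, 3.0]  # hypothetical Q-values for three sampled actions
mean_like = boltzmann_softmax(q_samples, beta=0.0)   # beta = 0 gives the plain mean
mid_value = boltzmann_softmax(q_samples, beta=1.0)   # lies between the mean and the max
max_like = boltzmann_softmax(q_samples, beta=50.0)   # large beta approaches the max
```

With `beta = 0` the operator reduces to the mean of the values, and as `beta` grows it approaches the max, so `beta` interpolates between the two.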

research
04/29/2020

How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization

Deterministic-policy actor-critic algorithms for continuous control impr...
research
12/21/2021

Value Activation for Bias Alleviation: Generalized-activated Deep Double Deterministic Policy Gradients

It is vital to accurately estimate the value function in Deep Reinforcem...
research
11/30/2021

Continuous Control With Ensemble Deep Deterministic Policy Gradients

The growth of deep reinforcement learning (RL) has brought multiple exci...
research
09/24/2021

Parameter-Free Deterministic Reduction of the Estimation Bias in Continuous Control

Approximation of the value functions in value-based deep reinforcement l...
research
08/16/2021

Implicitly Regularized RL with Implicit Q-Values

The Q-function is a central quantity in many Reinforcement Learning (RL)...
research
03/14/2019

Reinforcement Learning with Dynamic Boltzmann Softmax Updates

Value function estimation is an important task in reinforcement learning...
research
12/16/2016

An Alternative Softmax Operator for Reinforcement Learning

A softmax operator applied to a set of values acts somewhat like the max...