Reducing Estimation Bias via Weighted Delayed Deep Deterministic Policy Gradient

06/18/2020
by   Qiang He, et al.
0

The overestimation phenomenon caused by function approximation is a well-known issue in value-based reinforcement learning algorithms such as deep Q-networks and DDPG, which could lead to suboptimal policies. To address this issue, TD3 takes the minimum value between a pair of critics, which introduces underestimation bias. By unifying these two opposites, we propose a novel Weighted Delayed Deep Deterministic Policy Gradient algorithm, which can reduce the estimation error and further improve the performance by weighting a pair of critics. We compare the learning process of value function between DDPG, TD3, and our proposed algorithm, which verifies that our algorithm could indeed eliminate the estimation error of value function. We evaluate our algorithm in the OpenAI Gym continuous control tasks, outperforming the state-of-the-art algorithms on every environment tested.

READ FULL TEXT
research
11/12/2021

AWD3: Dynamic Reduction of the Estimation Bias

Value-based deep Reinforcement Learning (RL) algorithms suffer from the ...
research
01/02/2011

The Local Optimality of Reinforcement Learning by Value Gradients, and its Relationship to Policy Gradient Learning

In this theoretical paper we are concerned with the problem of learning ...
research
02/20/2023

Improving Deep Policy Gradients with Value Function Search

Deep Policy Gradient (PG) algorithms employ value networks to drive the ...
research
09/29/2020

Cross Learning in Deep Q-Networks

In this work, we propose a novel cross Q-learning algorithm, aim at alle...
research
11/19/2022

Debiasing Meta-Gradient Reinforcement Learning by Learning the Outer Value Function

Meta-gradient Reinforcement Learning (RL) allows agents to self-tune the...
research
06/10/2019

Deep Reinforcement Learning with Discrete Normalized Advantage Functions for Resource Management in Network Slicing

Network slicing promises to provision diversified services with distinct...
research
07/01/2020

Regularly Updated Deterministic Policy Gradient Algorithm

Deep Deterministic Policy Gradient (DDPG) algorithm is one of the most w...

Please sign up or login with your details

Forgot password? Click here to reset