AWD3: Dynamic Reduction of the Estimation Bias

11/12/2021
by   Dogan C. Cicek, et al.
0

Value-based deep Reinforcement Learning (RL) algorithms suffer from the estimation bias primarily caused by function approximation and temporal difference (TD) learning. This problem induces faulty state-action value estimates and therefore harms the performance and robustness of the learning algorithms. Although several techniques were proposed to tackle, learning algorithms still suffer from this bias. Here, we introduce a technique that eliminates the estimation bias in off-policy continuous control algorithms using the experience replay mechanism. We adaptively learn the weighting hyper-parameter beta in the Weighted Twin Delayed Deep Deterministic Policy Gradient algorithm. Our method is named Adaptive-WD3 (AWD3). We show through continuous control environments of OpenAI gym that our algorithm matches or outperforms the state-of-the-art off-policy policy gradient learning algorithms.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/18/2020

Reducing Estimation Bias via Weighted Delayed Deep Deterministic Policy Gradient

The overestimation phenomenon caused by function approximation is a well...
research
02/17/2020

Adaptive Experience Selection for Policy Gradient

Policy gradient reinforcement learning (RL) algorithms have achieved imp...
research
11/24/2021

Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning

Accurate value estimates are important for off-policy reinforcement lear...
research
07/01/2020

Regularly Updated Deterministic Policy Gradient Algorithm

Deep Deterministic Policy Gradient (DDPG) algorithm is one of the most w...
research
10/07/2021

Learning Pessimism for Robust and Efficient Off-Policy Reinforcement Learning

Popular off-policy deep reinforcement learning algorithms compensate for...
research
03/03/2019

Asynchronous Episodic Deep Deterministic Policy Gradient: Towards Continuous Control in Computationally Complex Environments

Deep Deterministic Policy Gradient (DDPG) has been proved to be a succes...
research
03/21/2019

Towards Characterizing Divergence in Deep Q-Learning

Deep Q-Learning (DQL), a family of temporal difference algorithms for co...

Please sign up or login with your details

Forgot password? Click here to reset