Estimation Error Correction in Deep Reinforcement Learning for Deterministic Actor-Critic Methods

09/22/2021
by Baturay Saglam, et al.

In value-based deep reinforcement learning methods, approximating value functions induces an overestimation bias that leads to suboptimal policies. We show that deep actor-critic methods designed to overcome this overestimation bias instead develop a significant underestimation bias when the reinforcement signals received by the agent have high variance. To minimize the underestimation, we introduce a novel, parameter-free deep Q-learning variant. Our Q-value update rule combines the ideas behind Clipped Double Q-learning and Maxmin Q-learning: it computes the critic objective through a nested combination of maximum and minimum operators that bounds the approximate value estimates. We evaluate our modification on a suite of OpenAI Gym continuous control tasks, where it improves on the state of the art in every environment tested.
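
The abstract does not spell out the update rule, so the following is only a minimal sketch of the two building blocks it names, not the authors' exact method. `clipped_double_q_target` follows the standard Clipped Double Q-learning target (bootstrap from the smaller of two critic estimates). `nested_max_min_target` is a hypothetical illustration of nesting the operators in the Maxmin style: an inner minimum over critics and an outer maximum over a few candidate next actions. All function names, argument names, and the idea of a finite candidate-action set are assumptions made for this example.

```python
from typing import Sequence


def clipped_double_q_target(reward: float, discount: float, done: float,
                            q1_next: float, q2_next: float) -> float:
    """Clipped Double Q-learning: bootstrap from the smaller critic estimate."""
    return reward + discount * (1.0 - done) * min(q1_next, q2_next)


def nested_max_min_target(reward: float, discount: float, done: float,
                          critic_values: Sequence[Sequence[float]]) -> float:
    """Hypothetical nested max/min target (an assumption, not the paper's rule).

    critic_values[k][i] is critic i's estimate for candidate next action k.
    The inner min over critics is pessimistic; the outer max over candidate
    actions keeps the most favourable of those pessimistic estimates.
    """
    pessimistic = [min(per_action) for per_action in critic_values]
    return reward + discount * (1.0 - done) * max(pessimistic)


if __name__ == "__main__":
    # Two critics evaluated at two candidate next actions (made-up numbers).
    values = [[4.0, 2.0], [3.0, 3.5]]
    print(clipped_double_q_target(1.0, 0.5, 0.0, *values[0]))  # 2.0
    print(nested_max_min_target(1.0, 0.5, 0.0, values))        # 2.5
```

Because the outer maximum is taken over the per-action minima, the nested target in this sketch is never smaller than the clipped target computed for any single candidate action, which is one way such a combination could counteract underestimation while still bounding the estimate.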

research
09/24/2021

Parameter-Free Deterministic Reduction of the Estimation Bias in Continuous Control

Approximation of the value functions in value-based deep reinforcement l...
research
02/26/2018

Addressing Function Approximation Error in Actor-Critic Methods

In value-based reinforcement learning methods such as deep Q-learning, f...
research
08/25/2020

t-Soft Update of Target Network for Deep Reinforcement Learning

This paper proposes a new robust update rule of the target network for d...
research
09/29/2020

Cross Learning in Deep Q-Networks

In this work, we propose a novel cross Q-learning algorithm, aim at alle...
research
10/07/2021

Learning Pessimism for Robust and Efficient Off-Policy Reinforcement Learning

Popular off-policy deep reinforcement learning algorithms compensate for...
research
12/21/2021

Value Activation for Bias Alleviation: Generalized-activated Deep Double Deterministic Policy Gradients

It is vital to accurately estimate the value function in Deep Reinforcem...
research
09/30/2018

Deep Quality-Value (DQV) Learning

We introduce a novel Deep Reinforcement Learning (DRL) algorithm called ...
