Parameter-Free Deterministic Reduction of the Estimation Bias in Continuous Control

09/24/2021
by Baturay Saglam, et al.

Approximating value functions in value-based deep reinforcement learning induces overestimation bias, which results in suboptimal policies. We show that when the reinforcement signals received by the agent have high variance, deep actor-critic approaches that overcome the overestimation bias instead incur a substantial underestimation bias. We introduce a novel, parameter-free deep Q-learning variant that reduces this underestimation bias in continuous control. By obtaining fixed weights for the linear combination of approximate critic functions used in the critic objective, our Q-value update rule integrates the concepts of Clipped Double Q-learning and Maxmin Q-learning. We evaluate the method on a set of MuJoCo and Box2D continuous control tasks and find that it improves on the state of the art and outperforms the baseline algorithms in the majority of the environments.
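
To make the idea of a critic target built from a fixed linear combination of critic estimates more concrete, here is a minimal sketch in plain Python. It is not the paper's update rule: the abstract does not state the weights or the number of critics, so the function names, the `weights` argument, and the choice to apply the weights to the sorted estimates are all assumptions made for illustration. Putting all weight on the smallest estimate recovers the minimum-over-critics target used by Clipped Double Q-learning.

```python
from typing import Sequence


def clipped_double_q_target(reward: float, done: float,
                            q_values: Sequence[float],
                            gamma: float = 0.99) -> float:
    """Bootstrap from the minimum critic estimate (Clipped Double Q-learning)."""
    return reward + gamma * (1.0 - done) * min(q_values)


def fixed_weight_target(reward: float, done: float,
                        q_values: Sequence[float],
                        weights: Sequence[float],
                        gamma: float = 0.99) -> float:
    """Bootstrap from a fixed linear combination of critic estimates.

    Hypothetical illustration: `weights` are placeholders, not the values
    derived in the paper. They are applied to the estimates sorted in
    ascending order, so weights[0] multiplies the most pessimistic critic.
    """
    if len(weights) != len(q_values):
        raise ValueError("need exactly one weight per critic estimate")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(q_values)
    blended = sum(w * q for w, q in zip(weights, ordered))
    # (1 - done) removes the bootstrap term on terminal transitions.
    return reward + gamma * (1.0 - done) * blended


# Two critics that disagree about the value of the next state-action pair.
next_q = [2.0, 4.0]
pessimistic = clipped_double_q_target(1.0, 0.0, next_q, gamma=0.5)
blended = fixed_weight_target(1.0, 0.0, next_q, [0.5, 0.5], gamma=0.5)
```

In this sketch, moving weight from the smallest estimate toward the larger ones raises the target, which is the direction needed to offset an underestimation bias. How far to move it is exactly what the paper's fixed weights would determine.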

research
09/22/2021

Estimation Error Correction in Deep Reinforcement Learning for Deterministic Actor-Critic Methods

In value-based deep reinforcement learning methods, approximation of val...
research
10/19/2020

Softmax Deep Double Deterministic Policy Gradients

A widely-used actor-critic reinforcement learning algorithm for continuo...
research
02/07/2021

Deep Reinforcement Learning with Dynamic Optimism

In recent years, deep off-policy actor-critic algorithms have become a d...
research
12/21/2021

Value Activation for Bias Alleviation: Generalized-activated Deep Double Deterministic Policy Gradients

It is vital to accurately estimate the value function in Deep Reinforcem...
research
10/07/2021

Learning Pessimism for Robust and Efficient Off-Policy Reinforcement Learning

Popular off-policy deep reinforcement learning algorithms compensate for...
research
11/06/2018

ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search

In this paper, we propose an actor ensemble algorithm, named ACE, for co...
research
11/24/2021

Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning

Accurate value estimates are important for off-policy reinforcement lear...
