Factors of Influence of the Overestimation Bias of Q-Learning

10/11/2022
by Julius Wagenbach, et al.

We study whether the learning rate α, the discount factor γ, and the reward signal r influence the overestimation bias of the Q-Learning algorithm. Our preliminary results in stochastic environments that require neural networks as function approximators show that all three parameters influence overestimation significantly. By carefully tuning α and γ, and by using an exponential moving average of r in Q-Learning's temporal-difference target, we show that the algorithm can learn value estimates that are more accurate than those of several other popular model-free methods that have addressed its overestimation bias in the past.
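As a minimal sketch of the idea named in the abstract, the snippet below shows tabular Q-learning in which the raw reward in the temporal-difference target is replaced by an exponential moving average of observed rewards. This is an illustrative assumption, not the paper's implementation: the paper uses neural-network function approximators, while this sketch is tabular for compactness, and the smoothing factor `beta`, the per-(state, action) EMA buffer `r_bar`, and the Gymnasium-style environment interface are all details I am assuming.

```python
import numpy as np

def q_learning_ema_reward(env, num_episodes=500, alpha=0.1,
                          gamma=0.99, beta=0.9, epsilon=0.1):
    """Tabular Q-learning whose TD target uses an exponential moving
    average (EMA) of the reward instead of the raw reward sample.

    Assumptions (not from the paper): a Gymnasium-style discrete
    environment, the smoothing factor `beta`, and a per-(state, action)
    EMA buffer `r_bar`.
    """
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    r_bar = np.zeros_like(Q)  # EMA of rewards per (state, action)

    for _ in range(num_episodes):
        s, _ = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if np.random.rand() < epsilon:
                a = env.action_space.sample()
            else:
                a = int(np.argmax(Q[s]))

            s_next, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated

            # Smooth the reward signal before it enters the TD target.
            r_bar[s, a] = beta * r_bar[s, a] + (1.0 - beta) * r

            # TD target built from the smoothed reward; bootstrap only
            # if the episode did not terminate.
            target = r_bar[s, a] + (0.0 if terminated else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```

A usage example, assuming Gymnasium is installed: `Q = q_learning_ema_reward(gym.make("FrozenLake-v1"), num_episodes=2000)`. Intuitively, averaging the reward reduces the variance of the TD target in stochastic environments, which is one plausible way to dampen the max-operator's overestimation; larger `beta` smooths more aggressively at the cost of reacting more slowly to genuine reward changes.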
