Elastic Step DQN: A novel multi-step algorithm to alleviate overestimation in Deep Q-Networks

10/07/2022
by   Adrian Ly, et al.

The Deep Q-Network (DQN) algorithm was the first reinforcement learning algorithm to use a deep neural network to surpass human-level performance in a number of Atari learning environments. However, divergent and unstable behaviour has been a long-standing issue in DQNs. The unstable behaviour is often characterised by overestimation of the Q-values, commonly referred to as the overestimation bias. To address the overestimation bias and the divergent behaviour, a number of heuristic extensions have been proposed. Notably, multi-step updates have been shown to drastically reduce unstable behaviour while improving the agent's training performance. However, agents are often highly sensitive to the choice of the multi-step update horizon (n), and our empirical experiments show that a poorly chosen static value for n can in many cases lead to worse performance than single-step DQN. Inspired by the success of n-step DQN and the effects that multi-step updates have on overestimation bias, this paper proposes a new algorithm that we call 'Elastic Step DQN' (ES-DQN). It dynamically varies the step horizon in multi-step updates based on the similarity of the states visited. Our empirical evaluation shows that ES-DQN outperforms n-step updates with fixed n, Double DQN and Average DQN in several OpenAI Gym environments while at the same time alleviating the overestimation bias.
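
To make the idea of a fixed versus a varying update horizon concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not say how state similarity is measured, so `same_group` is a hypothetical stand-in supplied by the caller, and the function names `n_step_target` and `elastic_horizon` are likewise illustrative assumptions. The first function computes a standard n-step target; the second picks the horizon by extending it while successive states remain similar to the starting state.

```python
from typing import Callable, Sequence


def n_step_target(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Standard n-step target: discounted sum of n rewards plus a discounted bootstrap.

    `bootstrap_value` stands for max_a Q(s_n, a); the caller should pass 0.0
    if the episode ended within the n steps.
    """
    target = 0.0
    for k, reward in enumerate(rewards):
        target += (gamma ** k) * reward
    return target + (gamma ** len(rewards)) * bootstrap_value


def elastic_horizon(states: Sequence, same_group: Callable[[object, object], bool], max_n: int) -> int:
    """Choose a horizon n >= 1 for an update that starts at states[0].

    Assumption (not from the abstract): the horizon grows while states[n] is
    judged similar to states[0] by the hypothetical `same_group` predicate,
    and is capped by `max_n` and by the available trajectory length.
    """
    n = 1
    while n < max_n and n < len(states) - 1 and same_group(states[0], states[n]):
        n += 1
    return n


# Toy trajectory of scalar "states"; similarity here is just closeness in value.
states = [0.0, 0.1, 0.2, 5.0, 5.1]
rewards = [1.0, 1.0, 1.0, 1.0]
close = lambda a, b: abs(a - b) < 1.0

n = elastic_horizon(states, close, max_n=4)
target = n_step_target(rewards[:n], bootstrap_value=10.0, gamma=0.5)
```

With a fixed n the same horizon is used for every transition, whereas in this sketch the horizon shrinks or grows with the local trajectory; how ES-DQN actually groups states is described in the full text rather than in the abstract.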

research
07/05/2018

Per-decision Multi-step Temporal Difference Learning with Control Variates

Multi-step temporal difference (TD) learning is an important approach in...
research
11/05/2017

Double Q(σ) and Q(σ, λ): Unifying Reinforcement Learning Control Algorithms

Temporal-difference (TD) learning is an important field in reinforcement...
research
06/17/2023

Vanishing Bias Heuristic-guided Reinforcement Learning Algorithm

Reinforcement Learning has achieved tremendous success in the many Atari...
research
02/25/2021

Bias-reduced multi-step hindsight experience replay

Multi-goal reinforcement learning is widely used in planning and robot m...
research
09/29/2020

Cross Learning in Deep Q-Networks

In this work, we propose a novel cross Q-learning algorithm, aim at alle...
research
10/18/2022

Proximal Learning With Opponent-Learning Awareness

Learning With Opponent-Learning Awareness (LOLA) (Foerster et al. [2018a...
research
06/28/2021

Expert Q-learning: Deep Q-learning With State Values From Expert Examples

We propose a novel algorithm named Expert Q-learning. Expert Q-learning ...
