Finite-Time Analysis for Double Q-learning

09/29/2020
by Huaqing Xiong, et al.

Although Q-learning is one of the most successful algorithms for finding the optimal action-value function (and thus the optimal policy) in reinforcement learning, its implementation often suffers from large overestimation of Q-function values incurred by random sampling. The double Q-learning algorithm proposed in <cit.> overcomes this overestimation issue by randomly switching the update between two Q-estimators, and has thus gained significant popularity in practice. However, the theoretical understanding of double Q-learning is rather limited. So far, only asymptotic convergence has been established, which does not characterize how fast the algorithm converges. In this paper, we provide the first non-asymptotic (i.e., finite-time) analysis for double Q-learning. We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an ϵ-accurate neighborhood of the global optimum by taking $\tilde{\Omega}\left(\left(\frac{1}{(1-\gamma)^6\epsilon^2}\right)^{1/\omega} + \left(\frac{1}{1-\gamma}\right)^{1/(1-\omega)}\right)$ iterations, where $\omega\in(0,1)$ is the decay parameter of the learning rate and $\gamma$ is the discount factor. Our analysis develops novel techniques to derive finite-time bounds on the difference between two inter-connected stochastic processes, which is new to the literature on stochastic approximation.
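The switching mechanism described above can be illustrated with a small tabular sketch. This is not the authors' code. It assumes a hypothetical two-state, two-action MDP with a generative model (the synchronous setting, where every state-action pair is sampled at each iteration), a fair coin that picks which estimator to update, and a polynomial step size α_t = 1/t^ω, which is our reading of "decay parameter of the learning rate". The names `P`, `R`, `sample_next`, `value_iteration`, and `sync_double_q` are illustrative only. The asynchronous variant would instead update only the state-action pair visited along a single trajectory.

```python
import random

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
# P[s][a] lists (probability, next_state); R[s][a] is a deterministic reward.
P = {
    0: {0: [(0.9, 0), (0.1, 1)], 1: [(0.2, 0), (0.8, 1)]},
    1: {0: [(0.7, 0), (0.3, 1)], 1: [(0.5, 0), (0.5, 1)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.5, 1: 0.2}}
STATES, ACTIONS = [0, 1], [0, 1]


def sample_next(s, a, rng):
    """Draw s' ~ P(. | s, a), playing the role of a generative model."""
    u, acc = rng.random(), 0.0
    for p, s_next in P[s][a]:
        acc += p
        if u < acc:
            return s_next
    return P[s][a][-1][1]  # guard against floating-point round-off


def value_iteration(gamma, iters=500):
    """Exact Q* for the toy MDP, used only as a reference point."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(iters):
        q = {
            (s, a): R[s][a]
            + gamma * sum(p * max(q[(n, b)] for b in ACTIONS) for p, n in P[s][a])
            for s in STATES
            for a in ACTIONS
        }
    return q


def sync_double_q(gamma, omega, num_iters, seed=0):
    """Synchronous double Q-learning with assumed step size alpha_t = 1 / t**omega."""
    rng = random.Random(seed)
    qa = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    qb = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for t in range(1, num_iters + 1):
        alpha = 1.0 / t ** omega
        # Fair coin: update Q^A (evaluated by Q^B) or Q^B (evaluated by Q^A).
        upd, ev = (qa, qb) if rng.random() < 0.5 else (qb, qa)
        new = {}
        for s in STATES:
            for a in ACTIONS:
                s_next = sample_next(s, a, rng)
                # Select the greedy next action with the estimator being updated...
                a_star = max(ACTIONS, key=lambda b: upd[(s_next, b)])
                # ...but evaluate it with the other estimator; this decoupling
                # is what counters the overestimation of plain Q-learning.
                target = R[s][a] + gamma * ev[(s_next, a_star)]
                new[(s, a)] = (1 - alpha) * upd[(s, a)] + alpha * target
        upd.update(new)  # apply all pairs at once (synchronous update)
    return qa, qb


q_star = value_iteration(gamma=0.5)
qa, qb = sync_double_q(gamma=0.5, omega=0.8, num_iters=20000)
err = max(abs(qa[k] - q_star[k]) for k in q_star)
print(f"max |Q^A - Q*| = {err:.4f}")
```

The step-size exponent `omega` trades off the two terms in the bound: a larger ω shrinks the first term's exponent 1/ω but inflates the second term's exponent 1/(1-ω), so neither extreme of (0, 1) is favored by the bound.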
