Finite-Time Analysis of Asynchronous Q-learning under Diminishing Step-Size from Control-Theoretic View

07/25/2022
by Han-Dong Lim, et al.

Q-learning has long been one of the most popular reinforcement learning algorithms, and its theoretical analysis has been an active research topic for decades. Although asymptotic convergence analysis of Q-learning has a long tradition, non-asymptotic convergence has only recently come under active study. The main goal of this paper is to present a new finite-time analysis of asynchronous Q-learning under Markovian observation models from a control system viewpoint. In particular, we introduce a discrete-time, time-varying switching system model of Q-learning with diminishing step-sizes for our analysis, which significantly improves on the recent switching system analysis with constant step-sizes and leads to an 𝒪(√(log k/k)) convergence rate that is comparable to or better than most state-of-the-art results in the literature. Meanwhile, a technique based on a similarity transformation is newly applied to avoid the difficulty that diminishing step-sizes pose for the analysis. The proposed analysis brings additional insights, covers different scenarios, and provides new simplified templates for analysis that deepen our understanding of Q-learning via its unique connection to discrete-time switching systems.
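
To make the setting in the abstract concrete, the following is a minimal sketch of asynchronous Q-learning with a diminishing step-size along a single Markovian trajectory. It is an illustration of the algorithm being analyzed, not the paper's switching system analysis. Everything specific here is an assumption: the two-state MDP (`P`, `R`), the discount factor `GAMMA`, the uniform behavior policy, and the step-size schedule `theta / (k + xi)` with its constants are all hypothetical, since the abstract does not state the exact schedule or experimental setup.

```python
import random

# Hypothetical 2-state, 2-action MDP (not from the paper).
# P[s][a] is the next-state distribution, R[s][a] the deterministic reward.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]
GAMMA = 0.5  # assumed discount factor


def q_star(P, R, gamma, iters=200):
    """Reference Q* via value iteration on the known model."""
    n_s, n_a = len(P), len(P[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        Q = [[R[s][a] + gamma * sum(P[s][a][t] * max(Q[t]) for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
    return Q


def async_q_learning(P, R, gamma, steps, theta=10.0, xi=20.0, seed=0):
    """Asynchronous Q-learning: only the visited (s, a) entry is updated.

    The step-size alpha_k = theta / (k + xi) is an assumed diminishing
    schedule; theta and xi are illustrative constants.
    """
    rng = random.Random(seed)
    n_s, n_a = len(P), len(P[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    s = 0
    for k in range(steps):
        a = rng.randrange(n_a)  # uniform behavior policy (assumption)
        s_next = rng.choices(range(n_s), weights=P[s][a])[0]
        alpha = theta / (k + xi)  # diminishing step-size
        td_error = R[s][a] + gamma * max(Q[s_next]) - Q[s][a]
        Q[s][a] += alpha * td_error
        s = s_next  # Markovian observation: continue the same trajectory
    return Q


def max_error(Q, Q_ref):
    """Infinity-norm distance between two Q-tables."""
    return max(abs(q - r) for row, ref in zip(Q, Q_ref) for q, r in zip(row, ref))


Q_REF = q_star(P, R, GAMMA)
Q_HAT = async_q_learning(P, R, GAMMA, steps=20000)
print("max-norm error after 20000 steps:", max_error(Q_HAT, Q_REF))
```

Because only one state-action pair is updated per step and consecutive samples come from the same trajectory, the iterate depends on how often each pair is visited. Non-asymptotic results such as the one described above bound this kind of error as a function of the iteration count k.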

research
02/17/2021

Finite-Time Error Analysis of Asynchronous Q-Learning with Discrete-Time Switching System Models

This paper develops a novel framework to analyze the convergence of Q-le...
research
12/04/2019

A Unified Switching System Perspective and O.D.E. Analysis of Q-Learning Algorithms

In this paper, we introduce a unified framework for analyzing a large fa...
research
12/29/2021

Control Theoretic Analysis of Temporal Difference Learning

The goal of this paper is to investigate a control theoretic analysis of...
research
06/09/2023

Finite-Time Analysis of Minimax Q-Learning for Two-Player Zero-Sum Markov Games: Switching System Approach

The objective of this paper is to investigate the finite-time analysis o...
research
01/23/2020

An O(s^r)-Resolution ODE Framework for Discrete-Time Optimization Algorithms and Applications to Convex-Concave Saddle-Point Problems

There has been a long history of using Ordinary Differential Equations (...
research
12/23/2019

Finite-Time Analysis and Restarting Scheme for Linear Two-Time-Scale Stochastic Approximation

Motivated by their broad applications in reinforcement learning, we stud...
research
05/04/2020

Accelerated Learning with Robustness to Adversarial Regressors

High order iterative momentum-based parameter update algorithms have see...
