Finite-Sample Analysis for SARSA and Q-Learning with Linear Function Approximation

02/06/2019
by Shaofeng Zou, et al.

Although the convergence of major reinforcement learning algorithms has been extensively studied, finite-sample analysis that further characterizes the convergence rate in terms of sample complexity for problems with continuous state spaces is still very limited. Such analysis is especially challenging for algorithms with dynamically changing learning policies and non-i.i.d. sampled data. In this paper, we present the first finite-sample analysis for the SARSA algorithm and its minimax variant (for zero-sum Markov games), with a single sample path and linear function approximation. To establish our results, we develop a novel technique to bound the gradient bias for dynamically changing learning policies, which may be of independent interest. We further provide finite-sample bounds for Q-learning and its minimax variant. Comparison of our result with the existing finite-sample bound indicates that linear function approximation achieves order-level lower sample complexity than the nearest neighbor approach.
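To make the setting concrete, the sketch below shows SARSA with linear function approximation on a small chain MDP. This is an illustrative example only, not the paper's analysis: the environment, feature map, and hyperparameters are all hypothetical choices. The action-value function is approximated as Q(s, a) ≈ wᵀφ(s, a); here φ is a one-hot feature over state-action pairs, so linear function approximation reduces to the tabular case, which keeps the example short.

```python
import numpy as np

# Toy chain MDP: 5 states in a row; action 0 moves left, action 1 moves right.
# Reaching the rightmost state yields reward 1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2

def phi(s, a):
    """One-hot feature vector for the state-action pair (s, a)."""
    x = np.zeros(N_STATES * N_ACTIONS)
    x[s * N_ACTIONS + a] = 1.0
    return x

def step(s, a):
    """Chain dynamics: deterministic moves, reward 1 at the right end."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r, s2 == N_STATES - 1

def eps_greedy(w, s, eps, rng):
    """Epsilon-greedy policy induced by the current weights w."""
    if rng.random() < eps:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax([w @ phi(s, a) for a in range(N_ACTIONS)]))

def sarsa(episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(N_STATES * N_ACTIONS)
    for _ in range(episodes):
        s = 0
        a = eps_greedy(w, s, eps, rng)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = eps_greedy(w, s2, eps, rng)
            # On-policy TD target: uses the action actually taken next (SARSA).
            target = r if done else r + gamma * (w @ phi(s2, a2))
            # Semi-gradient update on the linear approximation.
            w += alpha * (target - w @ phi(s, a)) * phi(s, a)
            s, a = s2, a2
    return w

w = sarsa()
# After training, moving right from the start state should score higher.
print(w @ phi(0, 1) > w @ phi(0, 0))
```

Note that the policy used to choose `a2` is the same epsilon-greedy policy being improved, so the behavior policy changes dynamically as `w` is updated; this is exactly the coupling that complicates finite-sample analysis for SARSA relative to off-policy methods.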


Related research

05/26/2021 | Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation
In this paper, we develop a novel variant of off-policy natural actor-cr...

03/03/2023 | A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games
We study two-player zero-sum stochastic games, and propose a form of ind...

10/31/2019 | Rate of convergence for geometric inference based on the empirical Christoffel function
We consider the problem of estimating the support of a measure from a fi...

02/05/2021 | Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency
We offer a theoretical characterization of off-policy evaluation (OPE) i...

04/07/2021 | Finite-Sample Analysis for Two Time-scale Non-linear TDC with General Smooth Function Approximation
Temporal-difference learning with gradient correction (TDC) is a two tim...

02/12/2018 | Q-learning with Nearest Neighbors
We consider the problem of model-free reinforcement learning for infinit...

03/01/2023 | Finite-sample Guarantees for Nash Q-learning with Linear Function Approximation
Nash Q-learning may be considered one of the first and most known algori...
