A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants

02/02/2021
by Zaiwei Chen, et al.

This paper develops a unified framework to study finite-sample convergence guarantees of a large class of value-based asynchronous Reinforcement Learning (RL) algorithms. We do this by first reformulating the RL algorithms as Markovian Stochastic Approximation (SA) algorithms for solving fixed-point equations. We then develop a Lyapunov analysis and derive mean-square error bounds on the convergence of the Markovian SA. Based on this central result, we establish finite-sample mean-square convergence bounds for asynchronous RL algorithms such as Q-learning, n-step TD, TD(λ), and off-policy TD algorithms including V-trace. As a by-product, by analyzing the performance bounds of the TD(λ) (and n-step TD) algorithm for general λ (and n), we demonstrate a bias-variance trade-off, i.e., the efficiency of bootstrapping in RL. This was first posed as an open problem in [37].
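
To make the reformulation concrete, the following is a minimal sketch of asynchronous Q-learning written as a Markovian SA iteration toward the fixed point of the Bellman optimality operator. It is an illustration only, not the paper's code: the function names, the uniform behavior policy, the constant step size `alpha`, and the tabular arrays `P` and `R` are all assumptions made for this example.

```python
import numpy as np


def async_q_learning(P, R, gamma=0.9, steps=20000, alpha=0.1, seed=0):
    """Asynchronous Q-learning along a single Markovian trajectory.

    P: array of shape (S, A, S) with transition probabilities (hypothetical toy MDP).
    R: array of shape (S, A) with deterministic rewards (assumption).
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(steps):
        # Assumed behavior policy: uniform over actions.
        a = rng.integers(n_actions)
        s_next = rng.choice(n_states, p=P[s, a])
        # SA step: only the visited (s, a) entry moves toward a noisy
        # sample of the Bellman optimality operator applied to Q.
        target = R[s, a] + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
    return Q


def bellman_fixed_point(P, R, gamma=0.9, iters=1000):
    """Reference solution of Q = R + gamma * P max_a Q by value iteration."""
    Q = np.zeros_like(R, dtype=float)
    for _ in range(iters):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q
```

Comparing the output of `async_q_learning` with `bellman_fixed_point` on a small MDP gives an empirical view of the error that the finite-sample bounds control; "asynchronous" here means that each step updates a single state-action entry rather than the whole table.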

02/03/2020

Finite-Sample Analysis of Stochastic Approximation Using Smooth Convex Envelopes

Stochastic Approximation (SA) is a popular approach for solving fixed po...
09/21/2018

Finite Sample Analysis of the GTD Policy Evaluation Algorithms in Markov Setting

In reinforcement learning (RL), one of the key components is policy eva...
08/11/2021

Truncated Emphatic Temporal Difference Methods for Prediction and Control

Emphatic Temporal Difference (TD) methods are a class of off-policy Rein...
06/24/2021

Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators

In temporal difference (TD) learning, off-policy sampling is known to be...
02/22/2018

Asynchronous stochastic approximations with asymptotically biased errors and deep multi-agent learning

Asynchronous stochastic approximations are an important class of model-f...
03/06/2021

Causal Reinforcement Learning: An Instrumental Variable Approach

In the standard data analysis framework, data is first collected (once f...
12/06/2018

Finite-Sample Analyses for Fully Decentralized Multi-Agent Reinforcement Learning

Despite the increasing interest in multi-agent reinforcement learning (M...