A new Gradient TD Algorithm with only One Step-size: Convergence Rate Analysis using L-λ Smoothness

07/29/2023
by   Hengshuai Yao, et al.

Gradient Temporal Difference (GTD) algorithms (Sutton et al., 2008, 2009) are the first O(d) algorithms (d is the number of features) with convergence guarantees for off-policy learning with linear function approximation. Liu et al. (2015) and Dalal et al. (2018) proved that GTD, GTD2, and TDC converge at a rate of O(t^{-α/2}) for some α ∈ (0,1). This bound is tight (Dalal et al., 2020) and slower than O(1/√t). GTD algorithms also have two step-size parameters, which are difficult to tune. The literature contains a "single-time-scale" formulation of GTD, but it still has two step-size parameters. This paper presents a truly single-time-scale GTD algorithm for minimizing the Norm of the Expected TD Update (NEU) objective, and it has only one step-size parameter. We prove that the new algorithm, called Impression GTD, converges at least as fast as O(1/t). Furthermore, based on a generalization of expected smoothness (Gower et al., 2019), called L-λ smoothness, we prove that the new GTD converges even faster, in fact at a linear rate. Our rate also improves on Gower et al.'s result, giving a tighter bound under a weaker assumption. Besides Impression GTD, we prove the rates of three other GTD algorithms: one by Yao and Liu (2008), another called A-transpose-TD (Sutton et al., 2008), and a counterpart of A-transpose-TD. The convergence rates of all four GTD algorithms are established within a single, generic GTD framework to which L-λ smoothness applies. Empirical results on Random Walks, the Boyan chain, and Baird's counterexample show that Impression GTD converges much faster than existing GTD algorithms on both on-policy and off-policy learning problems, with well-performing step-sizes over a wide range.
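
As a minimal sketch of the quantities involved, the NEU objective and a plain gradient-descent update on it can be written as follows, using the standard linear-TD definitions from the GTD literature (Sutton et al., 2008, 2009). The step-size β and the matrix notation below are illustrative assumptions; the exact Impression GTD update and its stochastic estimators are given in the full paper.

    % TD error under a linear value function V(s) = \theta^\top \phi(s)
    \delta_t = r_{t+1} + \gamma\, \theta^\top \phi_{t+1} - \theta^\top \phi_t

    % NEU: squared norm of the expected TD update
    \mathrm{NEU}(\theta) = \left\lVert \mathbb{E}[\delta_t \phi_t] \right\rVert^2

    % With A = \mathbb{E}[\phi_t (\phi_t - \gamma \phi_{t+1})^\top] and b = \mathbb{E}[r_{t+1} \phi_t],
    % we have \mathbb{E}[\delta_t \phi_t] = b - A\theta, hence
    \nabla_\theta \mathrm{NEU}(\theta) = -2\, A^\top \mathbb{E}[\delta_t \phi_t]

    % Gradient descent on NEU with a single step-size \beta (the constant 2 absorbed into \beta)
    \theta_{k+1} = \theta_k + \beta\, A^\top \mathbb{E}[\delta_t \phi_t]

The last line is the expected-update form underlying the A-transpose-TD algorithm mentioned above; Impression GTD minimizes the same NEU objective with a single step-size, using the sampling scheme described in the paper.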
