A Tale of Two-Timescale Reinforcement Learning with the Tightest Finite-Time Bound

11/20/2019
by Gal Dalal, et al.

Policy evaluation in reinforcement learning is often conducted using two-timescale stochastic approximation, which results in various gradient temporal difference methods such as GTD(0), GTD2, and TDC. Here, we provide convergence rate bounds for this suite of algorithms. These algorithms maintain two iterates, θ_n and w_n, which are updated using two distinct stepsize sequences, α_n and β_n, respectively. Assuming α_n = n^{-α} and β_n = n^{-β} with 1 > α > β > 0, we show that, with high probability, the two iterates converge to their respective solutions θ^* and w^* at rates given by ‖θ_n − θ^*‖ = Õ(n^{-α/2}) and ‖w_n − w^*‖ = Õ(n^{-β/2}); here, Õ hides logarithmic terms. Via comparable lower bounds, we show that these bounds are, in fact, tight. To the best of our knowledge, ours is the first finite-time analysis that achieves these rates. While it was known that the two timescale components decouple asymptotically, our results depict this phenomenon more explicitly by showing that the decoupling in fact occurs from some finite time onward. Lastly, compared to existing works, our result applies to a broader family of stepsizes, including non-square-summable ones.
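To make the two-timescale structure concrete, below is a minimal sketch of a TDC-style gradient TD update with linear features on a toy Markov reward process, using the stepsize regime from the abstract: the slow iterate θ_n uses α_n = n^{-α} and the fast iterate w_n uses β_n = n^{-β} with 1 > α > β > 0. The toy chain, feature map, and the specific exponents (0.8 and 0.4) are illustrative assumptions, not the paper's experimental setup.

```python
# Sketch of a two-timescale gradient TD (TDC-style) iteration.
# Assumptions: a random 10-state Markov reward process, random linear
# features, and exponents a = 0.8, b = 0.4 chosen only to satisfy 1 > a > b > 0.
import numpy as np

rng = np.random.default_rng(0)

# --- Toy Markov reward process (fixed policy already folded in) ---
S, d = 10, 4                       # number of states, feature dimension
P = rng.random((S, S))
P /= P.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
r = rng.random(S)                  # expected reward per state
Phi = rng.standard_normal((S, d))  # linear features phi(s)
gamma = 0.95

def sample_transition(s):
    """Sample the next state and reward under the fixed policy."""
    return rng.choice(S, p=P[s]), r[s]

# --- Two-timescale TDC iteration ---
a, b = 0.8, 0.4        # stepsize exponents, 1 > a > b > 0
theta = np.zeros(d)    # slow iterate theta_n
w = np.zeros(d)        # fast iterate w_n

s = rng.integers(S)
for n in range(1, 200_001):
    alpha_n = n ** -a  # slow stepsize for theta (decays faster)
    beta_n = n ** -b   # fast stepsize for w (decays slower)

    s_next, reward = sample_transition(s)
    phi, phi_next = Phi[s], Phi[s_next]
    delta = reward + gamma * (phi_next @ theta) - phi @ theta  # TD error

    # TDC updates (Sutton et al., 2009): theta follows the corrected TD
    # direction; w tracks a least-squares correction on the fast timescale.
    theta = theta + alpha_n * (delta * phi - gamma * (phi @ w) * phi_next)
    w = w + beta_n * (delta - phi @ w) * phi

    s = s_next

print("theta after two-timescale TDC:", np.round(theta, 3))
```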


