Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise

05/20/2020
by Yue Wang, et al.

Greedy-GQ is an off-policy, two-timescale algorithm for optimal control in reinforcement learning. This paper develops the first finite-sample analysis of the Greedy-GQ algorithm with linear function approximation under Markovian noise. The analysis provides theoretical justification for choosing the stepsizes of this two-timescale algorithm to achieve faster convergence in practice, and it suggests a trade-off between the convergence rate and the quality of the obtained policy. Our paper extends the finite-sample analysis of two-timescale reinforcement learning algorithms from policy evaluation to optimal control, which is of more practical interest. Specifically, existing finite-sample analyses of two-timescale methods, e.g., GTD, GTD2 and TDC, deal with convex objective functions, whereas the objective function of the Greedy-GQ algorithm is non-convex. Moreover, Greedy-GQ is not a linear two-timescale stochastic approximation algorithm. The techniques in this paper provide a general framework for the finite-sample analysis of non-convex value-based reinforcement learning algorithms for optimal control.
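To make the two-timescale structure concrete, here is a minimal sketch of a single Greedy-GQ update with linear function approximation. It follows the commonly cited form of the algorithm with a purely greedy target policy and is not necessarily the exact variant analyzed in the paper. The function name `greedy_gq_step`, the helper `dot`, and all parameter names are hypothetical. Using a smaller stepsize `alpha` for the main weights `theta` and a larger stepsize `beta` for the auxiliary weights `w` is an assumption that reflects the usual slow/fast timescale split.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def greedy_gq_step(theta, w, phi, reward, next_phis, alpha, beta, gamma):
    """One Greedy-GQ update for a single transition (s, a, r, s').

    theta     : main weights, Q(s, a) is approximated by dot(theta, phi(s, a))
    w         : auxiliary weights, updated on the faster timescale
    phi       : feature vector of the visited pair (s, a)
    next_phis : feature vectors phi(s', a') for every action a' in s'
    alpha     : stepsize for theta (assumed slow timescale)
    beta      : stepsize for w (assumed fast timescale)
    gamma     : discount factor
    """
    # Features of the greedy action in s' under the current theta.
    phi_hat = max(next_phis, key=lambda f: dot(theta, f))

    # Temporal-difference error with a greedy (max) target.
    delta = reward + gamma * dot(theta, phi_hat) - dot(theta, phi)

    # Both updates use the old w, so compute w^T phi once.
    w_phi = dot(w, phi)

    # Slow timescale: TD step plus a gradient-correction term.
    new_theta = [
        t + alpha * (delta * p - gamma * w_phi * ph)
        for t, p, ph in zip(theta, phi, phi_hat)
    ]

    # Fast timescale: w tracks a least-squares estimate of the TD error.
    new_w = [wi + beta * (delta - w_phi) * p for wi, p in zip(w, phi)]

    return new_theta, new_w
```

The `max` over `next_phis` makes the update depend on `theta` in a non-linear way, which is one way to see why the algorithm is not a linear two-timescale stochastic approximation scheme. In a full training loop, `phi`, `reward`, and `next_phis` would come from a single trajectory generated by a behavior policy, which is the source of the Markovian noise.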


research
04/07/2021

Finite-Sample Analysis for Two Time-scale Non-linear TDC with General Smooth Function Approximation

Temporal-difference learning with gradient correction (TDC) is a two tim...
research
06/06/2020

Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity

In this paper, we introduce proximal gradient temporal difference learni...
research
04/04/2017

Finite Sample Analyses for TD(0) with Function Approximation

TD(0) is one of the most commonly used algorithms in reinforcement learn...
research
04/08/2023

Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding

Optimal control is notoriously difficult for stochastic nonlinear system...
research
07/30/2020

Momentum Q-learning with Finite-Sample Convergence Guarantee

Existing studies indicate that momentum ideas in conventional optimizati...
research
03/30/2021

Greedy-GQ with Variance Reduction: Finite-time Analysis and Improved Complexity

Greedy-GQ is a value-based reinforcement learning (RL) algorithm for opt...
research
11/02/2020

Exact Asymptotics for Linear Quadratic Adaptive Control

Recent progress in reinforcement learning has led to remarkable performa...
