
Finite-Sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise
Greedy-GQ is an off-policy two-timescale algorithm for optimal control in reinforcement learning. This paper develops the first finite-sample analysis of the Greedy-GQ algorithm with linear function approximation under Markovian noise. Our finite-sample analysis provides theoretical justification for choosing stepsizes of this two-timescale algorithm for faster convergence in practice, and suggests a trade-off between the convergence rate and the quality of the obtained policy. Our paper extends the finite-sample analyses of two-timescale reinforcement learning algorithms from policy evaluation to optimal control, which is of greater practical interest. Specifically, in contrast to existing finite-sample analyses of two-timescale methods, e.g., GTD, GTD2 and TDC, whose objective functions are convex, the objective function of Greedy-GQ is non-convex. Moreover, Greedy-GQ is also not a linear two-timescale stochastic approximation algorithm. Our techniques therefore provide a general framework for the finite-sample analysis of non-convex value-based reinforcement learning algorithms for optimal control.
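To make the two-timescale structure concrete, below is a minimal sketch of the Greedy-GQ updates (Maei et al.'s gradient-correction form) on a randomly generated toy MDP. The MDP, features, and stepsize schedules here are illustrative assumptions, not the paper's experimental setup; the key points are the greedy action selection under the current parameter and the correction iterate `w` running on a faster stepsize than the main iterate `theta`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP (illustrative only): random transitions, rewards, and linear features.
nS, nA, d, gamma = 5, 3, 4, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))        # P[s, a] = distribution over next states
R = rng.standard_normal((nS, nA))                    # reward r(s, a)
phi = rng.standard_normal((nS, nA, d)) / np.sqrt(d)  # features phi(s, a)

theta = np.zeros(d)  # main (slow-timescale) iterate
w = np.zeros(d)      # gradient-correction (fast-timescale) iterate

s = 0
for t in range(1, 20001):
    a = int(rng.integers(nA))                 # uniform behavior policy (off-policy)
    s_next = int(rng.choice(nS, p=P[s, a]))
    # Greedy action at s_next under the current theta -- the "greedy" in Greedy-GQ.
    a_greedy = int(np.argmax(phi[s_next] @ theta))
    phi_t, phi_bar = phi[s, a], phi[s_next, a_greedy]
    delta = R[s, a] + gamma * phi_bar @ theta - phi_t @ theta  # TD error
    # Two timescales: beta decays more slowly than alpha, so w tracks faster.
    alpha, beta = 0.5 / t ** 0.9, 0.5 / t ** 0.6
    theta = theta + alpha * (delta * phi_t - gamma * (w @ phi_t) * phi_bar)
    w = w + beta * (delta - w @ phi_t) * phi_t
    s = s_next
```

The ratio between `alpha` and `beta` is exactly the design knob whose finite-sample trade-off the paper analyzes: a faster fast timescale stabilizes the correction term but slows overall convergence.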