
Greedy-GQ with Variance Reduction: Finite-time Analysis and Improved Complexity
Greedy-GQ is a value-based reinforcement learning (RL) algorithm for optimal control. Recently, a finite-time analysis of Greedy-GQ was developed under linear function approximation and Markovian sampling, showing that the algorithm reaches an ϵ-stationary point with a sample complexity of order 𝒪(ϵ^{-3}). Such a high sample complexity is due to the large variance induced by the Markovian samples. In this paper, we propose a variance-reduced Greedy-GQ (VR-Greedy-GQ) algorithm for off-policy optimal control. In particular, the algorithm applies an SVRG-based variance reduction scheme to reduce the stochastic variance of the two-timescale updates. We study the finite-time convergence of VR-Greedy-GQ under linear function approximation and Markovian sampling and show that the algorithm achieves much smaller bias and variance errors than the original Greedy-GQ. In particular, we prove that VR-Greedy-GQ achieves an improved sample complexity of order 𝒪(ϵ^{-2}). We further compare the performance of VR-Greedy-GQ with that of Greedy-GQ in various RL experiments to corroborate our theoretical findings.
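To make the SVRG-based scheme concrete, the following is a minimal sketch of how a variance-reduced correction is applied to a stochastic semi-gradient update over a fixed batch of transitions. It illustrates only the generic SVRG mechanism mentioned in the abstract (snapshot parameter, full-batch reference gradient, corrected stochastic gradient) on a single linear TD-style update, not the authors' two-timescale VR-Greedy-GQ algorithm; all names, dimensions, and step sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 4, 64                        # feature dimension, number of stored transitions
gamma, alpha = 0.5, 0.05            # discount factor, step size (assumed values)

phi = rng.normal(size=(n, d))       # features of state s_t
phi_next = rng.normal(size=(n, d))  # features of state s_{t+1}
r = rng.normal(size=n)              # rewards


def semi_grad(theta, i):
    """Negative TD(0) semi-gradient for transition i under linear approximation."""
    delta = r[i] + gamma * phi_next[i] @ theta - phi[i] @ theta
    return -delta * phi[i]


theta = np.zeros(d)
for epoch in range(10):
    # SVRG outer loop: freeze a snapshot and compute the full-batch
    # reference gradient at the snapshot once per epoch.
    theta_snap = theta.copy()
    full = np.mean([semi_grad(theta_snap, i) for i in range(n)], axis=0)
    for _ in range(n):
        i = rng.integers(n)
        # Variance-reduced gradient: stochastic gradient at theta, minus its
        # value at the snapshot, plus the full reference gradient. The two
        # stochastic terms share the sample i, so their noise largely cancels.
        g = semi_grad(theta, i) - semi_grad(theta_snap, i) + full
        theta -= alpha * g
```

In VR-Greedy-GQ this kind of correction is applied to both of the two-timescale updates, which is what drives the improvement from 𝒪(ϵ^{-3}) to 𝒪(ϵ^{-2}) sample complexity.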