Reanalysis of Variance Reduced Temporal Difference Learning

01/07/2020
by Tengyu Xu, et al.

Temporal difference (TD) learning is a popular algorithm for policy evaluation in reinforcement learning, but vanilla TD can suffer substantially from the inherent optimization variance. A variance-reduced TD (VRTD) algorithm was proposed by Korda and La (2015), which applies the variance reduction technique directly to online TD learning with Markovian samples. In this work, we first point out technical errors in the analysis of VRTD in Korda and La (2015), and then provide a mathematically solid analysis of the non-asymptotic convergence of VRTD and its variance reduction performance. We show that VRTD is guaranteed to converge to a neighborhood of the fixed-point solution of TD at a linear convergence rate. Furthermore, the variance error (under both i.i.d. and Markovian sampling) and the bias error (under Markovian sampling) of VRTD are significantly reduced by the batch size of variance reduction, in comparison to those of vanilla TD. As a result, the overall computational complexity for VRTD to attain a solution of a given accuracy outperforms that of TD under Markovian sampling, and outperforms that of TD under i.i.d. sampling for a sufficiently small condition number.
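For intuition, the following is a minimal sketch of one SVRG-style variance-reduction epoch for TD(0) with linear function approximation, in the spirit of the VRTD update analyzed above. The helper name vrtd_epoch, the batch layout, and the constants (gamma, alpha) are illustrative assumptions rather than the authors' implementation; the batch average plays the role of the reference (anchor) update direction, and the batch size M is the quantity that controls the residual variance and bias errors mentioned in the abstract.

```python
# Minimal sketch of an SVRG-style variance-reduced TD(0) epoch with linear
# function approximation (illustrative only; names and constants are assumed).
import numpy as np

def vrtd_epoch(theta_ref, phi, phi_next, rewards, gamma=0.99, alpha=0.05):
    """Run one variance-reduction epoch over a batch of M transitions.

    theta_ref : (d,)   reference (anchor) parameter, fixed for this epoch
    phi       : (M, d) features of the visited states s_t
    phi_next  : (M, d) features of the successor states s_{t+1}
    rewards   : (M,)   observed rewards
    """
    # Batch-averaged TD update direction evaluated at the reference point.
    td_err_ref = rewards + gamma * (phi_next @ theta_ref) - (phi @ theta_ref)
    g_bar = (phi * td_err_ref[:, None]).mean(axis=0)

    theta = theta_ref.copy()
    for t in range(len(rewards)):
        # Stochastic TD directions at the current and at the reference parameter.
        delta = rewards[t] + gamma * (phi_next[t] @ theta) - (phi[t] @ theta)
        g_t = delta * phi[t]
        g_t_ref = td_err_ref[t] * phi[t]
        # Variance-reduced step: the stochastic term is re-centered by the
        # reference term and shifted by the batch average.
        theta = theta + alpha * (g_t - g_t_ref + g_bar)
    return theta

# Hypothetical usage: repeat over epochs, re-anchoring at the latest iterate.
# theta = np.zeros(d)
# for phi, phi_next, rewards in batch_sampler:   # each batch: M samples
#     theta = vrtd_epoch(theta, phi, phi_next, rewards)
```

In this sketch, a larger batch size M makes the re-centered stochastic term smaller on average, which is the mechanism behind the reduction of the variance and bias errors by the batch size described above.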

Related research

10/26/2020  Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence Analysis
Variance reduction techniques have been successfully applied to temporal...

11/29/2022  Closing the gap between SVRG and TD-SVRG with Gradient Splitting
Temporal difference (TD) learning is a simple algorithm for policy evalu...

09/17/2015  Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis
We consider the off-policy evaluation problem in Markov decision process...

05/29/2019  An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient
We revisit the stochastic variance-reduced policy gradient (SVRPG) metho...

06/25/2019  Policy Optimization with Stochastic Mirror Descent
Stochastic mirror descent (SMD) keeps the advantages of simplicity of im...

10/08/2019  Variance reduction for Markov chains with application to MCMC
In this paper we propose a novel variance reduction approach for additiv...

10/30/2022  Robust Data Valuation via Variance Reduced Data Shapley
Data valuation, especially quantifying data value in algorithmic predict...
