PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method

10/13/2021
by Ziwei Guan, et al.

Emphatic temporal difference (ETD) learning (Sutton et al., 2016) is a successful method for off-policy value function evaluation with function approximation. Although ETD has been shown to converge asymptotically to a desirable value function, it is well known that ETD often suffers from large variance, so that its sample complexity can grow exponentially with the number of iterations. In this work, we propose a new ETD method, called PER-ETD (i.e., PEriodically Restarted-ETD), which restarts the follow-on trace and updates it for only a finite period in each iteration of the evaluation parameter. Further, PER-ETD lets the restart period increase logarithmically with the number of iterations, which guarantees the best trade-off between variance and bias and keeps both vanishing sublinearly. We show that PER-ETD converges to the same desirable fixed point as ETD, but improves the exponential sample complexity of ETD to a polynomial one. Our experiments validate the superior performance of PER-ETD and its advantage over ETD.
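
To make the restart idea concrete, here is a minimal sketch of how a periodically restarted follow-on trace could be combined with an emphatic TD(0) update. It is an illustration under stated assumptions, not the authors' implementation: the abstract gives no update equations, so the trace recursion and parameter update follow the standard ETD(0) form of Sutton et al. (2016) with interest fixed to 1. The toy MDP, the schedule constant `c`, the step size `alpha`, the projection `radius`, and the names `restart_period` and `per_etd_sketch` are all hypothetical.

```python
import math
import random

# Toy 2-state, 2-action MDP. All numbers are made up for illustration.
NEXT_STATE = [[0, 1], [0, 1]]      # NEXT_STATE[s][a]
REWARD = [[0.0, 1.0], [0.0, 1.0]]  # REWARD[s][a]
PHI = [[1.0, 0.0], [0.0, 1.0]]     # feature vector of each state
MU = [0.5, 0.5]                    # behavior policy (same in every state)
PI = [0.6, 0.4]                    # target policy (same in every state)
GAMMA = 0.9


def restart_period(t, c=2.0):
    """Assumed schedule: the period grows logarithmically in the iteration t >= 1."""
    return max(1, math.ceil(c * math.log(t + 1)))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def per_etd_sketch(n_iters=500, alpha=0.01, radius=100.0, seed=0):
    """Run n_iters parameter updates, restarting the follow-on trace before each one."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    s = 0
    for t in range(1, n_iters + 1):
        # Restart the trace, then update it for only restart_period(t) steps.
        trace = 1.0  # interest of the starting state, assumed to be 1
        for _ in range(restart_period(t)):
            a = 0 if rng.random() < MU[0] else 1
            rho = PI[a] / MU[a]               # importance sampling ratio
            s = NEXT_STATE[s][a]
            trace = GAMMA * rho * trace + 1.0  # standard follow-on recursion

        # One emphatic TD(0) update of the evaluation parameter.
        a = 0 if rng.random() < MU[0] else 1
        rho = PI[a] / MU[a]
        s_next = NEXT_STATE[s][a]
        delta = REWARD[s][a] + GAMMA * dot(theta, PHI[s_next]) - dot(theta, PHI[s])
        theta = [w + alpha * trace * rho * delta * x for w, x in zip(theta, PHI[s])]

        # Projection onto a ball: an assumption added here to keep the sketch stable.
        norm = math.sqrt(dot(theta, theta))
        if norm > radius:
            theta = [w * radius / norm for w in theta]
        s = s_next
    return theta


if __name__ == "__main__":
    print([restart_period(t) for t in (1, 10, 100, 1000)])
    print(per_etd_sketch())
```

Because the trace is rebuilt from scratch over a short window, its magnitude is bounded by the window length rather than by the whole trajectory. That is the variance-control mechanism the abstract describes. The truncation bias shrinks as the logarithmic schedule lengthens the window.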
