Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis

09/17/2015
by Assaf Hallak, et al.

We consider the off-policy evaluation problem in Markov decision processes with function approximation. We propose a generalization of the recently introduced emphatic temporal differences (ETD) algorithm of Sutton, Mahmood, and White (2015), which encompasses the original ETD(λ) as well as several other off-policy evaluation algorithms as special cases. We call this framework ETD(λ, β), where our introduced parameter β controls the decay rate of an importance-sampling term. We study conditions under which the projected fixed-point equation underlying ETD(λ, β) involves a contraction operator, allowing us to present the first asymptotic error bounds (bias) for ETD(λ, β). Our results show that the original ETD algorithm always involves a contraction operator and that its bias is bounded. Moreover, by controlling β, our proposed generalization allows trading off bias for variance reduction, thereby achieving a lower total error.
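The bias-variance trade-off is easiest to see in the update equations. Below is a minimal sketch of one linear ETD(λ, β) evaluation step, assuming the standard ETD(λ) recursions of Sutton, Mahmood, and White (2015) with the discount γ replaced by β in the follow-on trace; the function name and signature are illustrative, not the paper's verbatim pseudocode.

```python
import numpy as np

def etd_lambda_beta_step(theta, e, F, rho_prev, rho, phi, phi_next,
                         reward, gamma, lam, beta, alpha, interest=1.0):
    """One linear ETD(lambda, beta) update (illustrative sketch).

    beta = gamma recovers the original ETD(lambda); beta = 0 stops the
    accumulation of importance-sampling ratios in the follow-on trace,
    reducing variance at the cost of bias.
    """
    # Follow-on trace: beta (rather than gamma) controls the decay of the
    # accumulated importance-sampling term, per the abstract.
    F = beta * rho_prev * F + interest
    # Emphasis mixes the immediate interest with the follow-on trace.
    M = lam * interest + (1.0 - lam) * F
    # Accumulating eligibility trace, scaled by the current IS ratio rho.
    e = rho * (gamma * lam * e + M * phi)
    # Standard TD error under linear function approximation.
    delta = reward + gamma * theta @ phi_next - theta @ phi
    theta = theta + alpha * delta * e
    return theta, e, F
```

With β = γ this reduces to the original ETD(λ) update, while smaller β shortens the memory of the importance-sampling term; the paper's analysis concerns how this choice moves the asymptotic bias and variance.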


Related research

08/14/2015 · Emphatic TD Bellman Operator is a Contraction
Recently, Sutton, Mahmood, and White (2015) introduced the emphatic temporal differences (ETD) ...

06/24/2021 · Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators
In temporal difference (TD) learning, off-policy sampling is known to be...

01/07/2020 · Reanalysis of Variance Reduced Temporal Difference Learning
Temporal difference (TD) learning is a popular algorithm for policy eval...

01/21/2022 · Optimal variance-reduced stochastic approximation in Banach spaces
We study the problem of estimating the fixed point of a contractive oper...

10/17/2020 · A Convenient Generalization of Schlick's Bias and Gain Functions
We present a generalization of Schlick's bias and gain functions – simpl...

09/03/2023 · Double Clipping: Less-Biased Variance Reduction in Off-Policy Evaluation
"Clipping" (a.k.a. importance weight truncation) is a widely used varian...

10/12/2022 · Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation
We study the finite-time behaviour of the popular temporal difference (T...
