Emphatic TD Bellman Operator is a Contraction

08/14/2015
by Assaf Hallak, et al.

Recently, Sutton, Mahmood, and White (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes. In this short note, we show that the projected fixed-point equation underlying ETD involves a contraction operator with modulus √(γ) (where γ is the discount factor). This allows us to provide error bounds on the approximation error of ETD. To our knowledge, these are the first error bounds for an off-policy evaluation algorithm under general target and behavior policies.
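For context, a √(γ)-contraction immediately yields an approximation guarantee of the standard projected-fixed-point form. The bound below is a generic consequence of the contraction property and is not quoted from the note itself (whose constants may differ); it assumes the unprojected operator has the true value function v^π as its fixed point, Π is the non-expansive projection onto the feature span in the same weighted norm, v_θ* is the ETD fixed point, and μ is the weighting distribution in which the contraction holds:

\[
\| v_{\theta^*} - v^{\pi} \|_{\mu} \;\le\; \frac{1}{1 - \sqrt{\gamma}} \, \| \Pi v^{\pi} - v^{\pi} \|_{\mu}.
\]

The algorithm itself is not restated in the note. The sketch below is an illustrative emphatic TD(0) update with linear function approximation, following the update rules described by Sutton, Mahmood, and White (2015); all names (etd0_episode, followon, interest, ...) and defaults such as gamma=0.95 are ours, not from the paper.

```python
import numpy as np

def etd0_episode(features, rewards, rhos, gamma=0.95, alpha=0.01,
                 interest=None, theta=None):
    """Run emphatic TD(0) over one trajectory (illustrative sketch).

    features : array of shape [T+1, d], feature vectors phi(s_0), ..., phi(s_T)
    rewards  : array of shape [T], rewards r_1, ..., r_T
    rhos     : array of shape [T], importance ratios pi(a_t|s_t) / mu(a_t|s_t)
    interest : array of shape [T], interest weights i(s_t); defaults to all ones
    theta    : initial weight vector of shape [d]; defaults to zeros
    """
    T, d = len(rewards), features.shape[1]
    theta = np.zeros(d) if theta is None else theta.astype(float)
    interest = np.ones(T) if interest is None else interest
    followon = 0.0   # follow-on trace F_{t-1}
    rho_prev = 1.0   # importance ratio of the previous transition
    for t in range(T):
        # Follow-on trace: F_t = gamma * rho_{t-1} * F_{t-1} + i(s_t).
        followon = gamma * rho_prev * followon + interest[t]
        # TD error for the transition (s_t, a_t, r_{t+1}, s_{t+1}).
        td_error = (rewards[t] + gamma * features[t + 1] @ theta
                    - features[t] @ theta)
        # Emphatic TD(0) update: scale by the emphasis F_t and the ratio rho_t.
        theta = theta + alpha * rhos[t] * followon * td_error * features[t]
        rho_prev = rhos[t]
    return theta
```

A full ETD(λ) implementation would additionally maintain an eligibility trace and an emphasis M_t = λ i(s_t) + (1 − λ) F_t; the sketch uses λ = 0 purely for brevity.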


