A First Empirical Study of Emphatic Temporal Difference Learning

05/11/2017
by Sina Ghiassian, et al.

In this paper we present the first empirical study of the emphatic temporal-difference learning algorithm (ETD), comparing it with conventional temporal-difference learning, in particular, with linear TD(0), on on-policy and off-policy variations of the Mountain Car problem. The initial motivation for developing ETD was that it has good convergence properties under off-policy training (Sutton, Mahmood and White 2016), but it is also a new algorithm for the on-policy case. In both our on-policy and off-policy experiments, we found that each method converged to a characteristic asymptotic level of error, with ETD better than TD(0). TD(0) achieved a still lower error level temporarily before falling back to its higher asymptote, whereas ETD never showed this kind of "bounce". In the off-policy case (in which TD(0) is not guaranteed to converge), ETD was significantly slower.
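As context for the comparison, the two updates being contrasted can be sketched as follows. This is a minimal, illustrative implementation of off-policy linear TD(0) and ETD(0) (taking λ = 0 and constant interest, following Sutton, Mahmood and White 2016), not the authors' experimental code; all function and variable names here are our own.

```python
import numpy as np

def td0_update(w, x, x_next, reward, gamma, alpha, rho=1.0):
    """One off-policy linear TD(0) step; rho is the importance-sampling ratio."""
    delta = reward + gamma * (w @ x_next) - w @ x  # TD error
    return w + alpha * rho * delta * x

def etd0_update(w, F_prev, x, x_next, reward, gamma, alpha,
                rho=1.0, rho_prev=1.0, interest=1.0):
    """One off-policy linear ETD(0) step.

    F is the followon trace; with lambda = 0 the emphasis M_t equals F_t,
    so F directly scales the step size of each update."""
    F = rho_prev * gamma * F_prev + interest       # followon trace
    delta = reward + gamma * (w @ x_next) - w @ x  # same TD error as TD(0)
    w_new = w + alpha * F * rho * delta * x        # emphasis-weighted update
    return w_new, F

# Tiny worked example: two one-hot features, zero initial weights.
w = np.zeros(2)
x, x_next = np.array([1.0, 0.0]), np.array([0.0, 1.0])
w_td = td0_update(w, x, x_next, reward=1.0, gamma=0.9, alpha=0.1)
w_etd, F = etd0_update(w, 0.0, x, x_next, reward=1.0, gamma=0.9, alpha=0.1)
print(w_td, w_etd, F)  # first step: both weight vectors are [0.1, 0.], F = 1.0
```

On the first step the two methods coincide (F starts at 1 under on-policy data), but once importance-sampling ratios deviate from 1, the followon trace F accumulates them and re-weights later updates; this emphasis is what gives ETD its off-policy convergence guarantee, at the cost of higher variance.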


