Predictor-Corrector(PC) Temporal Difference(TD) Learning (PCTD)

04/15/2021
by   Caleb Bowyer, et al.
0

Using insight from numerical approximation of ODEs and the problem formulation and solution methodology of TD learning through a Galerkin relaxation, I propose a new class of TD learning algorithms. After applying the improved numerical methods, the parameter being approximated has a guaranteed order of magnitude reduction in the Taylor Series error of the solution to the ODE for the parameter θ(t) that is used in constructing the linearly parameterized value function. Predictor-Corrector Temporal Difference (PCTD) is what I call the translated discrete time Reinforcement Learning(RL) algorithm from the continuous time ODE using the theory of Stochastic Approximation(SA). Both causal and non-causal implementations of the algorithm are provided, and simulation results are listed for an infinite horizon task to compare the original TD(0) algorithm against both versions of PCTD(0).

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/09/2019

Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning

We explore fixed-horizon temporal difference (TD) methods, reinforcement...
research
09/10/2022

Gradient Descent Temporal Difference-difference Learning

Off-policy algorithms, in which a behavior policy differs from the targe...
research
07/02/2022

q-Learning in Continuous Time

We study the continuous-time counterpart of Q-learning for reinforcement...
research
06/01/2020

Temporal-Differential Learning in Continuous Environments

In this paper, a new reinforcement learning (RL) method known as the met...
research
10/30/2010

Predictive State Temporal Difference Learning

We propose a new approach to value function approximation which combines...
research
12/28/2018

Differential Temporal Difference Learning

Value functions derived from Markov decision processes arise as a centra...
research
05/28/2019

Conditions on Features for Temporal Difference-Like Methods to Converge

The convergence of many reinforcement learning (RL) algorithms with line...

Please sign up or login with your details

Forgot password? Click here to reset