n-Step Temporal Difference Learning with Optimal n

03/13/2023
by Lakshmi Mandal, et al.

We consider the problem of finding the optimal value of n in the n-step temporal difference (TD) algorithm. We find the optimal n using the model-free optimization technique of simultaneous perturbation stochastic approximation (SPSA). We adapt a one-simulation SPSA procedure, originally designed for continuous optimization, to the discrete optimization framework by incorporating a cyclic perturbation sequence. We prove the convergence of our proposed algorithm, SDPSA, and show that it finds the optimal value of n in n-step TD. Through experiments, we show that SDPSA attains the optimal value of n from any arbitrary initial value.
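To make the idea concrete, here is a minimal sketch of discrete one-simulation SPSA with a cyclic perturbation sequence, used to tune n for tabular n-step TD on a toy 5-state random-walk problem. The environment, the objective (MSE against known values), and all function names are illustrative assumptions for this sketch, not the paper's exact SDPSA recursion or experimental setup.

```python
import random

# Toy 5-state random walk: non-terminal states 1..3, terminals 0 and 4;
# reward 1 on reaching state 4, else 0. True values are s / 4.
N_STATES = 5
TRUE_V = {s: s / (N_STATES - 1) for s in range(1, N_STATES - 1)}

def episode(rng):
    """Simulate one trajectory; return visited states and per-step rewards."""
    s = N_STATES // 2
    states, rewards = [s], []
    while 0 < s < N_STATES - 1:
        s += rng.choice((-1, 1))
        states.append(s)
        rewards.append(1.0 if s == N_STATES - 1 else 0.0)
    return states, rewards

def n_step_td_mse(n, num_episodes=100, alpha=0.1, seed=None):
    """Tabular n-step TD evaluation; return MSE against the known values."""
    rng = random.Random(seed)
    V = [0.0] * N_STATES
    for _ in range(num_episodes):
        states, rewards = episode(rng)
        T = len(rewards)
        for t in range(T):
            end = min(t + n, T)
            G = sum(rewards[t:end])          # n-step return (undiscounted)
            if end < T:
                G += V[states[end]]          # bootstrap if episode not over
            s_t = states[t]
            V[s_t] += alpha * (G - V[s_t])
    return sum((V[s] - v) ** 2 for s, v in TRUE_V.items()) / len(TRUE_V)

def sdpsa_sketch(n0=10, iterations=50, n_max=50, seed=0):
    """Discrete one-simulation SPSA with a cyclic +/-1 perturbation
    sequence: a sketch of the SDPSA idea, not the paper's exact algorithm."""
    rng = random.Random(seed)
    x = float(n0)                            # continuous iterate; n = round(x)
    for k in range(iterations):
        delta = 1 if k % 2 == 0 else -1      # deterministic cyclic perturbation
        a_k = 2.0 / (k + 10)                 # diminishing step size
        n_pert = min(max(round(x) + delta, 1), n_max)
        J = n_step_td_mse(n_pert, seed=rng.randrange(10 ** 6))  # one simulation
        x -= a_k * J / delta                 # one-measurement gradient estimate
        x = min(max(x, 1.0), float(n_max))   # project onto [1, n_max]
    return round(x)

if __name__ == "__main__":
    print("selected n:", sdpsa_sketch())
```

The key point the sketch illustrates is that only a single noisy objective measurement per iteration is needed: the cyclic +1/-1 sequence plays the role of the random perturbations in standard SPSA, and the iterate lives in a continuous space while each simulation is run at a rounded, projected integer n.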


Related research

- 02/09/2018, "A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning": Recently, a new multi-step temporal learning algorithm, called Q(σ), uni...
- 04/17/2017, "O^2TD: (Near)-Optimal Off-Policy TD Learning": Temporal difference learning and Residual Gradient methods are the most ...
- 10/27/2020, "Temporal Difference Learning as Gradient Splitting": Temporal difference learning with linear function approximation is a pop...
- 05/27/2019, "Temporal-difference learning for nonlinear value function approximation in the lazy training regime": We discuss the approximation of the value function for infinite-horizon ...
- 09/12/2019, "Inverse Graphical Method for Global Optimization and Application to Design Centering Problem": Consider the problem of finding an optimal value of some objective funct...
- 08/01/2022, "An Adjoint-Free Algorithm for CNOP via Sampling": In this paper, we propose a sampling algorithm based on statistical mach...
