An Adiabatic Theorem for Policy Tracking with TD-learning

10/24/2020
by Neil Walton, et al.

We evaluate the ability of temporal difference learning to track the reward function of a policy as it changes over time. Our results apply a new adiabatic theorem that bounds the mixing time of time-inhomogeneous Markov chains. We derive finite-time bounds for tabular temporal difference learning and Q-learning when the policy used for training changes in time. To achieve this, we develop bounds for stochastic approximation under asynchronous adiabatic updates.
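The setting described in the abstract is asynchronous tabular TD(0) tracking the value of a policy that drifts slowly while learning proceeds. Below is a minimal, self-contained sketch of that setup for illustration only: the small random MDP (`P`, `R`), the softmax drift in `policy`, the discount `gamma`, and the constant step size `alpha` are all assumed for the example and are not the construction or constants analysed in the paper.

```python
import numpy as np

# Illustrative sketch (not the paper's construction): tabular TD(0)
# tracking the value function of a slowly changing (adiabatic) policy.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 2

# Random MDP: transition kernel P[a, s, s'] and reward table R[s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.uniform(size=(n_states, n_actions))
gamma = 0.9

def policy(t, n_steps):
    """Softmax policy whose action preferences drift slowly with time t."""
    prefs = np.linspace(-1.0, 1.0, n_actions) * (t / n_steps)
    logits = np.tile(prefs, (n_states, 1))
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)  # pi[s, a]

V = np.zeros(n_states)   # tabular value estimate being tracked
alpha = 0.05             # constant step size, so the estimate can follow a moving target
s = 0
n_steps = 20_000

for t in range(n_steps):
    pi = policy(t, n_steps)
    a = rng.choice(n_actions, p=pi[s])
    s_next = rng.choice(n_states, p=P[a, s])
    r = R[s, a]
    # Asynchronous TD(0) update: only the currently visited state is updated.
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    s = s_next

print("tracked value estimate:", np.round(V, 3))
```

A constant (rather than decaying) step size is used here because the target value function keeps moving; the paper's finite-time bounds quantify how well such updates can track it when the policy changes slowly enough.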


