Characterizing the Exact Behaviors of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory

06/16/2019
by Bin Hu, et al.

In this paper, we provide a unified analysis of temporal difference (TD) learning algorithms with linear function approximators by exploiting their connections to Markov jump linear systems (MJLS). We tailor the MJLS theory developed in the control community to characterize the exact behaviors of the first- and second-order moments of a large family of TD learning algorithms. For both the IID and Markov noise cases, we show that the evolution of some augmented versions of the mean and covariance matrix of TD learning exactly follows the trajectory of a deterministic linear time-invariant (LTI) dynamical system. Applying the well-known LTI system theory, we obtain closed-form expressions for the mean and covariance matrix of TD learning at any time step. We provide a tight matrix spectral radius condition to guarantee the convergence of the covariance matrix of TD learning, and perform a perturbation analysis to characterize the dependence of the TD behaviors on the learning rate. For the IID case, we provide an exact formula characterizing how the mean and covariance matrix of TD learning converge to their steady-state values at a linear rate. For the Markov case, we use our formulas to explain how the behaviors of TD learning algorithms are affected by the learning rate and various properties of the underlying Markov chain.
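
To make the LTI claim concrete for the IID case, the following is a minimal sketch under stated assumptions, not the paper's own formulation. It assumes the TD iterate can be written as theta_{k+1} = theta_k + alpha * (A_i theta_k + b_i), with the index i drawn IID with probabilities p. The names `A_list`, `b_list`, `p`, and all helper functions are hypothetical, and the numbers are toy values. Under that assumption the mean obeys a deterministic LTI recursion with a closed form, and the homogeneous part of the second moment is governed by the standard jump-linear matrix sum_i p_i (H_i kron H_i), where H_i = I + alpha * A_i.

```python
import numpy as np

def mean_recursion(A_bar, b_bar, alpha, mu0, k):
    """Iterate mu_{j+1} = mu_j + alpha * (A_bar @ mu_j + b_bar) for k steps."""
    mu = mu0.copy()
    for _ in range(k):
        mu = mu + alpha * (A_bar @ mu + b_bar)
    return mu

def mean_closed_form(A_bar, b_bar, alpha, mu0, k):
    """Closed form of the same recursion: mu_inf + H^k (mu0 - mu_inf)."""
    n = A_bar.shape[0]
    H = np.eye(n) + alpha * A_bar
    mu_inf = -np.linalg.solve(A_bar, b_bar)  # fixed point: A_bar mu_inf + b_bar = 0
    return mu_inf + np.linalg.matrix_power(H, k) @ (mu0 - mu_inf)

def spectral_radius(M):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def second_moment_matrix(A_list, p, alpha):
    """sum_i p_i (H_i kron H_i), which drives E[vec(x x^T)] for x_{k+1} = H_i x_k."""
    n = A_list[0].shape[0]
    H_list = [np.eye(n) + alpha * A for A in A_list]
    return sum(pi * np.kron(H, H) for pi, H in zip(p, H_list))

# Toy data (hypothetical values chosen only for illustration).
A_list = [np.array([[-1.0, 0.2], [0.0, -0.5]]),
          np.array([[-0.6, 0.0], [0.1, -1.2]])]
b_list = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
p = [0.5, 0.5]
alpha = 0.1

A_bar = sum(pi * A for pi, A in zip(p, A_list))
b_bar = sum(pi * b for pi, b in zip(p, b_list))
mu0 = np.zeros(2)

mu_iter = mean_recursion(A_bar, b_bar, alpha, mu0, 50)
mu_exact = mean_closed_form(A_bar, b_bar, alpha, mu0, 50)
rho_mean = spectral_radius(np.eye(2) + alpha * A_bar)
rho_second = spectral_radius(second_moment_matrix(A_list, p, alpha))
```

In this sketch, `mu_iter` and `mu_exact` agree up to floating-point error, and a value of `rho_second` below 1 indicates that the second moment of the toy recursion converges. Increasing `alpha` and recomputing `rho_second` gives a quick way to see how the learning rate affects that condition in this toy setting.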


