Exact Formulas for Finite-Time Estimation Errors of Decentralized Temporal Difference Learning with Linear Function Approximation

04/20/2022
by Xingang Guo, et al.

In this paper, we consider the policy evaluation problem in multi-agent reinforcement learning (MARL) and derive exact closed-form formulas for the finite-time mean-squared estimation errors of decentralized temporal difference (TD) learning with linear function approximation. Our analysis hinges on the fact that the decentralized TD learning method can be viewed as a Markov jump linear system (MJLS). Standard MJLS theory can then be applied to quantify the mean and covariance matrix of the estimation error of the decentralized TD method at every time step. Various implications of our exact formulas for algorithm performance are also discussed. An interesting finding is that under a necessary and sufficient stability condition, the mean-squared TD estimation error converges to an exact limit at a specific exponential rate.
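The core idea of the abstract, treating a linear recursion whose matrices are selected by a finite Markov chain as an MJLS and propagating its first and second moments exactly, can be illustrated with a minimal sketch. This is not the paper's derivation. It assumes a generic MJLS x_{k+1} = H[z_k] x_k + G[z_k] and tracks the standard augmented quantities q_i = E[x_k 1{z_k = i}] and Q_i = E[x_k x_k^T 1{z_k = i}]. The helper names (mjls_moments, brute_force_moments, toy_decentralized_td), the stacked two-agent update with a consensus matrix W, the scalar features, and every number are hypothetical. The brute-force enumeration over mode paths is included only to confirm that the recursion is exact rather than approximate.

```python
from itertools import product

# Tiny dense-matrix helpers; matrices are lists of rows, vectors are d x 1.
def matmul(A, B):
    return [[sum(A[r][t] * B[t][c] for t in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(s, A):
    return [[s * x for x in row] for row in A]

def zeros(r, c):
    return [[0.0] * c for _ in range(r)]

def mjls_moments(H, G, P, pi0, x0, steps):
    """Exact E[x_k] and E[x_k x_k^T] for x_{k+1} = H[z_k] x_k + G[z_k].

    z_k is a Markov chain with transition matrix P and initial law pi0.
    Tracks q[i] = E[x_k 1{z_k=i}] and Q[i] = E[x_k x_k^T 1{z_k=i}].
    """
    m, d = len(P), len(x0)
    p = list(pi0)
    q = [scale(p[i], x0) for i in range(m)]
    Q = [scale(p[i], matmul(x0, transpose(x0))) for i in range(m)]
    for _ in range(steps):
        q_new = [zeros(d, 1) for _ in range(m)]
        Q_new = [zeros(d, d) for _ in range(m)]
        for i in range(m):
            Hq = matmul(H[i], q[i])
            cross = matmul(Hq, transpose(G[i]))  # H_i q_i G_i^T
            first = add(Hq, scale(p[i], G[i]))
            second = add(add(matmul(matmul(H[i], Q[i]), transpose(H[i])), cross),
                         add(transpose(cross),
                             scale(p[i], matmul(G[i], transpose(G[i])))))
            # Mode i at time k moves to mode j with probability P[i][j].
            for j in range(m):
                q_new[j] = add(q_new[j], scale(P[i][j], first))
                Q_new[j] = add(Q_new[j], scale(P[i][j], second))
        p = [sum(p[i] * P[i][j] for i in range(m)) for j in range(m)]
        q, Q = q_new, Q_new
    mean, second = zeros(d, 1), zeros(d, d)
    for i in range(m):
        mean, second = add(mean, q[i]), add(second, Q[i])
    return mean, second

def brute_force_moments(H, G, P, pi0, x0, steps):
    """Same moments by enumerating every mode path z_0, ..., z_{k-1}."""
    m, d = len(P), len(x0)
    mean, second = zeros(d, 1), zeros(d, d)
    for path in product(range(m), repeat=steps):
        prob, x = 1.0, x0
        for t, z in enumerate(path):
            prob *= pi0[z] if t == 0 else P[path[t - 1]][z]
            x = add(matmul(H[z], x), G[z])
        mean = add(mean, scale(prob, x))
        second = add(second, scale(prob, matmul(x, transpose(x))))
    return mean, second

def toy_decentralized_td(alpha=0.1):
    """Hypothetical 2-agent example with scalar features (numbers made up).

    Stacked iterate: Theta_{k+1} = (W + alpha*a(z_k) I) Theta_k + alpha*b(z_k).
    """
    W = [[0.5, 0.5], [0.5, 0.5]]          # assumed doubly stochastic consensus weights
    a = [-1.0, -0.5]                      # assumed scalar A(z) for modes 0 and 1
    b = [[[1.0], [0.0]], [[0.0], [2.0]]]  # assumed local b^n(z), one row per agent
    I = [[1.0, 0.0], [0.0, 1.0]]
    H = [add(W, scale(alpha * a[i], I)) for i in range(2)]
    G = [scale(alpha, b[i]) for i in range(2)]
    P = [[0.9, 0.1], [0.2, 0.8]]          # assumed mode transition matrix
    return H, G, P

if __name__ == "__main__":
    H, G, P = toy_decentralized_td()
    mean, second = mjls_moments(H, G, P, [0.5, 0.5], [[1.0], [-1.0]], steps=6)
    cov = add(second, scale(-1.0, matmul(mean, transpose(mean))))
    print("mean:", mean)
    print("covariance:", cov)
```

The recursion costs a fixed amount of work per step regardless of horizon, while the enumeration grows as m^k, which is why an MJLS view can give exact finite-time formulas cheaply. The same recursion would apply to an estimation error x_k = Theta_k - 1 theta* after shifting the offset to G(z) + (H(z) - I) 1 theta*. Whether the moments settle to a limit depends on a stability condition (in standard MJLS theory, a spectral-radius condition on the associated second-moment operator), which this sketch does not check.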

research
06/16/2019

Characterizing the Exact Behaviors of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory

In this paper, we provide a unified analysis of temporal difference lear...
research
04/28/2021

A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning

Many reinforcement learning algorithms rely on value estimation. However...
research
01/30/2023

On the Statistical Benefits of Temporal Difference Learning

Given a dataset on actions and resulting long-term rewards, a direct est...
research
11/03/2019

Finite-Sample Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation

Motivated by the emerging use of multi-agent reinforcement learning (MAR...
research
07/09/2020

Provably-Efficient Double Q-Learning

In this paper, we establish a theoretical comparison between the asympto...
research
06/03/2018

Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization

Despite the success of single-agent reinforcement learning, multi-agent ...
research
06/08/2020

Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory

Temporal-difference and Q-learning play a key role in deep reinforcement...