Finite-Sample Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation

11/03/2019
by Jun Sun, et al.

Motivated by the emerging use of multi-agent reinforcement learning (MARL) in engineering applications such as networked robotics, swarming drones, and sensor networks, we investigate the policy evaluation problem in a fully decentralized setting, using temporal-difference (TD) learning with linear function approximation to handle large state spaces in practice. A group of agents aims to collaboratively learn the value function of a given policy from locally private rewards observed in a shared environment, by exchanging local estimates with their neighbors. Despite the simplicity and widespread use of such decentralized TD learning algorithms, their theoretical understanding remains limited. Existing results either assume i.i.d. data samples or impose an 'additional' projection step to control the 'gradient' bias induced by Markovian observations. In this paper, we provide a finite-sample analysis of fully decentralized TD(0) learning under both i.i.d. and Markovian samples, and prove that all local estimates converge linearly to a small neighborhood of the optimum. The resulting error bounds are the first of their type, in the sense that they hold under the most practical assumptions; this is made possible by a novel multi-step Lyapunov analysis.
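To make the setting concrete, here is a minimal NumPy sketch of one common form of decentralized TD(0) with linear function approximation. At every step each agent i first averages its neighbors' estimates with doubly stochastic weights W_ij and then takes a local TD(0) step with its private reward: θ_i ← Σ_j W_ij θ_j + α δ_i φ(s_t), where δ_i = r_i(s_t) + γ φ(s_{t+1})ᵀθ_i − φ(s_t)ᵀθ_i. This recursion is an illustrative assumption and is not necessarily the exact update or step-size rule analyzed in the paper. The ring graph, the random Markov chain, the state-aggregation features, the uniform rewards, and the helper names (`ring_mixing_matrix`, `decentralized_td0`, `td_fixed_point`) are all hypothetical choices for the example. Samples come from a single trajectory, i.e. the Markovian setting.

```python
import numpy as np


def ring_mixing_matrix(n_agents):
    """Hypothetical doubly stochastic mixing matrix on a ring graph:
    each agent averages itself and its two neighbors with weight 1/3."""
    assert n_agents >= 3
    W = np.zeros((n_agents, n_agents))
    for i in range(n_agents):
        for j in (i - 1, i, i + 1):
            W[i, j % n_agents] = 1.0 / 3.0
    return W


def decentralized_td0(P, Phi, R, W, gamma=0.8, alpha=0.05, T=60_000, seed=0):
    """Decentralized TD(0) along one Markovian trajectory (illustrative form).

    P:   (S, S) transition matrix of the chain induced by the fixed policy
    Phi: (S, d) feature matrix; row s is phi(s)
    R:   (N, S) private rewards; R[i, s] is agent i's reward in state s
    W:   (N, N) doubly stochastic mixing matrix of the communication graph
    Returns the final estimates (N, d) and their average over the second half.
    """
    rng = np.random.default_rng(seed)
    n_agents, n_states = R.shape
    theta = np.zeros((n_agents, Phi.shape[1]))
    theta_avg = np.zeros_like(theta)
    s = rng.integers(n_states)
    for t in range(T):
        s_next = rng.choice(n_states, p=P[s])
        phi, phi_next = Phi[s], Phi[s_next]
        # Local TD errors: each agent uses its own reward and its own estimate.
        delta = R[:, s] + gamma * theta @ phi_next - theta @ phi
        # Consensus (mix neighbors' estimates) followed by a local TD(0) step.
        theta = W @ theta + alpha * np.outer(delta, phi)
        if t >= T // 2:  # Polyak averaging of the second half of the iterates
            theta_avg += theta / (T - T // 2)
        s = s_next
    return theta, theta_avg


def td_fixed_point(P, Phi, r_bar, gamma):
    """Solve A theta = b, A = Phi^T D (Phi - gamma P Phi), b = Phi^T D r_bar,
    where D = diag(mu) holds the stationary distribution of P."""
    evals, evecs = np.linalg.eig(P.T)
    mu = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    D = np.diag(mu / mu.sum())
    A = Phi.T @ D @ (Phi - gamma * P @ Phi)
    b = Phi.T @ D @ r_bar
    return np.linalg.solve(A, b)


rng = np.random.default_rng(1)
n_states, n_agents, gamma = 9, 5, 0.8
P = rng.uniform(size=(n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
Phi = np.eye(3)[np.arange(n_states) // 3]  # state aggregation: {0-2}, {3-5}, {6-8}
R = rng.uniform(size=(n_agents, n_states))
W = ring_mixing_matrix(n_agents)

theta_star = td_fixed_point(P, Phi, R.mean(axis=0), gamma)
theta_T, theta_avg = decentralized_td0(P, Phi, R, W, gamma=gamma)
consensus_err = np.max(np.linalg.norm(theta_T - theta_T.mean(axis=0), axis=1))
opt_err = np.linalg.norm(theta_avg.mean(axis=0) - theta_star)
print(f"consensus error {consensus_err:.3f}, "
      f"error of averaged estimate {opt_err:.3f}, |theta*| {np.linalg.norm(theta_star):.3f}")
```

Because W is doubly stochastic, the network average of the estimates evolves exactly like centralized TD(0) driven by the team-average reward, so the sketch compares it with the TD fixed point θ* of that averaged reward. With a constant step size the individual estimates only settle into a neighborhood of θ* and of each other, in line with the "small neighborhood" wording above; averaging the second half of the iterates damps the remaining sampling noise.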

