Distributed TD(0) with Almost No Communication

04/16/2021
by Rui Liu, et al.

We provide a new non-asymptotic analysis of distributed TD(0) with linear function approximation. Our approach relies on "one-shot averaging," where N agents run local copies of TD(0) and average the outcomes only once at the very end. We consider two models: one in which the agents interact with an environment they can observe and whose transitions depend on all of their actions (which we call the global state model), and one in which each agent can run a local copy of an identical Markov Decision Process, which we call the local state model. In the global state model, we show that the convergence rate of our distributed one-shot averaging method matches the known convergence rate of TD(0). By contrast, the best bounds in the previous literature gave a rate which, in the worst case, underperformed the non-distributed version by a factor of O(N^3) in the number of agents N. In the local state model, we demonstrate a version of the linear speedup phenomenon, where the convergence time of the distributed process is a factor of N faster than the convergence time of TD(0). As far as we are aware, this is the first result rigorously showing benefits from parallelism for temporal difference methods.
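The one-shot averaging scheme in the local state model can be sketched as follows: each of N agents runs an independent TD(0) chain on its own copy of the MDP, and the weight vectors are averaged exactly once at the end. The sketch below is illustrative only, with a hypothetical toy Markov chain and fixed-policy rewards (`P`, `R`, `features` are made-up assumptions, not from the paper); the paper's analysis concerns convergence rates, not any particular implementation.

```python
import numpy as np

def td0_local(num_steps, alpha, gamma, P, R, features, rng):
    """One agent's local TD(0) with linear function approximation:
    theta <- theta + alpha * (r + gamma*phi(s')^T theta - phi(s)^T theta) * phi(s)."""
    n_states, d = features.shape
    theta = np.zeros(d)
    s = int(rng.integers(n_states))
    for _ in range(num_steps):
        s_next = int(rng.choice(n_states, p=P[s]))
        td_error = R[s] + gamma * features[s_next] @ theta - features[s] @ theta
        theta += alpha * td_error * features[s]
        s = s_next
    return theta

def one_shot_averaged_td0(N, num_steps, alpha, gamma, P, R, features, seed=0):
    """N agents run TD(0) on independent local copies of the MDP
    (local state model) and communicate only once: a final average."""
    rng = np.random.default_rng(seed)
    thetas = [td0_local(num_steps, alpha, gamma, P, R, features,
                        np.random.default_rng(int(rng.integers(1 << 31))))
              for _ in range(N)]
    return np.mean(thetas, axis=0)

# Toy 3-state chain under a fixed policy (hypothetical example data).
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.7, 0.2],
              [0.2, 0.1, 0.7]])
R = np.array([1.0, 0.0, -1.0])
features = np.eye(3)  # tabular features for simplicity
theta_avg = one_shot_averaged_td0(N=8, num_steps=5000,
                                  alpha=0.05, gamma=0.9,
                                  P=P, R=R, features=features)
```

With tabular features the averaged weights approximate the value function of the fixed policy; averaging the N independent runs reduces the variance of the estimate, which is the intuition behind the linear speedup result.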


