Asynchronous Policy Evaluation in Distributed Reinforcement Learning over Networks

03/01/2020
by Xingyu Sha, et al.

This paper proposes a fully asynchronous scheme for policy evaluation in distributed reinforcement learning (DisRL) over peer-to-peer networks. Without any form of coordination, nodes can communicate with neighbors and compute their local variables at any time, using possibly delayed information, in sharp contrast to asynchronous gossip. The proposed scheme thus takes full advantage of the distributed setting. We prove that our method converges at a linear rate O(c^k), where c∈(0,1) and k increases by one whenever any node performs an update, which shows the computational advantage of reducing synchronization. Numerical experiments show that our method speeds up linearly with respect to the number of nodes and is robust to straggler nodes. To the best of our knowledge, our work is the first theoretical analysis of asynchronous updates in DisRL, including the parallel RL domain advocated by A3C.
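
As a reading aid, the linear rate O(c^k) can be unpacked into an update count. This is only a sketch under stated assumptions: the error measure e_k, the constant C, and the target accuracy ε are hypothetical symbols introduced here for illustration and are not specified in the abstract.

```latex
% Assumed notation (not from the abstract): e_k is some error measure after
% k total updates, C > 0 is a constant, and \varepsilon > 0 is a target accuracy.
\[
  e_k \le C\,c^{k}, \quad c \in (0,1)
  \;\;\Longrightarrow\;\;
  e_k \le \varepsilon
  \quad \text{whenever} \quad
  k \;\ge\; \frac{\ln(C/\varepsilon)}{\ln(1/c)} .
\]
```

Under these assumptions, the number of updates needed grows only logarithmically in 1/ε. Because k counts an update by any node, it is the total number of updates across the network rather than the number of synchronized rounds.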


research
06/09/2019

Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning

Multi-simulator training has contributed to the recent success of Deep R...
research
08/20/2021

L-DQN: An Asynchronous Limited-Memory Distributed Quasi-Newton Method

This work proposes a distributed algorithm for solving empirical risk mi...
research
07/26/2021

Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent

Recently introduced distributed zeroth-order optimization (ZOO) algorith...
research
01/15/2021

CPU Scheduling in Data Centers Using Asynchronous Finite-Time Distributed Coordination Mechanisms

We propose an asynchronous iterative scheme which allows a set of interc...
research
07/04/2012

Asynchronous Dynamic Bayesian Networks

Systems such as sensor networks and teams of autonomous robots consist o...
research
09/09/2019

A Filtering Approach for Resiliency of Distributed Observers against Smart Spoofers

For a Linear Time-Invariant (LTI) system, a network of observers is cons...
research
07/15/2022

Pick your Neighbor: Local Gauss-Southwell Rule for Fast Asynchronous Decentralized Optimization

In decentralized optimization environments, each agent i in a network of...
