A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

03/15/2019
by   Wesley Suttle, et al.

This paper extends off-policy reinforcement learning to the multi-agent case, in which a set of networked agents, communicating with their neighbors according to a time-varying graph, collaboratively evaluates and improves a target policy while following a distinct behavior policy. To this end, the paper develops a multi-agent version of emphatic temporal difference learning for off-policy policy evaluation and proves its convergence under linear function approximation. The paper then leverages this result, together with a novel multi-agent off-policy policy gradient theorem and recent work on multi-agent on-policy and single-agent off-policy actor-critic methods, to develop a new multi-agent off-policy actor-critic algorithm and give convergence guarantees for it.
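
As a rough illustration of the policy evaluation step described above, the sketch below pairs the standard single-agent emphatic TD(lambda) update under linear function approximation with a simple consensus averaging step among agents. It is not the paper's algorithm: the function names `etd_step` and `consensus`, the fixed mixing matrix `weights`, unit interest in every state, and the assumption that every agent already knows the joint importance sampling ratio `rho` are all hypothetical choices made for illustration only.

```python
def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def etd_step(theta, e, F, rho_prev, rho, phi, phi_next, reward,
             gamma=0.9, lam=0.0, alpha=0.1, interest=1.0):
    """One emphatic TD(lambda) update for a single agent (illustrative).

    theta    : weights of the linear value estimate v(s) = theta . phi(s)
    e        : eligibility trace (same length as theta)
    F        : follow-on trace (scalar); start at 0.0 with rho_prev = 1.0
    rho_prev : importance sampling ratio of the previous step
    rho      : importance sampling ratio of the current step
               (assumed to be available to every agent)
    reward   : the local reward this agent observes
    """
    # Follow-on trace: discounted, importance-weighted accumulation of interest.
    F = rho_prev * gamma * F + interest
    # Emphasis placed on the current state.
    M = lam * interest + (1.0 - lam) * F
    # Emphasis-weighted eligibility trace.
    e = [rho * (gamma * lam * ei + M * p) for ei, p in zip(e, phi)]
    # TD error computed from the agent's local reward.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    # Local parameter update.
    theta = [t + alpha * delta * ei for t, ei in zip(theta, e)]
    return theta, e, F


def consensus(thetas, weights):
    """Mix parameters across agents: theta_i <- sum_j weights[i][j] * theta_j.

    weights[i][j] is nonzero only if agent j is a neighbor of agent i (or
    j == i) in the current communication graph; rows are assumed to sum to 1.
    """
    n, d = len(thetas), len(thetas[0])
    return [[sum(weights[i][j] * thetas[j][k] for j in range(n))
             for k in range(d)]
            for i in range(n)]


# Two agents observe the same transition but different local rewards.
phi, phi_next = [1.0, 0.0], [0.0, 1.0]
local_rewards = [1.0, 3.0]
thetas = [[0.0, 0.0], [0.0, 0.0]]
traces = [[0.0, 0.0], [0.0, 0.0]]
follow_ons = [0.0, 0.0]

for i, r in enumerate(local_rewards):
    thetas[i], traces[i], follow_ons[i] = etd_step(
        thetas[i], traces[i], follow_ons[i],
        rho_prev=1.0, rho=1.0, phi=phi, phi_next=phi_next, reward=r)

# Uniform mixing over a fully connected two-agent graph.
mixed = consensus(thetas, [[0.5, 0.5], [0.5, 0.5]])
```

In a time-varying network, `weights` would change at every step to reflect the current graph; the fixed uniform matrix here is only the simplest case.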

research
10/01/2017

Parameter Sharing Deep Deterministic Policy Gradient for Cooperative Multi-agent Reinforcement Learning

Deep reinforcement learning for multi-agent cooperation and competition ...
research
07/06/2019

A Communication-Efficient Multi-Agent Actor-Critic Algorithm for Distributed Reinforcement Learning

This paper considers a distributed reinforcement learning problem in whi...
research
11/22/2021

Off-Policy Correction For Multi-Agent Reinforcement Learning

Multi-agent reinforcement learning (MARL) provides a framework for probl...
research
03/21/2019

Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus

In this paper, we propose a distributed off-policy actor critic method t...
research
10/01/2021

Divergence-Regularized Multi-Agent Actor-Critic

Entropy regularization is a popular method in reinforcement learning (RL...
research
09/03/2021

Multi-agent Natural Actor-critic Reinforcement Learning Algorithms

Both single-agent and multi-agent actor-critic algorithms are an importa...
research
02/18/2022

DARL1N: Distributed multi-Agent Reinforcement Learning with One-hop Neighbors

Most existing multi-agent reinforcement learning (MARL) methods are limi...
