Distributed Policy Gradient with Variance Reduction in Multi-Agent Reinforcement Learning

11/25/2021
by Xiaoxiao Zhao, et al.

This paper studies distributed policy gradient in collaborative multi-agent reinforcement learning (MARL), where agents connected by a communication network aim to find the optimal policy that maximizes the average of all agents' local returns. Because the policy gradient performance function is non-concave, existing distributed stochastic optimization methods for convex problems cannot be applied directly to policy gradient in MARL. This paper proposes a distributed policy gradient algorithm with variance reduction and gradient tracking to address the high variance of policy gradient estimates, and uses importance weighting to handle non-stationarity in the sampling process. We then provide an upper bound on the mean-squared stationary gap, which depends on the number of iterations, the mini-batch size, the epoch size, the problem parameters, and the network topology. We further establish the sample and communication complexity required to obtain an ϵ-approximate stationary point. Numerical experiments on a control problem in MARL validate the effectiveness of the proposed algorithm.
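
To make the gradient tracking idea mentioned in the abstract concrete, here is a minimal sketch in Python with NumPy. It is not the paper's algorithm: it uses exact gradients of a toy concave objective instead of sampled policy gradients, and it omits variance reduction and importance weighting entirely. The function name `gradient_tracking_ascent`, the quadratic local objectives, the four-agent ring network, and the step size are all assumptions chosen for illustration.

```python
import numpy as np


def gradient_tracking_ascent(targets, mixing, step_size=0.1, num_iters=300):
    """Toy gradient tracking (ascent form) over a communication network.

    Assumption (not from the paper): agent i holds the local objective
    J_i(theta) = -0.5 * ||theta - targets[i]||^2, so its gradient is
    targets[i] - theta. The average objective is maximized at the mean
    of the targets.
    """
    def local_grads(theta):
        # Row i is agent i's local gradient evaluated at its own iterate.
        return targets - theta

    theta = np.zeros_like(targets)   # one parameter row per agent
    tracker = local_grads(theta)     # tracker starts at the local gradients

    for _ in range(num_iters):
        # Mix iterates with neighbors, then step along the tracked gradient.
        new_theta = mixing @ theta + step_size * tracker
        # Mix trackers and add the change in local gradients, so the
        # average tracker always equals the average local gradient.
        tracker = mixing @ tracker + local_grads(new_theta) - local_grads(theta)
        theta = new_theta
    return theta


# Hypothetical setup: four agents on a ring, each averaging with its two
# neighbors using equal weights (a doubly stochastic mixing matrix).
targets = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0], [4.0, 1.0]])
mixing = np.array([
    [1 / 3, 1 / 3, 0.0, 1 / 3],
    [1 / 3, 1 / 3, 1 / 3, 0.0],
    [0.0, 1 / 3, 1 / 3, 1 / 3],
    [1 / 3, 0.0, 1 / 3, 1 / 3],
])
theta = gradient_tracking_ascent(targets, mixing)
print(theta)  # every row should be close to the mean of the targets
```

Each agent only ever combines its own row with its neighbors' rows through `mixing`, yet all agents end up near the maximizer of the average objective. In the setting the abstract describes, the exact gradients would be replaced by stochastic policy gradient estimates, which is where variance reduction and importance weighting would enter.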


12/06/2021

MDPGT: Momentum-based Decentralized Policy Gradient Tracking

We propose a novel policy gradient method for multi-agent reinforcement ...
06/18/2020

Cooperative Multi-Agent Reinforcement Learning with Partial Observations

In this paper, we propose a distributed zeroth-order policy optimization...
12/07/2018

Communication-Efficient Distributed Reinforcement Learning

This paper studies the distributed reinforcement learning (DRL) problem ...
05/08/2019

Smoothing Policies and Safe Policy Gradients

Policy gradient algorithms are among the best candidates for the much an...
09/23/2021

Dimension-Free Rates for Natural Policy Gradient in Multi-Agent Reinforcement Learning

Cooperative multi-agent reinforcement learning is a decentralized paradi...
06/25/2019

Policy Optimization with Stochastic Mirror Descent

Stochastic mirror descent (SMD) keeps the advantages of simplicity of im...
10/22/2020

Sample Efficient Reinforcement Learning with REINFORCE

Policy gradient methods are among the most effective methods for large-s...