Distributed Policy Gradient with Variance Reduction in Multi-Agent Reinforcement Learning

by Xiaoxiao Zhao, et al.

This paper studies a distributed policy gradient method for collaborative multi-agent reinforcement learning (MARL), where agents connected over a communication network aim to find the optimal policy that maximizes the average of all agents' local returns. Because the performance function in policy gradient is non-concave, existing distributed stochastic optimization methods for convex problems cannot be applied directly to policy gradient in MARL. This paper proposes a distributed policy gradient method with variance reduction and gradient tracking to address the high variance of policy gradient estimates, and uses importance weights to address the non-stationarity of the sampling process. We then provide an upper bound on the mean-squared stationary gap, which depends on the number of iterations, the mini-batch size, the epoch size, the problem parameters, and the network topology. We further establish the sample and communication complexity required to obtain an ϵ-approximate stationary point. Numerical experiments on a control problem in MARL are performed to validate the effectiveness of the proposed algorithm.
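To make the gradient tracking ingredient concrete, here is a minimal Python sketch of generic gradient tracking over a ring network. It is not the paper's algorithm: it omits variance reduction and importance weights, and it replaces the non-concave policy objective with an assumed toy concave objective J_i(x) = -0.5 (x - b_i)^2 per agent, so that the maximizer of the average is simply the mean of the b_i. The names `ring_mixing`, `gradient_tracking`, `targets`, and `step` are hypothetical and chosen only for illustration.

```python
import numpy as np


def ring_mixing(n, self_weight=0.5):
    """Doubly stochastic mixing matrix for a ring of n agents (assumed topology)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] += self_weight
        W[i, (i - 1) % n] += (1.0 - self_weight) / 2.0
        W[i, (i + 1) % n] += (1.0 - self_weight) / 2.0
    return W


def gradient_tracking(targets, mixing, step=0.1, iters=500):
    """Toy gradient tracking (ascent form) on J_i(x) = -0.5 * (x - targets[i])**2.

    Each agent keeps a local iterate x_i and a tracker y_i that estimates
    the network-average gradient using only neighbor communication.
    """
    b = np.asarray(targets, dtype=float)
    W = np.asarray(mixing, dtype=float)

    def local_grads(x):
        # Exact gradient of each agent's toy objective at its own iterate.
        return b - x

    x = np.zeros_like(b)
    g_old = local_grads(x)
    y = g_old.copy()  # trackers are initialized to the local gradients
    for _ in range(iters):
        x = W @ x + step * y       # mix with neighbors, then ascend along the tracker
        g_new = local_grads(x)
        y = W @ y + g_new - g_old  # mix trackers and add the local gradient change
        g_old = g_new
    return x


targets = [1.0, 2.0, 3.0, 6.0]
x_final = gradient_tracking(targets, ring_mixing(len(targets)))
print(x_final, np.mean(targets))
```

In this toy setting every agent's iterate should approach the mean of `targets`, even though no agent ever sees another agent's objective directly. In the setting the abstract describes, the exact local gradients would be replaced by stochastic policy gradient estimates, which is where variance reduction and importance weighting come in.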



MDPGT: Momentum-based Decentralized Policy Gradient Tracking

We propose a novel policy gradient method for multi-agent reinforcement ...

Cooperative Multi-Agent Reinforcement Learning with Partial Observations

In this paper, we propose a distributed zeroth-order policy optimization...

Communication-Efficient Distributed Reinforcement Learning

This paper studies the distributed reinforcement learning (DRL) problem ...

Smoothing Policies and Safe Policy Gradients

Policy gradient algorithms are among the best candidates for the much an...

Dimension-Free Rates for Natural Policy Gradient in Multi-Agent Reinforcement Learning

Cooperative multi-agent reinforcement learning is a decentralized paradi...

Policy Optimization with Stochastic Mirror Descent

Stochastic mirror descent (SMD) keeps the advantages of simplicity of im...

Sample Efficient Reinforcement Learning with REINFORCE

Policy gradient methods are among the most effective methods for large-s...