Gradient Play in Multi-Agent Markov Stochastic Games: Stationary Points and Convergence

by Runyu Zhang et al.

We study the performance of the gradient play algorithm for multi-agent tabular Markov decision processes (MDPs), also known as stochastic games (SGs), where each agent tries to maximize its own total discounted reward by making decisions independently based on the current state, which is observed by all agents. Policies are directly parameterized by the probability of choosing a given action at a given state. We show that Nash equilibria (NEs) and first-order stationary policies are equivalent in this setting, and give a non-asymptotic global convergence rate analysis to an ϵ-NE for a subclass of multi-agent MDPs called Markov potential games, which includes the cooperative setting with identical rewards among agents as an important special case. Our result shows that the number of iterations to reach an ϵ-NE scales linearly, rather than exponentially, with the number of agents. Local geometry and local stability are also considered. For Markov potential games, we prove that strict NEs are local maxima of the total potential function and that fully mixed NEs are saddle points. We also give a local convergence rate around strict NEs for more general settings.
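To make the setup concrete, here is a minimal sketch of gradient play with directly parameterized policies: each agent performs simultaneous projected gradient ascent on its own value, using the direct-parameterization gradient (discounted state visitation times the agent's averaged Q-values). The two-state, two-action cooperative game below, the step size, and all function names are illustrative assumptions, not the paper's experiments; exact policy evaluation stands in for sampled estimates.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def evaluate(pi1, pi2, r, P, gamma, rho):
    """Exact evaluation of the joint policy: value V and discounted visitation d."""
    nS = r.shape[0]
    joint = np.einsum('sa,sb->sab', pi1, pi2)           # pi(a1, a2 | s)
    P_pi = np.einsum('sab,sabt->st', joint, P)          # induced state chain
    r_pi = np.einsum('sab,sab->s', joint, r)            # expected reward per state
    V = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)
    d = (1 - gamma) * np.linalg.solve(np.eye(nS) - gamma * P_pi.T, rho)
    return V, d

def gradient_play(r, P, gamma=0.9, eta=0.1, iters=300):
    """Simultaneous projected gradient ascent for two agents (gradient play)."""
    nS, nA = r.shape[0], r.shape[1]
    rho = np.full(nS, 1.0 / nS)                         # initial state distribution
    pi1 = np.full((nS, nA), 1.0 / nA)                   # uniform initial policies
    pi2 = np.full((nS, nA), 1.0 / nA)
    for _ in range(iters):
        V, d = evaluate(pi1, pi2, r, P, gamma, rho)
        Q = r + gamma * np.einsum('sabt,t->sab', P, V)  # joint Q(s, a1, a2)
        Q1 = np.einsum('sab,sb->sa', Q, pi2)            # agent 1's averaged Q
        Q2 = np.einsum('sab,sa->sb', Q, pi1)            # agent 2's averaged Q
        g1 = d[:, None] * Q1 / (1 - gamma)              # direct-param. gradient
        g2 = d[:, None] * Q2 / (1 - gamma)
        pi1 = np.vstack([project_simplex(pi1[s] + eta * g1[s]) for s in range(nS)])
        pi2 = np.vstack([project_simplex(pi2[s] + eta * g2[s]) for s in range(nS)])
    return pi1, pi2

# Hypothetical cooperative game (identical rewards, a Markov potential game):
# in state s, both agents must play action s to earn a reward; transitions
# are uniform over states regardless of actions.
nS = nA = 2
r = np.zeros((nS, nA, nA))
r[0, 0, 0] = 1.0
r[1, 1, 1] = 2.0
P = np.full((nS, nA, nA, nS), 0.5)

pi1, pi2 = gradient_play(r, P)
```

From a uniform initialization, both agents' policies converge to the deterministic joint policy that coordinates on action s in state s, illustrating convergence to a strict NE of the potential game.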






