Gradient Play in Multi-Agent Markov Stochastic Games: Stationary Points and Convergence

by Runyu Zhang et al.

We study the performance of the gradient play algorithm for multi-agent tabular Markov decision processes (MDPs), also known as stochastic games (SGs), where each agent tries to maximize its own total discounted reward by making decisions independently based on the current state, which is observed by all agents. Policies are directly parameterized by the probability of choosing a given action at a given state. We show that Nash equilibria (NEs) and first-order stationary policies are equivalent in this setting, and give a non-asymptotic global convergence rate analysis to an ϵ-NE for a subclass of multi-agent MDPs called Markov potential games, which includes the cooperative setting with identical rewards among agents as an important special case. Our result shows that the number of iterations to reach an ϵ-NE scales linearly, rather than exponentially, with the number of agents. Local geometry and local stability are also considered. For Markov potential games, we prove that strict NEs are local maxima of the total potential function and that fully mixed NEs are saddle points. We also give a local convergence rate around strict NEs for more general settings.
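To make the setup concrete, here is a minimal sketch of gradient play with directly parameterized policies: each agent performs simultaneous projected gradient ascent on its own value, using the direct-parameterization gradient (discounted state visitation times the agent's averaged Q-values). The two-state, two-action cooperative game below, the step size, and all function names are illustrative assumptions, not the paper's experiments; exact policy evaluation stands in for sampled estimates.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def evaluate(pi1, pi2, r, P, gamma, rho):
    """Exact evaluation of the joint policy: value V and discounted visitation d."""
    nS = r.shape[0]
    joint = np.einsum('sa,sb->sab', pi1, pi2)           # pi(a1, a2 | s)
    P_pi = np.einsum('sab,sabt->st', joint, P)          # induced state chain
    r_pi = np.einsum('sab,sab->s', joint, r)            # expected reward per state
    V = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)
    d = (1 - gamma) * np.linalg.solve(np.eye(nS) - gamma * P_pi.T, rho)
    return V, d

def gradient_play(r, P, gamma=0.9, eta=0.1, iters=300):
    """Simultaneous projected gradient ascent for two agents (gradient play)."""
    nS, nA = r.shape[0], r.shape[1]
    rho = np.full(nS, 1.0 / nS)                         # initial state distribution
    pi1 = np.full((nS, nA), 1.0 / nA)                   # uniform initial policies
    pi2 = np.full((nS, nA), 1.0 / nA)
    for _ in range(iters):
        V, d = evaluate(pi1, pi2, r, P, gamma, rho)
        Q = r + gamma * np.einsum('sabt,t->sab', P, V)  # joint Q(s, a1, a2)
        Q1 = np.einsum('sab,sb->sa', Q, pi2)            # agent 1's averaged Q
        Q2 = np.einsum('sab,sa->sb', Q, pi1)            # agent 2's averaged Q
        g1 = d[:, None] * Q1 / (1 - gamma)              # direct-param. gradient
        g2 = d[:, None] * Q2 / (1 - gamma)
        pi1 = np.vstack([project_simplex(pi1[s] + eta * g1[s]) for s in range(nS)])
        pi2 = np.vstack([project_simplex(pi2[s] + eta * g2[s]) for s in range(nS)])
    return pi1, pi2

# Hypothetical cooperative game (identical rewards, a Markov potential game):
# in state s, both agents must play action s to earn a reward; transitions
# are uniform over states regardless of actions.
nS = nA = 2
r = np.zeros((nS, nA, nA))
r[0, 0, 0] = 1.0
r[1, 1, 1] = 2.0
P = np.full((nS, nA, nA, nS), 0.5)

pi1, pi2 = gradient_play(r, P)
```

From a uniform initialization, both agents' policies converge to the deterministic joint policy that coordinates on action s in state s, illustrating convergence to a strict NE of the potential game.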






