Symmetric (Optimistic) Natural Policy Gradient for Multi-agent Learning with Parameter Convergence

10/23/2022
by   Sarath Pattathil, et al.
0

Multi-agent interactions are increasingly important in the context of reinforcement learning, and the theoretical foundations of policy gradient methods have attracted surging research interest. We investigate the global convergence of natural policy gradient (NPG) algorithms in multi-agent learning. We first show that vanilla NPG may not have parameter convergence, i.e., the convergence of the vector that parameterizes the policy, even when the costs are regularized (which enabled strong convergence guarantees in the policy space in the literature). This non-convergence of parameters leads to stability issues in learning, which becomes especially relevant in the function approximation setting, where we can only operate on low-dimensional parameters, instead of the high-dimensional policy. We then propose variants of the NPG algorithm, for several standard multi-agent learning scenarios: two-player zero-sum matrix and Markov games, and multi-player monotone games, with global last-iterate parameter convergence guarantees. We also generalize the results to certain function approximation settings. Note that in our algorithms, the agents take symmetric roles. Our results might also be of independent interest for solving nonconvex-nonconcave minimax optimization problems with certain structures. Simulations are also provided to corroborate our theoretical findings.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/08/2019

Policy-Gradient Algorithms Have No Guarantees of Convergence in Continuous Action and State Multi-Agent Settings

We show by counterexample that policy-gradient algorithms have no guaran...
research
02/17/2021

Provably Efficient Policy Gradient Methods for Two-Player Zero-Sum Markov Games

Policy gradient methods are widely used in solving two-player zero-sum g...
research
03/07/2019

Convergence of Multi-Agent Learning with a Finite Step Size in General-Sum Games

Learning in a multi-agent system is challenging because agents are simul...
research
07/15/2020

Newton-based Policy Optimization for Games

Many learning problems involve multiple agents optimizing different inte...
research
05/26/2020

On the Impossibility of Global Convergence in Multi-Loss Optimization

Under mild regularity conditions, gradient-based methods converge global...
research
10/20/2021

Independent Natural Policy Gradient Always Converges in Markov Potential Games

Multi-agent reinforcement learning has been successfully applied to full...
research
02/22/2018

Asynchronous stochastic approximations with asymptotically biased errors and deep multi-agent learning

Asynchronous stochastic approximations are an important class of model-f...

Please sign up or login with your details

Forgot password? Click here to reset