Related research

- Policy-Gradient Algorithms Have No Guarantees of Convergence in Continuous Action and State Multi-Agent Settings: "We show by counterexample that policy-gradient algorithms have no guaran..."
- Finding Mixed Strategy Nash Equilibrium for Continuous Games through Deep Learning: "Nash equilibrium has long been a desired solution concept in multi-playe..."
- Convergence Analysis of Gradient-Based Learning with Non-Uniform Learning Rates in Non-Cooperative Multi-Agent Settings: "Considering a class of gradient-based multi-agent learning algorithms in..."
- Stable Opponent Shaping in Differentiable Games: "A growing number of learning methods are actually games which optimise m..."
- Multi-Agent Generalized Recursive Reasoning: "We propose a new reasoning protocol called generalized recursive reasoni..."
- Newton-type Methods for Minimax Optimization: "Differential games, in particular two-player sequential games (a.k.a. mi..."
- Multi-Agent Learning in Network Zero-Sum Games is a Hamiltonian System: "Zero-sum games are natural, if informal, analogues of closed physical sy..."
Newton-based Policy Optimization for Games
Many learning problems involve multiple agents optimizing different, interacting objective functions. In these problems, standard policy-gradient algorithms fail because of the non-stationarity of the setting and the conflicting interests of the agents: an algorithm must account for the complex dynamics of the system to guarantee rapid convergence towards a (local) Nash equilibrium. In this paper, we propose NOHD (Newton Optimization on Helmholtz Decomposition), a Newton-like algorithm for multi-agent learning problems based on the decomposition of the system's dynamics into their irrotational (potential) and solenoidal (Hamiltonian) components. This method ensures quadratic convergence in purely irrotational and purely solenoidal systems. Furthermore, we show that in general multi-agent systems NOHD is attracted to stable fixed points and repelled by strict saddle points. Finally, we empirically compare NOHD's performance with that of state-of-the-art algorithms on several bimatrix games and in a continuous Gridworld environment.
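To make the decomposition concrete, the following is a minimal Python sketch of the idea the abstract describes, not the authors' actual NOHD update (which is given in the paper). The function name nohd_step, the least-squares-style preconditioner, and the eps regularizer are all illustrative assumptions; the only ingredients taken from the abstract are the Helmholtz split of the game Jacobian into a symmetric (potential, irrotational) and an antisymmetric (Hamiltonian, solenoidal) part and the Newton-like use of both.

import numpy as np

def nohd_step(xi, jac, lr=1.0, eps=1e-6):
    # Hypothetical Newton-like update built on the Helmholtz split of
    # the game Jacobian. "xi" is the stacked vector of each player's
    # gradient with respect to its own parameters (the game dynamics);
    # "jac" is the Jacobian of that vector field.
    S = 0.5 * (jac + jac.T)  # symmetric part: potential (irrotational) component
    A = 0.5 * (jac - jac.T)  # antisymmetric part: Hamiltonian (solenoidal) component
    n = xi.shape[0]
    # Precondition with both components; eps keeps the solve well-posed.
    # In a purely potential system (A = 0) this reduces to a Newton step
    # S^{-1} xi; in a purely Hamiltonian one (S = 0) it reduces to
    # A^{-1} xi, which is where the sketch gets its fast local
    # convergence in the two pure cases.
    precond = S @ S.T + A.T @ A + eps * np.eye(n)
    return -lr * np.linalg.solve(precond, jac.T @ xi)

# Usage on a purely Hamiltonian toy game: player 1 minimizes
# f1(x, y) = x * y over x, player 2 minimizes f2(x, y) = -x * y over y.
# Simultaneous gradient descent cycles around the equilibrium (0, 0);
# the preconditioned step above contracts towards it.
theta = np.array([1.0, 1.0])
for _ in range(10):
    x, y = theta
    xi = np.array([y, -x])                     # (df1/dx, df2/dy)
    jac = np.array([[0.0, 1.0], [-1.0, 0.0]])  # Jacobian of xi (antisymmetric)
    theta = theta + nohd_step(xi, jac)
print(theta)  # close to the Nash equilibrium (0, 0)

In the toy example the Jacobian is purely antisymmetric, so plain gradient play orbits the equilibrium forever, while the preconditioned step collapses the orbit in a single iteration; this is exactly the failure mode of standard policy gradients that the abstract motivates NOHD with.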