
On Finding Local Nash Equilibria (and Only Local Nash Equilibria) in Zero-Sum Games
We propose a two-timescale algorithm for finding local Nash equilibria i...

Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms
The hierarchical interaction between the actor and critic in actor-criti...

Actor-Critic Policy Optimization in Partially Observable Multiagent Environments
Optimization of parameterized policies for reinforcement learning (RL) i...

The Advantage Regret-Matching Actor-Critic
Regret minimization has played a key role in online learning, equilibriu...

Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games
We study infinite-horizon discounted two-player zero-sum Markov games, a...

A Generalized Training Approach for Multiagent Learning
This paper investigates a population-based training regime based on game...

Deep Learning for Principal-Agent Mean Field Games
Here, we develop a deep learning algorithm for solving Principal-Agent (...
Actor-Critic Algorithms for Learning Nash Equilibria in N-player General-Sum Games
We consider the problem of finding stationary Nash equilibria (NE) in a finite discounted general-sum stochastic game. We first generalize a nonlinear optimization problem from Filar and Vrieze [2004] to an N-player setting and break down this problem into simpler subproblems that ensure there is no Bellman error for a given state and an agent. We then provide a characterization of solution points of these subproblems that correspond to Nash equilibria of the underlying game and, for this purpose, we derive a set of necessary and sufficient SGSP (Stochastic Game Sub-Problem) conditions. Using these conditions, we develop two actor-critic algorithms: OFF-SGSP (model-based) and ON-SGSP (model-free). Both algorithms use a critic that estimates the value function for a fixed policy and an actor that performs descent in the policy space using a descent direction that avoids local minima. We establish that both algorithms converge, in self-play, to the equilibria of a certain ordinary differential equation (ODE), whose stable limit points coincide with stationary NE of the underlying general-sum stochastic game. On a single-state non-generic game (see Hart and Mas-Colell [2005]) as well as on a synthetic two-player game setup with 810,000 states, we establish that ON-SGSP consistently outperforms the NashQ [Hu and Wellman, 2003] and FFQ [Littman, 2001] algorithms.
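To make the critic/actor split in the abstract concrete, here is a minimal illustrative sketch of a generic tabular actor-critic loop: a critic estimates the value function for the current policy via TD(0), while an actor adjusts softmax policy parameters using the TD error. This is not the paper's ON-SGSP or OFF-SGSP method (which uses the SGSP conditions and a specially constructed descent direction); the toy two-state MDP, step sizes, and softmax parameterization are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 2, 2
# Toy MDP (assumed for illustration): action 1 pays reward 1 and leads to
# state 1; action 0 pays nothing and leads to state 0.
P = np.zeros((n_states, n_actions, n_states))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0
R = np.zeros((n_states, n_actions))
R[:, 1] = 1.0
gamma = 0.9

theta = np.zeros((n_states, n_actions))  # actor: softmax policy parameters
V = np.zeros(n_states)                   # critic: value-function estimates

def policy(s):
    """Softmax policy over actions in state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

s = 0
for _ in range(5000):
    probs = policy(s)
    a = rng.choice(n_actions, p=probs)
    s2 = rng.choice(n_states, p=P[s, a])
    # Critic: TD(0) update on a faster timescale (larger step size).
    delta = R[s, a] + gamma * V[s2] - V[s]
    V[s] += 0.1 * delta
    # Actor: policy-gradient step on a slower timescale, driven by the
    # TD error as an advantage estimate.
    grad_log = -probs
    grad_log[a] += 1.0
    theta[s] += 0.01 * delta * grad_log
    s = s2

# After training, the policy should favor the rewarding action in state 1
# and the critic's estimate V[1] should approach R/(1 - gamma) = 10.
print(policy(1)[1], V[1])
```

The two step sizes (0.1 for the critic, 0.01 for the actor) reflect the standard two-timescale structure of actor-critic methods: the critic tracks the value of the current policy faster than the actor changes it.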