Actor-Critic Algorithms for Learning Nash Equilibria in N-player General-Sum Games

by H. L. Prasad, et al.

We consider the problem of finding stationary Nash equilibria (NE) in a finite discounted general-sum stochastic game. We first generalize a non-linear optimization problem from Filar and Vrieze [2004] to an N-player setting and break this problem down into simpler sub-problems that ensure there is no Bellman error for a given state and agent. We then provide a characterization of the solution points of these sub-problems that correspond to Nash equilibria of the underlying game, and for this purpose we derive a set of necessary and sufficient SG-SP (Stochastic Game - Sub-Problem) conditions. Using these conditions, we develop two actor-critic algorithms: OFF-SGSP (model-based) and ON-SGSP (model-free). Both algorithms use a critic that estimates the value function for a fixed policy and an actor that performs descent in the policy space using a descent direction that avoids local minima. We establish that both algorithms converge, in self-play, to the equilibria of a certain ordinary differential equation (ODE), whose stable limit points coincide with stationary NE of the underlying general-sum stochastic game. On a single-state non-generic game (see Hart and Mas-Colell [2005]) as well as on a synthetic two-player game setup with 810,000 states, we establish that ON-SGSP consistently outperforms the NashQ [Hu and Wellman, 2003] and FFQ [Littman, 2001] algorithms.
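The critic/actor split described above (a critic tracking the value function of the current policy, and an actor updating policy parameters using the resulting Bellman error) can be illustrated with a generic temporal-difference actor-critic loop. The sketch below is not the paper's ON-SGSP algorithm; it is a minimal single-agent softmax actor-critic on a hypothetical two-state, two-action MDP, included only to show the two-timescale structure the abstract refers to.

```python
import numpy as np

# Hypothetical toy MDP for illustration only (not from the paper):
# two states, two actions, deterministic transitions.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 2, 2, 0.9

R = np.array([[1.0, 0.0],   # R[s, a]: reward for action a in state s
              [0.0, 1.0]])
P = np.array([[0, 1],       # P[s, a]: deterministic next state
              [1, 0]])

V = np.zeros(n_states)                   # critic: value estimates
theta = np.zeros((n_states, n_actions))  # actor: policy parameters

def policy(s):
    """Softmax policy over actions in state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

alpha_critic, alpha_actor = 0.1, 0.01    # critic updates faster than actor
s = 0
for _ in range(5000):
    p = policy(s)
    a = rng.choice(n_actions, p=p)
    s_next, r = P[s, a], R[s, a]
    # TD error: the per-step Bellman error for the current policy.
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha_critic * delta         # critic update
    grad = -p                            # gradient of log-softmax ...
    grad[a] += 1.0                       # ... for the sampled action
    theta[s] += alpha_actor * delta * grad  # actor update
    s = s_next
```

In this toy MDP, action 0 in state 0 and action 1 in state 1 are rewarding, so after training the softmax policy concentrates on those actions. The key point mirrored from the abstract is the separation of timescales: the critic's value estimate is effectively quasi-stationary from the actor's perspective.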




On Finding Local Nash Equilibria (and Only Local Nash Equilibria) in Zero-Sum Games

We propose a two-timescale algorithm for finding local Nash equilibria i...

Multiplayer Performative Prediction: Learning in Decision-Dependent Games

Learning problems commonly exhibit an interesting feedback mechanism whe...

Actor-Critic Policy Optimization in Partially Observable Multiagent Environments

Optimization of parameterized policies for reinforcement learning (RL) i...

The Advantage Regret-Matching Actor-Critic

Regret minimization has played a key role in online learning, equilibriu...

Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

We study infinite-horizon discounted two-player zero-sum Markov games, a...

A Generalized Training Approach for Multiagent Learning

This paper investigates a population-based training regime based on game...

Opponent Learning Awareness and Modelling in Multi-Objective Normal Form Games

Many real-world multi-agent interactions consider multiple distinct crit...