DeepAI AI Chat
Log In Sign Up

Neural Replicator Dynamics

by   Shayegan Omidshafiei, et al.

In multiagent learning, agents interact in inherently nonstationary environments due to their concurrent policy updates. It is, therefore, paramount to develop and analyze algorithms that learn effectively despite these nonstationarities. A number of works have successfully conducted this analysis under the lens of evolutionary game theory (EGT), wherein a population of individuals interact and evolve based on biologically-inspired operators. These studies have mainly focused on establishing connections to value-iteration based approaches in stateless or tabular games. We extend this line of inquiry to formally establish links between EGT and policy gradient (PG) methods, which have been extensively applied in single and multiagent learning. We pinpoint weaknesses of the commonly-used softmax PG algorithm in adversarial and nonstationary settings and contrast PG's behavior to that predicted by replicator dynamics (RD), a central model in EGT. We consequently provide theoretical results that establish links between EGT and PG methods, then derive Neural Replicator Dynamics (NeuRD), a parameterized version of RD that constitutes a novel method with several advantages. First, as NeuRD reduces to the well-studied no-regret Hedge algorithm in the tabular setting, it inherits no-regret guarantees that enable convergence to equilibria in games. Second, NeuRD is shown to be more adaptive to nonstationarity, in comparison to PG, when learning in canonical games and imperfect information benchmarks including Poker. Thirdly, modifying any PG-based algorithm to use the NeuRD update rule is straightforward and incurs no added computational costs. Finally, while single-agent learning is not the main focus of the paper, we verify empirically that NeuRD is competitive in these settings with a recent baseline algorithm.


page 1

page 2

page 3

page 4


Convergence and Price of Anarchy Guarantees of the Softmax Policy Gradient in Markov Potential Games

We study the performance of policy gradient methods for the subclass of ...

Policy Optimization for Markov Games: Unified Framework and Faster Convergence

This paper studies policy optimization algorithms for multi-agent reinfo...

Evolutionary Dynamics and Φ-Regret Minimization in Games

Regret has been established as a foundational concept in online learning...

Regret Minimization and Convergence to Equilibria in General-sum Markov Games

An abundance of recent impossibility results establish that regret minim...

Competitive Policy Optimization

A core challenge in policy optimization in competitive Markov decision p...

Interpolating Between Softmax Policy Gradient and Neural Replicator Dynamics with Capped Implicit Exploration

Neural replicator dynamics (NeuRD) is an alternative to the foundational...

Chaos persists in large-scale multi-agent learning despite adaptive learning rates

Multi-agent learning is intrinsically harder, more unstable and unpredic...

Code Repositories


Evaluating EGT and RL dynamics in 3 strategies norm games

view repo