Neural Replicator Dynamics

06/01/2019
by   Shayegan Omidshafiei, et al.
12

In multiagent learning, agents interact in inherently nonstationary environments due to their concurrent policy updates. It is, therefore, paramount to develop and analyze algorithms that learn effectively despite these nonstationarities. A number of works have successfully conducted this analysis under the lens of evolutionary game theory (EGT), wherein a population of individuals interact and evolve based on biologically-inspired operators. These studies have mainly focused on establishing connections to value-iteration based approaches in stateless or tabular games. We extend this line of inquiry to formally establish links between EGT and policy gradient (PG) methods, which have been extensively applied in single and multiagent learning. We pinpoint weaknesses of the commonly-used softmax PG algorithm in adversarial and nonstationary settings and contrast PG's behavior to that predicted by replicator dynamics (RD), a central model in EGT. We consequently provide theoretical results that establish links between EGT and PG methods, then derive Neural Replicator Dynamics (NeuRD), a parameterized version of RD that constitutes a novel method with several advantages. First, as NeuRD reduces to the well-studied no-regret Hedge algorithm in the tabular setting, it inherits no-regret guarantees that enable convergence to equilibria in games. Second, NeuRD is shown to be more adaptive to nonstationarity, in comparison to PG, when learning in canonical games and imperfect information benchmarks including Poker. Thirdly, modifying any PG-based algorithm to use the NeuRD update rule is straightforward and incurs no added computational costs. Finally, while single-agent learning is not the main focus of the paper, we verify empirically that NeuRD is competitive in these settings with a recent baseline algorithm.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/15/2022

Convergence and Price of Anarchy Guarantees of the Softmax Policy Gradient in Markov Potential Games

We study the performance of policy gradient methods for the subclass of ...
research
06/06/2022

Policy Optimization for Markov Games: Unified Framework and Faster Convergence

This paper studies policy optimization algorithms for multi-agent reinfo...
research
06/28/2021

Evolutionary Dynamics and Φ-Regret Minimization in Games

Regret has been established as a foundational concept in online learning...
research
07/28/2022

Regret Minimization and Convergence to Equilibria in General-sum Markov Games

An abundance of recent impossibility results establish that regret minim...
research
06/18/2020

Competitive Policy Optimization

A core challenge in policy optimization in competitive Markov decision p...
research
06/04/2022

Interpolating Between Softmax Policy Gradient and Neural Replicator Dynamics with Capped Implicit Exploration

Neural replicator dynamics (NeuRD) is an alternative to the foundational...
research
06/01/2023

Chaos persists in large-scale multi-agent learning despite adaptive learning rates

Multi-agent learning is intrinsically harder, more unstable and unpredic...

Please sign up or login with your details

Forgot password? Click here to reset