A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games

06/12/2022
by Samuel Sokota, et al.

Algorithms designed for single-agent reinforcement learning (RL) generally fail to converge to equilibria in two-player zero-sum (2p0s) games. Conversely, game-theoretic algorithms for approximating Nash equilibria and quantal response equilibria (QREs) in 2p0s games are typically not competitive for RL and can be difficult to scale. As a result, algorithms for these two settings are generally developed and evaluated separately. In this work, we show that a single algorithm can produce strong results in both settings, despite their fundamental differences. The algorithm, which we call magnetic mirror descent (MMD), is a simple extension of mirror descent with proximal regularization. From a theoretical standpoint, we prove that MMD converges linearly to QREs in extensive-form games; this is the first time linear convergence has been proven for a first-order solver. Moreover, applied as a tabular Nash equilibrium solver via self-play, MMD empirically produces results competitive with CFR in both normal-form and extensive-form games with full feedback (the first time a standard RL algorithm has done so), and it also converges empirically in black-box feedback settings. Furthermore, for single-agent deep RL, we show that MMD can produce results competitive with those of PPO on a small collection of Atari and MuJoCo games. Lastly, for multi-agent deep RL, we show that MMD can outperform NFSP in 3x3 Abrupt Dark Hex.
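As a rough illustration of the kind of update the abstract describes, the sketch below runs a mirror-descent step with an added proximal pull toward a reference "magnet" policy, in self-play on a small matrix game. It assumes a negative-entropy mirror map, a uniform magnet, full feedback, and simultaneous updates. The payoff matrix, step size `eta`, regularization strength `alpha`, and all function names are illustrative choices, not taken from the text above. Under these assumptions the step has a closed form, and its fixed point satisfies the logit QRE condition that each policy is proportional to exp(q / alpha).

```python
import math


def normalize_logits(logits):
    """Turn log-weights into a probability vector (numerically stable softmax)."""
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def mmd_step(policy, q_values, magnet, eta, alpha):
    """One regularized mirror-descent step under the assumptions stated above.

    Closed form (negative-entropy mirror map, KL pull toward `magnet`):
        new_policy ∝ (policy * magnet**(eta*alpha) * exp(eta*q)) ** (1 / (1 + eta*alpha))
    """
    scale = 1.0 / (1.0 + eta * alpha)
    logits = [
        scale * (math.log(p) + eta * alpha * math.log(m) + eta * q)
        for p, m, q in zip(policy, magnet, q_values)
    ]
    return normalize_logits(logits)


def solve_matrix_game(payoff, eta=0.05, alpha=0.5, steps=2000):
    """Self-play on a zero-sum matrix game; the row player maximizes `payoff`."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_magnet = [1.0 / n_rows] * n_rows   # uniform magnet (assumption)
    col_magnet = [1.0 / n_cols] * n_cols
    # Deliberately non-uniform starting policies.
    x = normalize_logits([float(i) for i in range(n_rows)])
    y = normalize_logits([float(-j) for j in range(n_cols)])
    for _ in range(steps):
        q_row = [sum(payoff[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows)]
        q_col = [-sum(payoff[i][j] * x[i] for i in range(n_rows)) for j in range(n_cols)]
        # Simultaneous update: both players respond to the previous iterate.
        x, y = (
            mmd_step(x, q_row, row_magnet, eta, alpha),
            mmd_step(y, q_col, col_magnet, eta, alpha),
        )
    return x, y


# Hypothetical biased rock-paper-scissors payoffs for the row player.
PAYOFF = [
    [0.0, -1.0, 2.0],
    [1.0, 0.0, -1.0],
    [-2.0, 1.0, 0.0],
]

if __name__ == "__main__":
    row_policy, col_policy = solve_matrix_game(PAYOFF)
    print("row policy:", row_policy)
    print("column policy:", col_policy)
```

The exponent `1 / (1 + eta * alpha)` is what separates this step from a plain multiplicative-weights update: it shrinks the log-policy toward the magnet on every iteration. The step size here was chosen conservatively for this particular matrix and may need to be reduced for games with larger payoffs.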


