Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

07/13/2022
by Stephen McAleer, et al.

In competitive two-agent environments, deep reinforcement learning (RL) methods based on the Double Oracle (DO) algorithm, such as Policy Space Response Oracles (PSRO) and Anytime PSRO (APSRO), iteratively add RL best response policies to a population. Eventually, an optimal mixture of these population policies will approximate a Nash equilibrium. However, these methods might need to add all deterministic policies before converging. In this work, we introduce Self-Play PSRO (SP-PSRO), a method that adds an approximately optimal stochastic policy to the population in each iteration. Instead of adding only deterministic best responses to the opponent's least exploitable population mixture, SP-PSRO also learns an approximately optimal stochastic policy and adds it to the population as well. As a result, SP-PSRO empirically tends to converge much faster than APSRO and in many games converges in just a few iterations.
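The Double Oracle loop described above can be sketched on a tiny matrix game. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses rock-paper-scissors, exact best responses in place of RL oracles, and fictitious play as an approximate restricted-game solver. The names `solve_restricted`, `double_oracle`, and `row_exploitability` are hypothetical helpers introduced here.

```python
# Row player's payoffs in rock-paper-scissors (zero-sum; the column player gets the negation).
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]


def solve_restricted(payoff, rows, cols, steps=20000):
    """Approximate a Nash equilibrium of the restricted game with fictitious play."""
    row_counts = [0] * len(rows)
    col_counts = [0] * len(cols)
    row_counts[0] = col_counts[0] = 1
    for _ in range(steps):
        # Each player best-responds to the opponent's empirical play so far.
        row_vals = [sum(payoff[r][c] * n for c, n in zip(cols, col_counts)) for r in rows]
        col_vals = [sum(payoff[r][c] * n for r, n in zip(rows, row_counts)) for c in cols]
        row_counts[row_vals.index(max(row_vals))] += 1
        col_counts[col_vals.index(min(col_vals))] += 1
    p = [n / sum(row_counts) for n in row_counts]
    q = [n / sum(col_counts) for n in col_counts]
    return p, q


def double_oracle(payoff, steps=20000):
    """Grow each player's population with deterministic best responses until none are new."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    rows, cols = [0], [0]  # assumption: both populations start with the first pure strategy
    while True:
        p, q = solve_restricted(payoff, rows, cols, steps)
        # Best responses over the FULL strategy set to the opponent's restricted mixture.
        row_vals = [sum(payoff[r][c] * w for c, w in zip(cols, q)) for r in range(n_rows)]
        col_vals = [sum(payoff[r][c] * w for r, w in zip(rows, p)) for c in range(n_cols)]
        br_row = row_vals.index(max(row_vals))
        br_col = col_vals.index(min(col_vals))
        if br_row in rows and br_col in cols:
            return rows, cols, p, q
        if br_row not in rows:
            rows.append(br_row)
        if br_col not in cols:
            cols.append(br_col)


def row_exploitability(payoff, strategy):
    """Gain of a best-responding column player against a row mixture (game value assumed 0)."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    worst = min(sum(strategy[r] * payoff[r][c] for r in range(n_rows)) for c in range(n_cols))
    return -worst


rows, cols, p, q = double_oracle(RPS)
```

In this toy game the loop only stops after all three pure strategies have joined each population, which mirrors the concern that DO-style methods may need to add every deterministic policy. By contrast, the single stochastic policy `[1/3, 1/3, 1/3]` already has zero exploitability under `row_exploitability`, which is the intuition behind adding an approximately optimal stochastic policy to the population.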


research
03/11/2021

XDO: A Double Oracle Algorithm for Extensive-Form Games

Policy Space Response Oracles (PSRO) is a deep reinforcement learning al...
research
06/03/2021

Iterative Empirical Game Solving via Single Policy Best Response

Policy-Space Response Oracles (PSRO) is a general algorithmic framework ...
research
02/15/2022

NeuPL: Neural Population Learning

Learning in strategy games (e.g. StarCraft, poker) requires the discover...
research
05/31/2022

Simplex NeuPL: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games

Learning to play optimally against any mixture over a diverse set of str...
research
09/03/2020

Optimality-based Analysis of XCSF Compaction in Discrete Reinforcement Learning

Learning classifier systems (LCSs) are population-based predictive syste...
research
05/19/2023

Learning Diverse Risk Preferences in Population-based Self-play

Among the great successes of Reinforcement Learning (RL), self-play algo...
research
12/17/2018

Malthusian Reinforcement Learning

Here we explore a new algorithmic framework for multi-agent reinforcemen...
