Simplex NeuPL: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games

05/31/2022
by Siqi Liu, et al.

Learning to play optimally against any mixture over a diverse set of strategies is of significant practical interest in competitive games. In this paper, we propose simplex-NeuPL, which satisfies two desiderata simultaneously: i) learning a population of strategically diverse basis policies, represented by a single conditional network; ii) using the same network, learning best-responses to any mixture over the simplex of basis policies. We show that the resulting conditional policies incorporate prior information about their opponents effectively, enabling near-optimal returns against arbitrary mixture policies in a game with tractable best-responses. We verify that such policies behave Bayes-optimally under uncertainty and offer insights into using this flexibility at test time. Finally, we offer evidence that learning best-responses to any mixture policy is an effective auxiliary task for strategic exploration, which, by itself, can lead to more performant populations.
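To make "best-response to a mixture over basis policies" concrete, here is a minimal sketch in a game where best-responses are tractable: rock-paper-scissors, a symmetric zero-sum matrix game. This is an illustration only, not the paper's method, which trains a single conditional neural network. The names `PAYOFF`, `BASIS`, `opponent_marginal`, and `best_response_to_mixture` are hypothetical and chosen for this example.

```python
# Row player's payoff in rock-paper-scissors: PAYOFF[i][j] is the return of
# action i against action j. The matrix is antisymmetric, as in any
# symmetric zero-sum game.
PAYOFF = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]

# Assumed basis population: three pure policies, each a distribution over actions.
BASIS = [
    [1.0, 0.0, 0.0],  # always rock
    [0.0, 1.0, 0.0],  # always paper
    [0.0, 0.0, 1.0],  # always scissors
]


def opponent_marginal(mixture, basis):
    """Action distribution of an opponent drawn from `mixture` over `basis`."""
    n_actions = len(basis[0])
    return [
        sum(weight * policy[a] for weight, policy in zip(mixture, basis))
        for a in range(n_actions)
    ]


def best_response_to_mixture(mixture, basis):
    """Return (action index, expected return) of a best pure response."""
    marginal = opponent_marginal(mixture, basis)
    returns = [
        sum(prob * row[j] for j, prob in enumerate(marginal)) for row in PAYOFF
    ]
    best = max(range(len(returns)), key=returns.__getitem__)
    return best, returns[best]


# A prior that puts half its mass on the "always rock" basis policy.
action, value = best_response_to_mixture([0.5, 0.25, 0.25], BASIS)
print(action, value)  # 1 0.25, i.e. play paper
```

In this one-shot setting the prior over basis policies is all the information available, so the best-response to the prior mixture is also the Bayes-optimal choice. In sequential games, observations gathered during play would additionally refine that prior.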

research
07/13/2022

Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

In competitive two-agent environments, deep reinforcement learning (RL) ...
research
02/15/2022

NeuPL: Neural Population Learning

Learning in strategy games (e.g. StarCraft, poker) requires the discover...
research
03/18/2021

Maximum Entropy Reinforcement Learning with Mixture Policies

Mixture models are an expressive hypothesis class that can approximate a...
research
01/23/2019

Open-ended Learning in Symmetric Zero-sum Games

Zero-sum games such as chess and poker are, abstractly, functions that e...
research
09/20/2021

Generalization in Mean Field Games by Learning Master Policies

Mean Field Games (MFGs) can potentially scale multi-agent systems to ext...
research
09/16/2016

A Formal Solution to the Grain of Truth Problem

A Bayesian agent acting in a multi-agent environment learns to predict t...
research
09/12/2018

Bayes-ToMoP: A Fast Detection and Best Response Algorithm Towards Sophisticated Opponents

Multiagent algorithms often aim to accurately predict the behaviors of o...
