NeuPL: Neural Population Learning

02/15/2022
by   Siqi Liu, et al.
Learning in strategy games (e.g. StarCraft, poker) requires the discovery of diverse policies. This is often achieved by iteratively training new policies against existing ones, growing a policy population that is robust to exploitation. This iterative approach suffers from two issues in real-world games: a) under a finite budget, the approximate best-response operator at each iteration needs truncating, which leaves under-trained good-responses in the population; b) repeated learning of basic skills at each iteration is wasteful and becomes intractable in the presence of increasingly strong opponents. In this work, we propose Neural Population Learning (NeuPL) as a solution to both issues. NeuPL offers convergence guarantees to a population of best-responses under mild assumptions. By representing a population of policies within a single conditional model, NeuPL enables transfer learning across policies. Empirically, we show the generality, improved performance and efficiency of NeuPL across several test domains. Most interestingly, we show that novel strategies become more accessible, not less, as the neural population expands.
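The iterative approach that the abstract starts from (train a new policy against the existing ones, then add it to the population) can be sketched on a toy game. This is a minimal illustration and not NeuPL itself: it assumes rock-paper-scissors, exact best responses in place of learned approximate ones, and a uniform mixture over the current population as the opponent. The names `PAYOFF`, `best_response` and `grow_population` are hypothetical and do not come from the paper.

```python
# Illustrative sketch only: exact best responses in rock-paper-scissors stand in
# for the approximate, learned best-response operators discussed in the abstract.

# PAYOFF[a][b] is the row player's payoff for action a against action b
# (0 = rock, 1 = paper, 2 = scissors).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def best_response(opponent_mixture):
    """Return a pure strategy (one-hot list) maximizing expected payoff against the mixture."""
    values = [
        sum(PAYOFF[a][b] * p for b, p in enumerate(opponent_mixture))
        for a in range(3)
    ]
    # Ties are broken in favor of the lowest action index.
    best = max(range(3), key=lambda a: values[a])
    return [1.0 if a == best else 0.0 for a in range(3)]


def grow_population(initial_policy, iterations):
    """Iteratively add a best response to the uniform mixture over the current population."""
    population = [initial_policy]
    for _ in range(iterations):
        n = len(population)
        # Assumption: opponents are sampled uniformly from the existing population.
        mixture = [sum(pi[a] for pi in population) / n for a in range(3)]
        population.append(best_response(mixture))
    return population


# Start from "always rock" and add three responses.
population = grow_population([1.0, 0.0, 0.0], iterations=3)
```

In this sketch every new policy is computed from scratch and stored separately, which is the per-iteration cost the abstract identifies as wasteful. NeuPL instead represents the whole population within a single conditional model so that skills can transfer across policies.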

research
07/13/2022

Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

In competitive two-agent environments, deep reinforcement learning (RL) ...
research
05/31/2022

Simplex NeuPL: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games

Learning to play optimally against any mixture over a diverse set of str...
research
02/07/2023

Population-size-Aware Policy Optimization for Mean-Field Games

In this work, we attempt to bridge the two fields of finite-agent and in...
research
06/03/2021

Iterative Empirical Game Solving via Single Policy Best Response

Policy-Space Response Oracles (PSRO) is a general algorithmic framework ...
research
12/23/2021

Continual Depth-limited Responses for Computing Counter-strategies in Extensive-form Games

In real-world applications, game-theoretic algorithms often interact wit...
research
04/20/2020

Approximate exploitability: Learning a best response in large games

A common metric in games of imperfect information is exploitability, i.e...
