Iterative Empirical Game Solving via Single Policy Best Response

06/03/2021
by   Max Olan Smith, et al.
0

Policy-Space Response Oracles (PSRO) is a general algorithmic framework for learning policies in multiagent systems by interleaving empirical game analysis with deep reinforcement learning (Deep RL). At each iteration, Deep RL is invoked to train a best response to a mixture of opponent policies. The repeated application of Deep RL poses an expensive computational burden as we look to apply this algorithm to more complex domains. We introduce two variations of PSRO designed to reduce the amount of simulation required during Deep RL training. Both algorithms modify how PSRO adds new policies to the empirical game, based on learned responses to a single opponent policy. The first, Mixed-Oracles, transfers knowledge from previous iterations of Deep RL, requiring training only against the opponent's newest policy. The second, Mixed-Opponents, constructs a pure-strategy opponent by mixing existing strategy's action-value estimates, instead of their policies. Learning against a single policy mitigates variance in state outcomes that is induced by an unobserved distribution of opponents. We empirically demonstrate that these algorithms substantially reduce the amount of simulation during training required by PSRO, while producing equivalent or better solutions to the game.

READ FULL TEXT

page 16

page 17

page 19

research
07/13/2022

Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

In competitive two-agent environments, deep reinforcement learning (RL) ...
research
06/10/2020

Reinforcement Learning from a Mixture of Interpretable Experts

Reinforcement learning (RL) has demonstrated its ability to solve high d...
research
08/29/2021

A Policy Efficient Reduction Approach to Convex Constrained Deep Reinforcement Learning

Although well-established in general reinforcement learning (RL), value-...
research
06/08/2020

Learning to Play No-Press Diplomacy with Best Response Policy Iteration

Recent advances in deep reinforcement learning (RL) have led to consider...
research
02/15/2022

NeuPL: Neural Population Learning

Learning in strategy games (e.g. StarCraft, poker) requires the discover...
research
09/29/2020

Learning to Play against Any Mixture of Opponents

Intuitively, experience playing against one mixture of opponents in a gi...
research
06/30/2023

Resetting the Optimizer in Deep RL: An Empirical Study

We focus on the task of approximating the optimal value function in deep...

Please sign up or login with your details

Forgot password? Click here to reset