ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret

06/08/2022
by Stephen McAleer, et al.

Recent techniques for approximating Nash equilibria in very large games leverage neural networks to learn approximately optimal policies (strategies). One promising line of research uses neural networks to approximate counterfactual regret minimization (CFR) or its modern variants. DREAM, the only current CFR-based neural method that is model-free and therefore scalable to very large games, trains a neural network on an estimated regret target that can have extremely high variance due to an importance sampling term inherited from Monte Carlo CFR (MCCFR). In this paper, we propose an unbiased model-free method that does not require any importance sampling. Our method, ESCHER, is principled and is guaranteed to converge to an approximate Nash equilibrium with high probability in the tabular case. We show that the variance of the estimated regret of a tabular version of ESCHER with an oracle value function is significantly lower than that of outcome sampling MCCFR and tabular DREAM with an oracle value function. We then show that a deep learning version of ESCHER outperforms the prior state of the art – DREAM and neural fictitious self-play (NFSP) – and the difference becomes dramatic as game size increases.
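
To make the variance claim concrete, here is a minimal toy sketch. It is not the ESCHER algorithm or the paper's experimental setup. It assumes a single decision point with three actions, made-up oracle action values `Q`, a made-up current policy `POLICY`, and a made-up sampling policy `SAMPLING`; all function names are hypothetical. It compares a regret estimate that reweights one sampled action by the inverse of its sampling probability (in the spirit of outcome sampling) with a regret computed directly from a value function, with no importance weight.

```python
import random
import statistics

# Toy single decision point with three actions (all numbers are made up).
Q = [1.0, 0.0, -1.0]          # assumed oracle action values q(a)
POLICY = [0.2, 0.3, 0.5]      # assumed current policy pi(a)
SAMPLING = [0.9, 0.05, 0.05]  # assumed sampling policy used to pick one action


def value_function_regrets(q, policy):
    """Regret per action computed directly from values: q(a) - sum_b pi(b) q(b)."""
    v = sum(p * x for p, x in zip(policy, q))
    return [x - v for x in q]


def importance_sampled_regrets(q, policy, sampling, rng):
    """One sampled estimate: only the sampled action is observed,
    and its value is reweighted by 1 / (its sampling probability)."""
    n = len(q)
    a = rng.choices(range(n), weights=sampling)[0]
    q_hat = [q[a] / sampling[a] if b == a else 0.0 for b in range(n)]
    v_hat = sum(p * x for p, x in zip(policy, q_hat))
    return [x - v_hat for x in q_hat]


def simulate(num_samples, seed=0):
    """Mean and sample variance of the importance-sampled regret for action 0."""
    rng = random.Random(seed)
    draws = [
        importance_sampled_regrets(Q, POLICY, SAMPLING, rng)[0]
        for _ in range(num_samples)
    ]
    return statistics.fmean(draws), statistics.variance(draws)


if __name__ == "__main__":
    print("direct regrets:", value_function_regrets(Q, POLICY))
    mean, var = simulate(20000)
    print("importance-sampled mean and variance for action 0:", mean, var)
```

In this toy, both estimators agree in expectation, but the importance-sampled one is noisy because a rarely sampled action is scaled by 1/0.05 = 20, while the direct computation has no sampling noise at all. In a full game, a value-function-based estimate would still carry variance from sampled trajectories and from value approximation error, so the sketch only illustrates where the importance weight enters.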

research
08/27/2020

The Advantage Regret-Matching Actor-Critic

Regret minimization has played a key role in online learning, equilibriu...
research
04/10/2019

Solving Dynamic Discrete Choice Models Using Smoothing and Sieve Methods

We propose to combine smoothing, simulations and sieve approximations to...
research
12/03/2020

Model-free Neural Counterfactual Regret Minimization with Bootstrap Learning

Counterfactual Regret Minimization (CFR) has achieved many fascinating r...
research
11/15/2022

Model free Shapley values for high dimensional data

A model-agnostic variable importance method can be used with arbitrary p...
research
10/18/2020

Visibility Optimization for Surveillance-Evasion Games

We consider surveillance-evasion differential games, where a pursuer mus...
research
11/28/2014

Solving Games with Functional Regret Estimation

We propose a novel online learning method for minimizing regret in large...
research
09/09/2018

Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines

Learning strategies for imperfect information games from samples of inte...
