
Solving Games with Functional Regret Estimation

11/28/2014
by Kevin Waugh et al.
University of Alberta
Carnegie Mellon University

We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies. We prove the approach sound by providing a bound relating the quality of the function approximation to the regret of the algorithm. A corollary is that the method is guaranteed to converge to a Nash equilibrium in self-play so long as the regrets are ultimately realizable by the function approximator. Our technique can be understood as a principled generalization of existing work on abstraction in large games; in our work, both the abstraction and the equilibrium are learned during self-play. We demonstrate empirically that the method achieves higher-quality strategies than state-of-the-art abstraction techniques given the same resources.
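The core idea above — running regret matching on *estimated* regrets produced by an online-trained function approximator, rather than on tabulated true regrets — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear regressor, its feature vector, and the SGD update are all assumptions chosen for brevity, standing in for whatever approximator the method pairs with regret matching.

```python
import numpy as np

class EstimatedRegretMatcher:
    """Regret matching driven by a learned regret estimator.

    A hypothetical linear model maps state features to a per-action
    cumulative-regret estimate; the policy plays each action with
    probability proportional to its positive estimated regret.
    """

    def __init__(self, n_features, n_actions, lr=0.05):
        self.w = np.zeros((n_actions, n_features))  # one linear head per action
        self.lr = lr

    def estimate(self, features):
        """Predicted cumulative regret for each action given state features."""
        return self.w @ features

    def policy(self, features):
        """Regret matching on estimated regrets: proportional to the
        positive part; uniform when no action has positive estimate."""
        pos = np.maximum(self.estimate(features), 0.0)
        total = pos.sum()
        if total <= 0.0:
            return np.full(len(pos), 1.0 / len(pos))
        return pos / total

    def update(self, features, observed_regrets):
        """One SGD step fitting the estimator to an observed regret sample."""
        err = self.estimate(features) - observed_regrets
        self.w -= self.lr * np.outer(err, features)
```

In self-play, each iteration would query `policy` to act, compute sampled regrets for the visited states, and call `update` — so the "abstraction" (which states share regret estimates) is implicit in the approximator rather than fixed in advance.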


Introduction

Regression Regret-Matching

Extensive-form Games

Counterfactual Regret Minimization

Regression CFR

Relationship to Abstraction

Experimental Results

Leduc Hold’em

Features and Implementation

Experiments

Convergence

Exploitability

One-on-one Competitions

Future Work

Acknowledgments

References


Appendix