
Solving Games with Functional Regret Estimation

by Kevin Waugh et al.
University of Alberta
Carnegie Mellon University

We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies. We prove the approach sound by providing a bound relating the quality of the function approximation to the regret of the algorithm. A corollary is that the method is guaranteed to converge to a Nash equilibrium in self-play so long as the regrets are ultimately realizable by the function approximator. Our technique can be understood as a principled generalization of existing work on abstraction in large games; in our work, both the abstraction and the equilibrium are learned during self-play. We demonstrate empirically that the method achieves higher-quality strategies than state-of-the-art abstraction techniques given the same resources.
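The core idea can be illustrated with a minimal sketch (not the paper's implementation): rather than storing cumulative regrets exactly, a linear model over action features is regressed onto them, and the policy plays each action in proportion to the positive part of its *estimated* regret. The feature matrix, the fixed reward vector, and the least-squares fit are all illustrative assumptions; when the features are expressive enough to realize the regrets exactly, the procedure reduces to ordinary regret matching, mirroring the realizability condition in the paper's bound.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_features = 3, 5
Phi = rng.normal(size=(n_actions, n_features))   # fixed per-action features

def policy(theta):
    """Regret matching on the estimated regrets Phi @ theta."""
    pos = np.maximum(Phi @ theta, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full(n_actions, 1.0 / n_actions)

rewards = np.array([0.2, 1.0, 0.5])   # stand-in for the game's payoffs
cum_regret = np.zeros(n_actions)
theta = np.zeros(n_features)
for t in range(200):
    pi = policy(theta)
    cum_regret += rewards - pi @ rewards          # true cumulative regrets
    # Regress the features onto the cumulative regrets (least squares).
    theta, *_ = np.linalg.lstsq(Phi, cum_regret, rcond=None)

print(policy(theta))   # mass concentrates on the highest-reward action
```

In the paper's setting the targets come from counterfactual values in self-play rather than a fixed reward vector, and the approximator generalizes regret estimates across information sets, which is what connects the method to abstraction.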




Regression Regret-Matching

Extensive-form Games

Counterfactual Regret Minimization

Regression CFR

Relationship to Abstraction

Experimental Results

Leduc Hold’em

Features and Implementation




One-on-one Competitions

Future Work


