Solving Games with Functional Regret Estimation

We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies. We prove the approach sound by providing a bound relating the quality of the function approximation to the regret of the algorithm. As a corollary, the method is guaranteed to converge to a Nash equilibrium in self-play so long as the regrets are ultimately realizable by the function approximator. Our technique can be understood as a principled generalization of existing work on abstraction in large games; in our work, both the abstraction and the equilibrium are learned during self-play. We demonstrate empirically that the method achieves higher-quality strategies than state-of-the-art abstraction techniques given the same resources.
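The loop described in the abstract (estimate regrets with a learned function, then feed the estimates to a no-regret rule) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The toy game (rock-paper-scissors), the bucket-mean regressor `fit_bucket_means`, the feature labels in `BUCKETS`, and the helper `self_play` are all assumptions introduced here for illustration. Regret matching stands in as the no-regret rule.

```python
# Row player's payoff in rock-paper-scissors (toy game, assumed for illustration).
PAYOFF = [[0.0, -1.0, 1.0],
          [1.0, 0.0, -1.0],
          [-1.0, 1.0, 0.0]]

# Hypothetical features: one distinct bucket per action, so the regressor can
# represent any regret vector exactly (the realizable case).
BUCKETS = ["rock", "paper", "scissors"]


def regret_matching_policy(estimated_regrets):
    """Play each action in proportion to its positive estimated regret."""
    positive = [max(r, 0.0) for r in estimated_regrets]
    total = sum(positive)
    n = len(estimated_regrets)
    if total <= 0.0:
        # No action has positive estimated regret: fall back to uniform play.
        return [1.0 / n] * n
    return [p / total for p in positive]


def fit_bucket_means(buckets, regret_targets):
    """Hypothetical regressor: predict the mean target within each bucket."""
    sums, counts = {}, {}
    for bucket, target in zip(buckets, regret_targets):
        sums[bucket] = sums.get(bucket, 0.0) + target
        counts[bucket] = counts.get(bucket, 0) + 1
    return {bucket: sums[bucket] / counts[bucket] for bucket in sums}


def estimate_regrets(model, buckets):
    """Look up the regressor's prediction for each action's bucket."""
    return [model[bucket] for bucket in buckets]


def self_play(iterations, initial_regrets):
    """Self-play in which both players act only on estimated regrets."""
    cumulative = [list(initial_regrets), [0.0, 0.0, 0.0]]
    policy_sums = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    for t in range(1, iterations + 1):
        policies = []
        for player in range(2):
            targets = [r / t for r in cumulative[player]]  # average regrets
            model = fit_bucket_means(BUCKETS, targets)
            estimates = estimate_regrets(model, BUCKETS)
            policies.append(regret_matching_policy(estimates))
        p, q = policies
        # Expected utility of each action against the opponent's current policy.
        row_utils = [sum(PAYOFF[i][j] * q[j] for j in range(3)) for i in range(3)]
        col_utils = [-sum(PAYOFF[i][j] * p[i] for i in range(3)) for j in range(3)]
        for player, utils in enumerate((row_utils, col_utils)):
            expected = sum(policies[player][a] * utils[a] for a in range(3))
            for a in range(3):
                cumulative[player][a] += utils[a] - expected
                policy_sums[player][a] += policies[player][a]
    return [[s / iterations for s in sums] for sums in policy_sums]


average_policies = self_play(200, [1.0, 0.0, 0.0])
```

Because every action has its own bucket, the estimates equal the true average regrets and the loop reduces to ordinary regret matching. Assigning several actions to the same bucket would force them to share one estimate, which is one way to picture the connection to abstraction that the abstract mentions.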









Regression Regret-Matching

Extensive-form Games

Counterfactual Regret Minimization

Regression CFR

Relationship to Abstraction

Experimental Results

Leduc Hold’em

Features and Implementation




One-on-one Competitions

Future Work


