# Solving Games with Functional Regret Estimation

We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies. We prove the approach sound by providing a bound relating the quality of the function approximation to the regret of the algorithm. A corollary is that the method is guaranteed to converge to a Nash equilibrium in self-play so long as the regrets are ultimately realizable by the function approximator. Our technique can be understood as a principled generalization of existing work on abstraction in large games; in our work, both the abstraction and the equilibrium are learned during self-play. We demonstrate empirically that the method achieves higher-quality strategies than state-of-the-art abstraction techniques given the same resources.
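The loop the abstract describes — estimate regrets with a function approximator, derive a policy from the estimates with a no-regret rule (here regret matching), then regress the approximator toward the observed regrets — can be sketched in self-play on rock-paper-scissors. Everything below (the game, the linear model with a single bias feature, the learning rate) is an illustrative assumption for this sketch, not the paper's actual construction:

```python
import numpy as np

# Rock-paper-scissors payoff for player 0 (zero-sum; player 1 gets the negation).
PAYOFF = np.array([[ 0., -1.,  1.],
                   [ 1.,  0., -1.],
                   [-1.,  1.,  0.]])
N = 3

def regret_matching(estimates):
    """Play each action in proportion to its positive estimated regret."""
    pos = np.maximum(estimates, 0.0)
    s = pos.sum()
    return pos / s if s > 0 else np.full(N, 1.0 / N)

# Stand-in "function approximators": one linear model per player, mapping a
# constant feature to per-action regret estimates. With a single bias feature
# this collapses to a regret table, but it marks where a richer regressor
# over information-set features would plug in.
feat = np.ones(1)
w = [np.array([[0.1, 0.0, 0.0]]),   # tiny asymmetry so play leaves uniform
     np.zeros((1, N))]
LR = 0.05

T = 50000
avg = [np.zeros(N), np.zeros(N)]
for t in range(T):
    # Policies come from the *estimated* regrets, not true cumulative regrets.
    pi = [regret_matching(feat @ w[p]) for p in range(2)]
    for p in range(2):
        avg[p] += pi[p]
    # Expected value of each action against the opponent's current policy.
    vals0 = PAYOFF @ pi[1]
    vals1 = -PAYOFF.T @ pi[0]
    for p, vals in ((0, vals0), (1, vals1)):
        inst_regret = vals - pi[p] @ vals
        # Online regression step toward the cumulative-regret target (plain
        # SGD on squared error; positive rescaling leaves the policy unchanged).
        w[p] += LR * np.outer(feat, inst_regret)

avg = [a / T for a in avg]
print(avg[0], avg[1])  # both should approach the uniform Nash strategy
```

In self-play the average policies approach the unique Nash equilibrium of the game, (1/3, 1/3, 1/3), illustrating the abstract's convergence claim in the trivially realizable case where the approximator can represent the regrets exactly.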
