Introduction
Regression Regret-Matching
Extensive-form Games
Counterfactual Regret Minimization
Regression CFR
Relationship to Abstraction
Experimental Results
Leduc Hold’em
Features and Implementation
Experiments
Convergence
Exploitability
One-on-one Competitions
Future Work
Acknowledgments
References
- [Awerbuch and Kleinberg2004] Awerbuch, B., and Kleinberg, R. 2004. Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. In ACM Symposium on Theory of Computing (STOC).
- [Bard et al.2013] Bard, N.; Johanson, M.; Burch, N.; and Bowling, M. 2013. Online implicit agent modelling. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
- [Gilpin et al.2007] Gilpin, A.; Hoda, S.; Peña, J.; and Sandholm, T. 2007. Gradient-based algorithms for finding Nash equilibria in extensive form games. In International Workshop on Internet and Network Economics (WINE).
- [Gilpin, Sandholm, and Sorensen2007] Gilpin, A.; Sandholm, T.; and Sorensen, T. 2007. Potential-aware automated abstraction of sequential games, and holistic equilibrium analysis of texas hold’em poker. In AAAI Conference on Artificial Intelligence (AAAI).
- [Hart and Mas-Colell2000] Hart, S., and Mas-Colell, A. 2000. A simple adaptive procedure leading to correlated equilibrium. Econometrica 68(5):1127–1150.
- [Hazan et al.2006] Hazan, E.; Kalai, A.; Kale, S.; and Agarwal, A. 2006. Logarithmic regret algorithms for online convex optimization. In Conference on Learning Theory (COLT).
- [Johanson et al.2011] Johanson, M.; Bowling, M.; Waugh, K.; and Zinkevich, M. 2011. Accelerating best response calculation in large extensive games. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI), 258–265.
- [Johanson2007] Johanson, M. 2007. Robust strategies and counter-strategies: Building a champion level computer poker player. Master’s thesis, University of Alberta.
- [Johanson2013] Johanson, M. 2013. Measuring the size of large no-limit poker games. Technical Report TR13-01, Department of Computing Science, University of Alberta.
- [Lanctot et al.2009] Lanctot, M.; Waugh, K.; Zinkevich, M.; and Bowling, M. 2009. Monte carlo sampling for regret minimization in extensive games. In Advances in Neural Information Processing Systems (NIPS), 1078–1086.
- [Osborne and Rubinstein1994] Osborne, M., and Rubinstein, A. 1994. A Course in Game Theory. MIT Press.
- [Ross, Gordon, and Bagnell2011] Ross, S.; Gordon, G. J.; and Bagnell, J. A. 2011. A reduction of imitation learning and structured prediction to no-regret online learning. In International Conference on Artificial Intelligence and Statistics (AISTATS).
- [Shi and Littman2002] Shi, J., and Littman, M. 2002. Abstraction methods for game theoretic poker. In International Conference on Computers and Games (CG), CG ’00.
- [Southey et al.2005] Southey, F.; Bowling, M.; Larson, B.; Piccione, C.; Burch, N.; Billings, D.; and Rayner, C. 2005. Bayes’ bluff: Opponent modelling in poker. In Conference on Uncertainty in AI (UAI).
- [Southey, Hoehn, and Holte2009] Southey, F.; Hoehn, B.; and Holte, R. 2009. Effective short-term opponent exploitation in simplified poker. Machine Learning 74(2):159–189.
- [Waugh et al.2008] Waugh, K.; Schnizlein, D.; Bowling, M.; and Szafron, D. 2008. Abstraction pathologies in extensive games. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
- [Zinkevich et al.2007] Zinkevich, M.; Johanson, M.; Bowling, M.; and Piccione, C. 2007. Regret minimization in games with incomplete information. Technical Report TR07-14, Department of Computing Science, University of Alberta.
- [Zinkevich et al.2008] Zinkevich, M.; Johanson, M.; Bowling, M.; and Piccione, C. 2008. Regret minimization in games with incomplete information. In Advances in Neural Information Processing Systems 20 (NIPS), 905–912.