Near-Optimal Learning of Extensive-Form Games with Imperfect Information

by Yu Bai, et al.

This paper resolves the open question of designing near-optimal algorithms for learning imperfect-information extensive-form games from bandit feedback. We present the first line of algorithms that require only 𝒪((XA+YB)/ε^2) episodes of play to find an ε-approximate Nash equilibrium in two-player zero-sum games, where X, Y are the number of information sets and A, B are the number of actions for the two players. This improves upon the best known sample complexity of 𝒪((X^2A+Y^2B)/ε^2) by a factor of 𝒪(max{X, Y}), and matches the information-theoretic lower bound up to logarithmic factors. We achieve this sample complexity by two new algorithms: Balanced Online Mirror Descent, and Balanced Counterfactual Regret Minimization. Both algorithms rely on novel approaches of integrating balanced exploration policies into their classical counterparts. We also extend our results to learning Coarse Correlated Equilibria in multi-player general-sum games.
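To make the stated improvement concrete, the two episode counts from the abstract can be compared numerically. The sketch below is only an illustration: it drops the constants and logarithmic factors hidden by the 𝒪 notation, and the function names and the sample game sizes are hypothetical, not taken from the paper.

```python
def new_bound(X, A, Y, B, eps):
    """Leading term (XA + YB) / eps^2 of the sample complexity stated in the abstract.

    Constants and logarithmic factors are ignored (an assumption of this sketch).
    """
    return (X * A + Y * B) / eps ** 2


def prior_bound(X, A, Y, B, eps):
    """Leading term (X^2 A + Y^2 B) / eps^2 of the previously best known bound."""
    return (X ** 2 * A + Y ** 2 * B) / eps ** 2


def improvement_ratio(X, A, Y, B):
    """Ratio prior/new; eps cancels, and the ratio never exceeds max(X, Y)."""
    return (X ** 2 * A + Y ** 2 * B) / (X * A + Y * B)


if __name__ == "__main__":
    # Hypothetical game sizes, chosen only for illustration.
    X, A, Y, B, eps = 100, 4, 50, 2, 0.5
    print("new bound (episodes, up to constants):  ", new_bound(X, A, Y, B, eps))
    print("prior bound (episodes, up to constants):", prior_bound(X, A, Y, B, eps))
    print("improvement ratio:", improvement_ratio(X, A, Y, B), "<= max(X, Y) =", max(X, Y))
```

The ratio is at most max{X, Y} because X^2A + Y^2B ≤ max{X, Y}·(XA + YB), which is consistent with the improvement factor quoted in the abstract.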



∙ 06/22/2020

Near-Optimal Reinforcement Learning with Self-Play

This paper considers the problem of designing optimal algorithms for rei...
∙ 03/13/2019

Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent

In this paper, we present exploitability descent, a new algorithm to com...
∙ 05/15/2022

Sample-Efficient Learning of Correlated Equilibria in Extensive-Form Games

Imperfect-Information Extensive-Form Games (IIEFGs) is a prevalent model...
∙ 05/30/2022

Efficient Φ-Regret Minimization in Extensive-Form Games via Online Mirror Descent

A conceptually appealing approach for learning Extensive-Form Games (EFG...
∙ 10/08/2021

When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently?

Multi-agent reinforcement learning has made substantial empirical progre...
∙ 10/20/2022

Learning Rationalizable Equilibria in Multiplayer Games

A natural goal in multiagent learning besides finding equilibria is to l...
∙ 09/01/2023

Local and adaptive mirror descents in extensive-form games

We study how to learn ϵ-optimal strategies in zero-sum imperfect informa...
