Local and adaptive mirror descents in extensive-form games

09/01/2023
by Côme Fiegel, et al.

We study how to learn ϵ-optimal strategies in zero-sum imperfect information games (IIG) with trajectory feedback. In this setting, players update their policies sequentially based on their observations over a fixed number of episodes, denoted by T. Existing procedures suffer from high variance due to the use of importance sampling over sequences of actions (Steinberger et al., 2020; McAleer et al., 2022). To reduce this variance, we consider a fixed sampling approach, where players still update their policies over time, but with observations obtained through a given fixed sampling policy. Our approach is based on an adaptive Online Mirror Descent (OMD) algorithm that applies OMD locally to each information set, using individually decreasing learning rates and a regularized loss. We show that this approach guarantees a convergence rate of 𝒪̃(T^{-1/2}) with high probability and has a near-optimal dependence on the game parameters when applied with the best theoretical choices of learning rates and sampling policies. To achieve these results, we generalize the notion of OMD stabilization, allowing for time-varying regularization with convex increments.
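To make the idea of a local update concrete, here is a minimal sketch of entropic OMD (exponential weights) run at a single information set, with a decreasing learning rate and loss estimates built from actions drawn from a fixed sampling policy. This is an illustration only, not the algorithm of the paper: the names `omd_step` and `run_local_omd`, the 1/√t learning-rate schedule, the `base_rate` parameter, and the one-step importance-weighted estimate are all assumptions, and the regularized loss and the game-tree structure described in the abstract are omitted.

```python
import math
import random


def omd_step(policy, loss_estimate, learning_rate):
    """One entropic OMD (exponential-weights) update on the probability simplex."""
    weights = [p * math.exp(-learning_rate * l) for p, l in zip(policy, loss_estimate)]
    total = sum(weights)
    return [w / total for w in weights]


def run_local_omd(true_losses, sampling_policy, episodes, base_rate=0.5, seed=0):
    """Run the sketch at one information set (hypothetical setup).

    Actions are drawn from the FIXED `sampling_policy`, never from the learned
    policy, so the loss estimate only divides by a fixed probability.
    Returns the last policy and the running average of the policies.
    """
    rng = random.Random(seed)
    n = len(true_losses)
    policy = [1.0 / n] * n
    average = [0.0] * n
    for t in range(1, episodes + 1):
        # Observe the loss of one action sampled from the fixed policy.
        action = rng.choices(range(n), weights=sampling_policy)[0]
        # Unbiased estimate of the loss vector (assumed estimator).
        estimate = [0.0] * n
        estimate[action] = true_losses[action] / sampling_policy[action]
        # Decreasing learning rate, assumed here to scale as 1/sqrt(t).
        policy = omd_step(policy, estimate, base_rate / math.sqrt(t))
        # Running average of the policies played so far.
        average = [a + (p - a) / t for a, p in zip(average, policy)]
    return policy, average


if __name__ == "__main__":
    final_policy, average_policy = run_local_omd(
        true_losses=[1.0, 0.0, 0.5],
        sampling_policy=[1 / 3, 1 / 3, 1 / 3],
        episodes=2000,
    )
    print(final_policy, average_policy)
```

In this toy setting the learned policy concentrates on the action with the lowest loss, while the sampling distribution stays fixed throughout; in the paper's setting, one such update would be applied at each information set with its own learning rate.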
