Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall

06/11/2021
by Tadashi Kozuno et al.

We study the problem of learning a Nash equilibrium (NE) in an imperfect information game (IIG) through self-play. More precisely, we focus on two-player, zero-sum, episodic, tabular IIGs under the perfect-recall assumption, where the only feedback consists of realizations of the game (bandit feedback). In particular, the dynamics of the IIG are not known; we can access them only by sampling or by interacting with a game simulator. For this learning setting, we provide the Implicit Exploration Online Mirror Descent (IXOMD) algorithm. It is a model-free algorithm with a high-probability bound on the convergence rate to the NE of order 1/√T, where T is the number of played games. Moreover, IXOMD is computationally efficient, as it needs to perform updates only along the sampled trajectory.
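The abstract combines two ingredients: implicit-exploration (IX) loss estimates built from bandit feedback, and online mirror descent updates that touch only what was sampled. The following sketch is not IXOMD. It is a minimal illustration under simplifying assumptions: it drops the game tree and keeps a single decision point per player, i.e. a two-player zero-sum matrix game with losses in [0, 1], where each player runs entropic mirror descent (exponential weights) on IX estimates, in the style of the Exp3-IX bandit algorithm. The function names `ix_selfplay_matrix_game` and `duality_gap`, and the step-size tuning, are hypothetical choices made for this example.

```python
import numpy as np


def ix_selfplay_matrix_game(A, T, seed=0):
    """Hypothetical sketch: bandit self-play in a zero-sum matrix game (not IXOMD).

    The row player's loss is A[i, j] in [0, 1]; the column player's loss is
    1 - A[i, j], so the game is strategically zero-sum. Each player runs
    entropic online mirror descent (exponential weights) on IX loss estimates.
    Returns the time-averaged mixed strategies of both players.
    """
    A = np.asarray(A, dtype=float)
    rng = np.random.default_rng(seed)
    n, m = A.shape

    # Step sizes and IX parameters use Exp3-IX-style tuning (an assumption).
    eta_x = np.sqrt(2 * np.log(n) / (n * T))
    eta_y = np.sqrt(2 * np.log(m) / (m * T))
    gamma_x, gamma_y = eta_x / 2, eta_y / 2

    # Mirror-descent iterates stored as log-weights for numerical stability.
    log_x, log_y = np.zeros(n), np.zeros(m)
    sum_x, sum_y = np.zeros(n), np.zeros(m)

    for _ in range(T):
        x = np.exp(log_x - log_x.max())
        x /= x.sum()
        y = np.exp(log_y - log_y.max())
        y /= y.sum()
        sum_x += x
        sum_y += y

        # Play one game: each player samples an action from its current strategy.
        i = rng.choice(n, p=x)
        j = rng.choice(m, p=y)
        loss = A[i, j]  # bandit feedback: only the realized outcome is observed

        # IX estimate: importance weighting with an inflated denominator p + gamma.
        # Only the sampled coordinate changes, the one-step analogue of
        # "updating only along the sampled trajectory".
        log_x[i] -= eta_x * loss / (x[i] + gamma_x)
        log_y[j] -= eta_y * (1.0 - loss) / (y[j] + gamma_y)

    return sum_x / T, sum_y / T


def duality_gap(A, x, y):
    """max_j (x^T A)_j - min_i (A y)_i: nonnegative, and zero exactly at an NE."""
    A = np.asarray(A, dtype=float)
    return float((x @ A).max() - (A @ y).min())
```

On matching pennies (`A = [[1, 0], [0, 1]]`, whose unique equilibrium is uniform play), the duality gap of the averaged strategies should shrink as `T` grows. The averaging matters: the current iterates of exponential weights tend to cycle in zero-sum games, while the averages inherit the no-regret guarantee.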

Related research

06/18/2020
DREAM: Deep Regret minimization with Advantage baselines and Model-free learning
We introduce DREAM, a deep reinforcement learning algorithm that finds o...

03/05/2023
Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games
We revisit the problem of learning in two-player zero-sum Markov games, ...

02/19/2020
From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization
In this paper we investigate the Follow the Regularized Leader dynamics ...

12/23/2022
Adapting to game trees in zero-sum imperfect information games
Imperfect information games (IIG) are games in which each player only pa...

03/14/2018
Constructing Imperfect Recall Abstractions to Solve Large Extensive-Form Games
Extensive-form games are an important model of finite sequential interac...

05/27/2022
Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games
We study the problem of finding the Nash equilibrium in a two-player zer...

07/31/2023
Block-Coordinate Methods and Restarting for Solving Extensive-Form Games
Coordinate descent methods are popular in machine learning and optimizat...
