Model-Based Reinforcement Learning Is Minimax-Optimal for Offline Zero-Sum Markov Games

06/08/2022
by Yuling Yan et al.

This paper makes progress towards learning Nash equilibria in two-player zero-sum Markov games from offline data. Specifically, consider a γ-discounted infinite-horizon Markov game with S states, where the max-player has A actions and the min-player has B actions. We propose a pessimistic model-based algorithm with Bernstein-style lower confidence bounds – called VI-LCB-Game – that provably finds an ε-approximate Nash equilibrium with a sample complexity no larger than C⋆_clipped · S(A+B) / ((1-γ)³ε²) (up to some log factor). Here, C⋆_clipped is a unilateral clipped concentrability coefficient that reflects the coverage and distribution shift of the available data (vis-à-vis the target data), and the target accuracy ε can be any value within (0, 1/(1-γ)]. Our sample complexity bound strengthens prior art by a factor of min{A,B}, achieving minimax optimality for the entire ε-range. An appealing feature of our result lies in its algorithmic simplicity, which reveals that neither variance reduction nor sample splitting is necessary to achieve sample optimality.
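To illustrate the pessimism principle the abstract describes, here is a minimal sketch of a Bernstein-style lower-confidence-bound backup, assuming a standard penalty of the form sqrt(Var·log(1/δ)/N) + log(1/δ)/((1-γ)N) clipped at the trivial bound 1/(1-γ). The constants, the confidence level δ, and the function names are illustrative assumptions, not the paper's exact specification; the full VI-LCB-Game algorithm additionally solves a matrix game at each state, which is omitted here.

```python
import math

def bernstein_penalty(var_v, n, gamma, delta=0.05, c=1.0):
    """Bernstein-style LCB penalty for one (state, action-pair) entry.

    var_v : empirical variance of the value estimate under the
            estimated transition distribution.
    n     : number of offline samples observed for this entry.
    Constants c and delta are illustrative, not the paper's choices.
    """
    if n == 0:
        # Unvisited entry: fall back to the vacuous bound 1/(1 - gamma).
        return 1.0 / (1.0 - gamma)
    log_term = math.log(1.0 / delta)
    penalty = math.sqrt(c * var_v * log_term / n) \
        + c * log_term / ((1.0 - gamma) * n)
    # Clip: the penalty never needs to exceed the value range.
    return min(penalty, 1.0 / (1.0 - gamma))

def pessimistic_q(r, p_hat, v, n, gamma):
    """One pessimistic Bellman backup: r + gamma * E_{p_hat}[v] - penalty.

    p_hat : empirical next-state distribution (list of probabilities).
    v     : current value estimates for the next states.
    """
    mean_v = sum(p * vi for p, vi in zip(p_hat, v))
    var_v = sum(p * (vi - mean_v) ** 2 for p, vi in zip(p_hat, v))
    q = r + gamma * mean_v - bernstein_penalty(var_v, n, gamma)
    return max(q, 0.0)  # values are kept in [0, 1/(1 - gamma)]
```

Note how the penalty shrinks as the sample count n grows, so well-covered entries are barely penalized while poorly covered ones are pushed toward pessimistic values — the mechanism that lets the bound depend on the clipped concentrability coefficient rather than on uniform coverage.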


research
06/22/2020

Near-Optimal Reinforcement Learning with Self-Play

This paper considers the problem of designing optimal algorithms for rei...
research
08/22/2022

Minimax-Optimal Multi-Agent RL in Markov Games With a Generative Model

This paper studies multi-agent reinforcement learning in Markov games, w...
research
01/10/2022

When is Offline Two-Player Zero-Sum Markov Game Solvable?

We study what dataset assumption permits solving offline two-player zero...
research
08/17/2023

Model-Free Algorithm with Improved Sample Efficiency for Zero-Sum Markov Games

The problem of two-player zero-sum Markov games has recently attracted i...
research
07/30/2021

Towards General Function Approximation in Zero-Sum Markov Games

This paper considers two-player zero-sum finite-horizon Markov games wit...
research
06/09/2023

Finite-Time Analysis of Minimax Q-Learning for Two-Player Zero-Sum Markov Games: Switching System Approach

The objective of this paper is to investigate the finite-time analysis o...
research
06/02/2019

Feature-Based Q-Learning for Two-Player Stochastic Games

Consider a two-player zero-sum stochastic game where the transition func...
