Minimax-Optimal Multi-Agent RL in Markov Games With a Generative Model

08/22/2022
by   Gen Li, et al.

This paper studies multi-agent reinforcement learning in Markov games, with the goal of learning Nash equilibria or coarse correlated equilibria (CCE) sample-optimally. All prior results suffer from at least one of two obstacles: the curse of multiple agents and the barrier of long horizons, regardless of the sampling protocol in use. We take a step towards settling this problem, assuming access to a flexible sampling mechanism: the generative model. Focusing on non-stationary finite-horizon Markov games, we develop a fast learning algorithm and an adaptive sampling scheme that leverage the optimism principle in online adversarial learning (particularly the Follow-the-Regularized-Leader (FTRL) method). Our algorithm learns an ε-approximate CCE in a general-sum Markov game using O(H^4 S ∑_{i=1}^m A_i / ε^2) samples, where m is the number of players, S is the number of states, H is the horizon, and A_i is the number of actions for the i-th player. This bound is minimax-optimal (up to logarithmic factors) when the number of players is fixed. When applied to two-player zero-sum Markov games, our algorithm provably finds an ε-approximate Nash equilibrium with a minimal number of samples. Along the way, we derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities, which may be of independent interest.
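As background for the FTRL method the abstract builds on, here is a minimal sketch of FTRL with an entropy regularizer (equivalent to the exponential-weights / Hedge update) on a sequence of adversarial loss vectors. This illustrates only the generic FTRL principle, not the paper's algorithm; the function name, the loss-vector format, and the learning rate `eta` are assumptions made for this sketch.

```python
import math

def ftrl_exp_weights(losses, eta):
    """FTRL with entropy regularization (exponential weights / Hedge).

    losses: list of per-round loss vectors, one entry per action.
    eta: learning rate (a hyperparameter of this sketch).
    Returns the sequence of probability distributions played.
    """
    n = len(losses[0])
    cum = [0.0] * n  # cumulative loss of each action so far
    plays = []
    for loss in losses:
        # FTRL step: play p_t(a) proportional to exp(-eta * cumulative loss of a)
        w = [math.exp(-eta * c) for c in cum]
        z = sum(w)
        plays.append([x / z for x in w])
        # observe this round's losses and accumulate them
        cum = [c + l for c, l in zip(cum, loss)]
    return plays
```

On a loss sequence where one action is consistently better, the played distribution starts uniform and concentrates on the low-loss action, which is the no-regret behavior the paper's refined FTRL regret bound quantifies.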


