The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces

06/07/2021
by   Chi Jin, et al.
0

Modern reinforcement learning (RL) commonly engages practical problems with large state spaces, where function approximation must be deployed to approximate either the value function or the policy. While recent progresses in RL theory address a rich set of RL problems with general function approximation, such successes are mostly restricted to the single-agent setting. It remains elusive how to extend these results to multi-agent RL, especially due to the new challenges arising from its game-theoretical nature. This paper considers two-player zero-sum Markov Games (MGs). We propose a new algorithm that can provably find the Nash equilibrium policy using a polynomial number of samples, for any MG with low multi-agent Bellman-Eluder dimension – a new complexity measure adapted from its single-agent version (Jin et al., 2021). A key component of our new algorithm is the exploiter, which facilitates the learning of the main player by deliberately exploiting her weakness. Our theoretical framework is generic, which applies to a wide range of models including but not limited to tabular MGs, MGs with linear or kernel function approximation, and MGs with rich observations.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/13/2023

Breaking the Curse of Multiagency: Provably Efficient Decentralized Multi-Agent RL with Function Approximation

A unique challenge in Multi-Agent Reinforcement Learning (MARL) is the c...
research
12/27/2021

Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers?

We study multi-player general-sum Markov games with one of the players d...
research
10/03/2022

Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games

Multi-Agent Reinforcement Learning (MARL) – where multiple agents learn ...
research
03/01/2023

Finite-sample Guarantees for Nash Q-learning with Linear Function Approximation

Nash Q-learning may be considered one of the first and most known algori...
research
07/06/2021

A Unified Off-Policy Evaluation Approach for General Value Function

General Value Function (GVF) is a powerful tool to represent both the pr...
research
07/15/2020

Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

Model-based reinforcement learning (RL), which finds an optimal policy u...
research
03/10/2018

Multi-Agent Submodular Optimization

Recent years have seen many algorithmic advances in the area of submodul...

Please sign up or login with your details

Forgot password? Click here to reset