Sample-Efficient Reinforcement Learning of Partially Observable Markov Games

06/02/2022
by Qinghua Liu, et al.

This paper considers the challenging tasks of Multi-Agent Reinforcement Learning (MARL) under partial observability, where each agent sees only her own individual observations and actions, which reveal incomplete information about the underlying state of the system. This paper studies these tasks under the general model of multiplayer general-sum Partially Observable Markov Games (POMGs), which is significantly larger than the standard model of Imperfect Information Extensive-Form Games (IIEFGs). We identify a rich subclass of POMGs – weakly revealing POMGs – in which sample-efficient learning is tractable. In the self-play setting, we prove that a simple algorithm combining optimism and Maximum Likelihood Estimation (MLE) is sufficient to find approximate Nash equilibria, correlated equilibria, and coarse correlated equilibria of weakly revealing POMGs in a polynomial number of samples when the number of agents is small. In the setting of playing against adversarial opponents, we show that a variant of our optimistic MLE algorithm achieves sublinear regret when compared against the optimal maximin policies. To the best of our knowledge, this work provides the first line of sample-efficient results for learning POMGs.
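
To make the phrase "combining optimism and MLE" concrete, the following is a minimal single-agent toy sketch, not the paper's algorithm. It assumes a finite class of candidate models, each of which assigns a probability to every trajectory under every policy and a value to every policy. The names `log_likelihood`, `confidence_set`, `optimistic_policy`, the threshold `beta`, and the two toy models are all hypothetical and chosen only for illustration. The idea shown: keep every model whose log-likelihood on the data is within `beta` of the best, then play the policy with the highest value under any surviving model.

```python
import math

def log_likelihood(model, data):
    """Sum of log-probabilities the model assigns to observed (policy, trajectory) pairs."""
    return sum(math.log(model["traj_prob"][pi][traj]) for pi, traj in data)

def confidence_set(models, data, beta):
    """Keep models whose log-likelihood is within beta of the maximum (the MLE part)."""
    scores = [log_likelihood(m, data) for m in models]
    best = max(scores)
    return [m for m, s in zip(models, scores) if s >= best - beta]

def optimistic_policy(models, data, beta):
    """Pick the policy with the largest value under any plausible model (the optimism part)."""
    plausible = confidence_set(models, data, beta)
    best_value, best_pi = max(
        (m["value"][pi], pi) for m in plausible for pi in range(len(m["value"]))
    )
    return best_pi

# Two hypothetical candidate models over two policies (0, 1) and two trajectories ("a", "b").
model_a = {"traj_prob": [{"a": 0.9, "b": 0.1}, {"a": 0.5, "b": 0.5}], "value": [1.0, 0.2]}
model_b = {"traj_prob": [{"a": 0.1, "b": 0.9}, {"a": 0.5, "b": 0.5}], "value": [0.1, 0.6]}
models = [model_a, model_b]

# With no data both models are plausible, so optimism favors policy 0 (value 1.0 under model_a).
first_choice = optimistic_policy(models, [], beta=1.0)

# After policy 0 repeatedly yields trajectory "b", model_a becomes implausible and is dropped.
data = [(0, "b")] * 5
second_choice = optimistic_policy(models, data, beta=1.0)
```

In this toy, the optimistic choice moves from policy 0 to policy 1 once the data rule out the model that made policy 0 look attractive. The multi-agent setting in the abstract replaces the value maximization with an equilibrium computation, which this sketch does not attempt.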


