Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming

10/22/2021
by Alec Koppel, et al.

In tabular multi-agent reinforcement learning with the average-cost criterion, a team of agents sequentially interacts with the environment and observes local incentives. We focus on the case where the global reward is a sum of local rewards, the joint policy factorizes into the agents' marginals, and the state is fully observable. To date, few global optimality guarantees exist even for this simple setting, as most results yield convergence to stationarity for parameterized policies in large, possibly continuous, spaces. To solidify the foundations of MARL, we build upon linear programming (LP) reformulations, for which stochastic primal-dual methods yield a model-free approach that achieves optimal sample complexity in the centralized case. We develop multi-agent extensions, whereby agents solve their local saddle point problems and then perform local weighted averaging. We establish that the sample complexity to obtain near-globally optimal solutions matches tight dependencies on the cardinality of the state and action spaces, and exhibits classical scalings with respect to the network in accordance with multi-agent optimization. Experiments corroborate these results in practice.
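The "local weighted averaging" step mentioned in the abstract can be illustrated with a generic consensus update. The following is a minimal sketch only, not the paper's algorithm: the ring network, the doubly stochastic mixing matrix `W`, the `self_weight` parameter, and the function names are all assumptions made for illustration, and the local primal-dual update that would precede each averaging step is left out.

```python
import numpy as np


def ring_mixing_matrix(n_agents: int, self_weight: float = 0.5) -> np.ndarray:
    """Build a doubly stochastic weight matrix for a ring network (assumed topology).

    Each agent keeps `self_weight` of its own value and splits the remainder
    equally between its left and right neighbors.
    """
    W = np.zeros((n_agents, n_agents))
    side = (1.0 - self_weight) / 2.0
    for i in range(n_agents):
        W[i, i] += self_weight
        W[i, (i - 1) % n_agents] += side
        W[i, (i + 1) % n_agents] += side
    return W


def averaging_step(local_iterates: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One round of weighted averaging: row i becomes a W-weighted mix of all rows.

    `local_iterates` has shape (n_agents, dim); row i is agent i's current estimate.
    """
    return W @ local_iterates


# Hypothetical example: 4 agents, each holding a 3-dimensional local estimate.
rng = np.random.default_rng(0)
estimates = rng.normal(size=(4, 3))
network_mean = estimates.mean(axis=0)

W = ring_mixing_matrix(4)
mixed = estimates.copy()
for _ in range(50):
    mixed = averaging_step(mixed, W)

# Largest gap between any agent's estimate and the network-wide mean.
consensus_gap = np.abs(mixed - network_mean).max()
```

Because `W` is doubly stochastic, each averaging round preserves the network-wide mean while shrinking the disagreement between agents, which is the usual mechanism behind network-dependent rates in multi-agent optimization.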


research
09/07/2022

On the Near-Optimality of Local Policies in Large Cooperative Multi-Agent Reinforcement Learning

We show that in a cooperative N-agent network, one can design locally ex...
research
07/02/2019

Voting-Based Multi-Agent Reinforcement Learning

The recent success of single-agent reinforcement learning (RL) encourage...
research
10/09/2021

Multi-Agent MDP Homomorphic Networks

This paper introduces Multi-Agent MDP Homomorphic Networks, a class of n...
research
06/03/2018

Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization

Despite the success of single-agent reinforcement learning, multi-agent ...
research
02/27/2020

Cautious Reinforcement Learning via Distributional Risk in the Dual Domain

We study the estimation of risk-sensitive policies in reinforcement lear...
research
07/01/2022

Distributed Influence-Augmented Local Simulators for Parallel MARL in Large Networked Systems

Due to its high sample complexity, simulation is, as of today, critical ...
research
05/23/2018

Reinforcement Learning for Heterogeneous Teams with PALO Bounds

We introduce reinforcement learning for heterogeneous teams in which rew...
