Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming

by Alec Koppel, et al.

In tabular multi-agent reinforcement learning with an average-cost criterion, a team of agents sequentially interacts with the environment and observes local incentives. We focus on the case in which the global reward is a sum of local rewards, the joint policy factorizes into agents' marginals, and the state is fully observable. To date, few global optimality guarantees exist even for this simple setting, as most results yield convergence to stationarity for parameterized policies in large, possibly continuous, spaces. To solidify the foundations of MARL, we build upon linear programming (LP) reformulations, for which stochastic primal-dual methods yield a model-free approach that achieves optimal sample complexity in the centralized case. We develop multi-agent extensions, whereby agents solve their local saddle point problems and then perform local weighted averaging. We establish that the sample complexity to obtain near-globally optimal solutions matches tight dependencies on the cardinality of the state and action spaces, and exhibits classical scalings with respect to the network in accordance with multi-agent optimization. Experiments corroborate these results in practice.
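
The "local weighted averaging" step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it omits the stochastic primal-dual update entirely and shows only a generic consensus round of the kind used in multi-agent optimization. The function name, the 3-agent path graph, the mixing matrix `W`, and the initial vectors are all hypothetical, chosen only for illustration.

```python
def weighted_average_step(local_iterates, mixing_weights):
    """One round of weighted averaging over a network (illustrative only).

    local_iterates[i] is agent i's current vector (a list of floats).
    mixing_weights[i][j] is the weight agent i places on agent j's vector;
    it is zero when i and j are not neighbors.
    """
    n_agents = len(local_iterates)
    dim = len(local_iterates[0])
    return [
        [
            sum(mixing_weights[i][j] * local_iterates[j][k] for j in range(n_agents))
            for k in range(dim)
        ]
        for i in range(n_agents)
    ]


# Assumed example: path graph 0 - 1 - 2 with a symmetric, doubly stochastic
# mixing matrix (every row and every column sums to 1).
W = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
]

# Assumed local vectors; their network-wide average is [1.0, 1.0].
iterates = [[1.0, 0.0], [0.0, 3.0], [2.0, 0.0]]

# Repeated averaging drives every agent toward the network-wide average.
for _ in range(50):
    iterates = weighted_average_step(iterates, W)
```

Because the assumed `W` is doubly stochastic and the graph is connected, repeated rounds preserve the network-wide average while shrinking disagreement between agents. In a scheme like the one the abstract describes, such a round would follow each agent's local saddle point update rather than run in isolation.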




Voting-Based Multi-Agent Reinforcement Learning

The recent success of single-agent reinforcement learning (RL) encourage...

Multi-Agent MDP Homomorphic Networks

This paper introduces Multi-Agent MDP Homomorphic Networks, a class of n...

Cautious Reinforcement Learning via Distributional Risk in the Dual Domain

We study the estimation of risk-sensitive policies in reinforcement lear...

Reinforcement Learning for Heterogeneous Teams with PALO Bounds

We introduce reinforcement learning for heterogeneous teams in which rew...

Distributed Influence-Augmented Local Simulators for Parallel MARL in Large Networked Systems

Due to its high sample complexity, simulation is, as of today, critical ...

Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization

Despite the success of single-agent reinforcement learning, multi-agent ...

A Reinforcement Learning Based Approach for Joint Multi-Agent Decision Making

Reinforcement Learning (RL) is being increasingly applied to optimize co...