On the Near-Optimality of Local Policies in Large Cooperative Multi-Agent Reinforcement Learning

09/07/2022
āˆ™
by Washim Uddin Mondal, et al.

We show that in a cooperative N-agent network, one can design locally executable policies for the agents such that the resulting discounted sum of average rewards (value) closely approximates the optimal value computed over all (including non-local) policies. Specifically, we prove that, if |𝒳| and |𝒰| denote the sizes of the state and action spaces of individual agents, then for a sufficiently small discount factor, the approximation error is 𝒪(e), where e ≜ (1/√N)[√|𝒳| + √|𝒰|]. Moreover, in the special case where the reward and state transition functions are independent of the action distribution of the population, the error improves to 𝒪(e), where e ≜ (1/√N)√|𝒳|. Finally, we devise an algorithm to explicitly construct a local policy. With the help of our approximation results, we further establish that the constructed local policy is within 𝒪(max{e, ϵ}) distance of the optimal policy, and that the sample complexity to achieve such a local policy is 𝒪(ϵ⁻³), for any ϵ > 0.
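The bounds above are stated only up to constants, but the quantity e itself is simple arithmetic. The following is a minimal sketch, not taken from the paper, that evaluates e for the general case and for the special case where rewards and transitions do not depend on the population's action distribution. The function name `error_scale` is hypothetical, the constants hidden by 𝒪(·) are omitted, and the "sufficiently small discount factor" condition is not modeled, so the returned values indicate only how the error scales with N, |𝒳|, and |𝒰|.

```python
import math
from typing import Optional


def error_scale(n_agents: int, n_states: int, n_actions: Optional[int] = None) -> float:
    """Evaluate the quantity e from the abstract (hypothetical helper).

    General case:  e = (sqrt(|X|) + sqrt(|U|)) / sqrt(N)
    Special case:  e = sqrt(|X|) / sqrt(N), selected by passing n_actions=None
    (reward and transition functions independent of the population's
    action distribution). Constants hidden by the O(.) notation are omitted.
    """
    if n_agents <= 0 or n_states <= 0:
        raise ValueError("N and |X| must be positive")
    numerator = math.sqrt(n_states)
    if n_actions is not None:
        if n_actions <= 0:
            raise ValueError("|U| must be positive")
        numerator += math.sqrt(n_actions)
    return numerator / math.sqrt(n_agents)


if __name__ == "__main__":
    # General case with N = 100, |X| = 4, |U| = 9: (2 + 3) / 10
    print(error_scale(100, 4, 9))
    # Quadrupling N halves e, reflecting the 1/sqrt(N) dependence
    print(error_scale(400, 4, 9))
    # Special case, action-distribution-independent dynamics: 2 / 10
    print(error_scale(100, 4))
```

Running the script prints 0.5, 0.25, and 0.2. Only the 1/√N dependence and the relative sizes of the two cases are meaningful here; the actual error in the paper also carries the unspecified constants.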


