Regret Bounds for Decentralized Learning in Cooperative Multi-Agent Dynamical Systems

01/27/2020
by Seyed Mohammad Asghari, et al.

Regret analysis is challenging in Multi-Agent Reinforcement Learning (MARL), primarily because the environment is dynamical and information is decentralized among the agents. We address this challenge in the context of decentralized learning in multi-agent linear-quadratic (LQ) dynamical systems. We begin with a simple setup consisting of two agents and two dynamically decoupled stochastic linear systems, each controlled by one agent. The systems are coupled through a quadratic cost function. When both systems' dynamics are unknown and there is no communication between the agents, we show that no learning policy can achieve regret that is sub-linear in T, where T is the time horizon. When only one system's dynamics are unknown and there is one-directional communication from the agent controlling the unknown system to the other agent, we propose a MARL algorithm based on the construction of an auxiliary single-agent LQ problem. The auxiliary single-agent problem serves as an implicit coordination mechanism between the two learning agents. This allows the agents to achieve a regret within O(√(T)) of the regret of the auxiliary single-agent problem. Consequently, using existing results for single-agent LQ regret, our algorithm achieves a Õ(√(T)) regret bound. (Here, Õ(·) hides constants and logarithmic factors.) Our numerical experiments indicate that this bound is matched in practice. From the two-agent problem, we extend our results to multi-agent LQ systems with certain communication patterns.
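To make the two-agent setup concrete, the following is a minimal sketch of two dynamically decoupled scalar linear systems whose only interaction is a cross term in the quadratic cost. It is not the paper's algorithm: the scalar dimensions, the fixed feedback gains, the specific cost weights, and the function name `simulate_coupled_cost` are all assumptions made for illustration.

```python
import random


def simulate_coupled_cost(a1, b1, a2, b2, k1, k2, q_cross, horizon,
                          noise_std=1.0, seed=0):
    """Return the cumulative cost of two decoupled scalar systems.

    Illustrative assumptions (not taken from the paper):
      * system i evolves as x_i <- a_i * x_i + b_i * u_i + w_i,
        with w_i drawn as zero-mean Gaussian noise;
      * agent i sees only its own state and applies u_i = -k_i * x_i;
      * the per-step cost is
        x1^2 + x2^2 + 2 * q_cross * x1 * x2 + u1^2 + u2^2,
        so the systems interact only through the q_cross term.
    """
    rng = random.Random(seed)
    x1 = 0.0
    x2 = 0.0
    total = 0.0
    for _ in range(horizon):
        # Each agent acts on its own state only (no communication).
        u1 = -k1 * x1
        u2 = -k2 * x2
        # Quadratic cost with a cross term coupling the two systems.
        total += x1 * x1 + x2 * x2 + 2.0 * q_cross * x1 * x2
        total += u1 * u1 + u2 * u2
        # Decoupled dynamics: neither state enters the other's update.
        x1 = a1 * x1 + b1 * u1 + rng.gauss(0.0, noise_std)
        x2 = a2 * x2 + b2 * u2 + rng.gauss(0.0, noise_std)
    return total


if __name__ == "__main__":
    params = dict(a1=0.9, b1=1.0, a2=0.8, b2=1.0, k1=0.5, k2=0.4, horizon=200)
    print("uncoupled cost:", simulate_coupled_cost(q_cross=0.0, **params))
    print("coupled cost:  ", simulate_coupled_cost(q_cross=0.5, **params))
```

Because the gains are fixed and the seed is shared, both calls follow the same state trajectories, so any difference in the totals comes from the cross term alone. Regret in the sense of the abstract would compare such a cumulative cost, incurred by a learning policy, against the optimal cost when the dynamics are known; computing that optimal benchmark is outside this sketch.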

research
11/09/2020

Thompson sampling for linear quadratic mean-field teams

We consider optimal control of an unknown multi-agent linear quadratic (...
research
01/26/2023

Multi-Agent Congestion Cost Minimization With Linear Function Approximations

This work considers multiple agents traversing a network from a source n...
research
05/17/2023

Discovering Individual Rewards in Collective Behavior through Inverse Multi-Agent Reinforcement Learning

The discovery of individual objectives in collective behavior of complex...
research
11/03/2020

Federated LQR: Learning through Sharing

In many multi-agent reinforcement learning applications such as flocking...
research
12/11/2021

A General Auxiliary Controller for Multi-agent Flocking

We aim to improve the performance of multi-agent flocking behavior by qu...
research
02/13/2012

Decentralized Multi-agent Plan Repair in Dynamic Environments

Achieving joint objectives by teams of cooperative planning agents requi...
research
02/27/2023

Safe Multi-agent Learning via Trapping Regions

One of the main challenges of multi-agent learning lies in establishing ...
