Regret Bounds for Decentralized Learning in Cooperative Multi-Agent Dynamical Systems

by   Seyed Mohammad Asghari, et al.

Regret analysis is challenging in Multi-Agent Reinforcement Learning (MARL) primarily due to the dynamical environments and the decentralized information among agents. We attempt to solve this challenge in the context of decentralized learning in multi-agent linear-quadratic (LQ) dynamical systems. We begin with a simple setup consisting of two agents and two dynamically decoupled stochastic linear systems, each system controlled by an agent. The systems are coupled through a quadratic cost function. When both systems' dynamics are unknown and there is no communication among the agents, we show that no learning policy can generate sub-linear in T regret, where T is the time horizon. When only one system's dynamics are unknown and there is one-directional communication from the agent controlling the unknown system to the other agent, we propose a MARL algorithm based on the construction of an auxiliary single-agent LQ problem. The auxiliary single-agent problem in the proposed MARL algorithm serves as an implicit coordination mechanism among the two learning agents. This allows the agents to achieve a regret within O(√(T)) of the regret of the auxiliary single-agent problem. Consequently, using existing results for single-agent LQ regret, our algorithm provides a Õ(√(T)) regret bound. (Here Õ(·) hides constants and logarithmic factors). Our numerical experiments indicate that this bound is matched in practice. From the two-agent problem, we extend our results to multi-agent LQ systems with certain communication patterns.


page 21

page 22


Thompson sampling for linear quadratic mean-field teams

We consider optimal control of an unknown multi-agent linear quadratic (...

Federated LQR: Learning through Sharing

In many multi-agent reinforcement learning applications such as flocking...

A General Auxiliary Controller for Multi-agent Flocking

We aim to improve the performance of multi-agent flocking behavior by qu...

A Regret Minimization Approach to Multi-Agent Control

We study the problem of multi-agent control of a dynamical system with k...

A Deep Q-Network for the Beer Game with Partial Information

The beer game is a decentralized, multi-agent, cooperative problem that ...

Transfer in Reinforcement Learning via Regret Bounds for Learning Agents

We present an approach for the quantification of the usefulness of trans...

Scalable regret for learning to control network-coupled subsystems with unknown dynamics

We consider the problem of controlling an unknown linear quadratic Gauss...