Regret Bounds for Decentralized Learning in Cooperative Multi-Agent Dynamical Systems
Regret analysis is challenging in Multi-Agent Reinforcement Learning (MARL), primarily due to the dynamical environments and the decentralized information available to the agents. We address this challenge in the context of decentralized learning in multi-agent linear-quadratic (LQ) dynamical systems. We begin with a simple setup consisting of two agents and two dynamically decoupled stochastic linear systems, each controlled by one agent. The systems are coupled through a quadratic cost function. When both systems' dynamics are unknown and there is no communication among the agents, we show that no learning policy can achieve regret sublinear in the time horizon T. When only one system's dynamics are unknown and there is one-directional communication from the agent controlling the unknown system to the other agent, we propose a MARL algorithm based on the construction of an auxiliary single-agent LQ problem. The auxiliary single-agent problem serves as an implicit coordination mechanism between the two learning agents, allowing them to achieve regret within O(√T) of the regret of the auxiliary single-agent problem. Consequently, using existing results for single-agent LQ regret, our algorithm attains an Õ(√T) regret bound, where Õ(·) hides constants and logarithmic factors. Our numerical experiments indicate that this bound is matched in practice. From the two-agent problem, we extend our results to multi-agent LQ systems with certain communication patterns.
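For intuition, below is a minimal numerical sketch of the two-agent LQ setting described in the abstract. It is not the paper's algorithm: instead of constructing the auxiliary single-agent problem, it uses plain certainty-equivalence control with recursive least squares, and it assumes scalar systems, joint-state observation by both agents, and a one-directional link that carries agent 1's parameter estimates to agent 2. All names and constants (a1, b1, Q, R, the noise levels) are illustrative choices, not values from the paper.

```python
# A self-contained sketch (NOT the paper's algorithm) of two agents, each
# controlling a decoupled scalar linear system, coupled through a quadratic
# cost. Agent 1's dynamics (a1, b1) are unknown and estimated online; the
# one-directional link is modeled by letting agent 2 use those estimates.
import numpy as np

rng = np.random.default_rng(0)

# True (scalar) dynamics: x_{i,t+1} = a_i * x_{i,t} + b_i * u_{i,t} + w_{i,t}
a1, b1 = 0.8, 1.0      # unknown to the agents (illustrative values)
a2, b2 = 0.5, 1.0      # known to agent 2
A = np.diag([a1, a2])
B = np.diag([b1, b2])

# Quadratic cost x'Qx + u'Ru; the off-diagonal entries of Q couple the agents
Q = np.array([[2.0, -1.0], [-1.0, 2.0]])
R = np.eye(2)

def lqr_gain(A, B, Q, R, iters=200):
    """Iterate the discrete-time Riccati equation; return K for u = -K x."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

K_opt = lqr_gain(A, B, Q, R)   # oracle controller that knows (a1, b1)

T = 5000
x = np.zeros(2); x_or = np.zeros(2)
cost, cost_or = 0.0, 0.0
theta = np.zeros(2)            # agent 1's estimate of (a1, b1)
G = 1e-3 * np.eye(2)           # regularized Gram matrix for least squares

for t in range(T):
    # Certainty equivalence: re-solve the joint LQR with the current
    # estimate; agent 2 uses the same estimate via the communication link.
    A_hat = np.diag([theta[0], a2]); B_hat = np.diag([theta[1], b2])
    K = lqr_gain(A_hat, B_hat, Q, R, iters=100)

    u = -K @ x + 0.1 * rng.standard_normal(2)   # small exploration noise
    u_or = -K_opt @ x_or

    w = 0.1 * rng.standard_normal(2)            # shared process noise
    x_next = A @ x + B @ u + w
    x_or = A @ x_or + B @ u_or + w

    cost += x @ Q @ x + u @ R @ u
    cost_or += x_or @ Q @ x_or + u_or @ R @ u_or

    # Recursive least squares on agent 1's transition (x1, u1) -> x1'
    z = np.array([x[0], u[0]])
    G += np.outer(z, z)
    theta += np.linalg.solve(G, z * (x_next[0] - z @ theta))
    x = x_next

print(f"cumulative regret after T={T}: {cost - cost_or:.1f}")
```

Tracking the printed gap as T grows gives a rough empirical sense of regret for this simplified certainty-equivalence scheme; the paper's auxiliary-problem construction is what yields the provable Õ(√T) guarantee.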