Thompson sampling for linear quadratic mean-field teams

11/09/2020
by Mukul Gagrani, et al.

We consider optimal control of an unknown multi-agent linear quadratic (LQ) system where the dynamics and the cost are coupled across the agents through the mean-field (i.e., empirical mean) of the states and controls. Directly using single-agent LQ learning algorithms in such models results in regret which increases polynomially with the number of agents. We propose a new Thompson sampling based learning algorithm which exploits the structure of the system model and show that the expected Bayesian regret of our proposed algorithm for a system with agents of |M| different types at time horizon T is $\tilde{\mathcal{O}}(|M|^{1.5}\sqrt{T})$ irrespective of the total number of agents, where the $\tilde{\mathcal{O}}$ notation hides logarithmic factors in $T$. We present detailed numerical experiments to illustrate the salient features of the proposed algorithm.
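
To make "exploits the structure of the system model" concrete, here is a minimal Python/NumPy sketch under assumptions the abstract does not state: one agent type, scalar states and controls, dynamics $x^i_{t+1} = a x^i_t + b u^i_t + d \bar{x}_t + e \bar{u}_t + w^i_t$, and per-step cost equal to the average over agents of $q (x^i_t)^2 + r (u^i_t)^2$ plus $\bar{q} \bar{x}_t^2 + \bar{r} \bar{u}_t^2$. Under this assumed model the mean field $\bar{x}$ evolves with parameters $(a+d, b+e)$, each agent's deviation $x^i - \bar{x}$ evolves with $(a, b)$, and the cost splits the same way. A learner therefore only has to estimate two fixed-size parameter blocks, however many agents there are. The loop is a generic episodic Thompson sampling scheme (fixed-length episodes, conjugate Gaussian posteriors, samples clipped to a hypothetical bounded set as a crude safeguard), not the authors' algorithm. All numbers and helper names (`riccati_gain`, `GaussianPosterior`, `clip_params`, `simulate`) are illustrative.

```python
import numpy as np

def riccati_gain(a, b, q, r, iters=1000):
    """Scalar discrete-time Riccati equation via fixed-point iteration.
    Returns (p, k) where u = -k x is the optimal stationary feedback."""
    p = q
    for _ in range(iters):
        p = q + a * a * p * r / (r + b * b * p)
    return p, a * b * p / (r + b * b * p)

class GaussianPosterior:
    """Conjugate posterior for y = theta . z + noise, noise ~ N(0, noise_var),
    prior theta ~ N(prior_mean, prior_var * I)."""

    def __init__(self, prior_mean, prior_var, noise_var):
        prior_mean = np.asarray(prior_mean, dtype=float)
        self.prec = np.eye(len(prior_mean)) / prior_var
        self.h = self.prec @ prior_mean  # precision-weighted mean
        self.noise_var = noise_var

    def update(self, z, y):
        self.prec += np.outer(z, z) / self.noise_var
        self.h += np.asarray(z, dtype=float) * y / self.noise_var

    def mean(self):
        return np.linalg.solve(self.prec, self.h)

    def sample(self, rng):
        cov = np.linalg.inv(self.prec)
        return rng.multivariate_normal(self.mean(), (cov + cov.T) / 2)

def clip_params(theta, a_max=1.2, b_min=0.2, b_max=2.0):
    # Hypothetical bounded parameter set (assumption): |a| <= a_max, b in [b_min, b_max].
    return float(np.clip(theta[0], -a_max, a_max)), float(np.clip(theta[1], b_min, b_max))

def simulate(n_agents=100, horizon=1000, episode_len=10, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical true parameters and cost weights, for illustration only.
    a, b, d, e = 0.9, 0.5, 0.2, 0.3
    q, r, q_bar, r_bar = 1.0, 1.0, 0.5, 0.5
    sigma = 1.0
    # Deviation from the mean follows (a, b); the mean field follows (a + d, b + e).
    # Noise variances: w^0 - mean(w) has sigma^2 (1 - 1/n), mean(w) has sigma^2 / n.
    rel_post = GaussianPosterior([1.0, 0.5], 0.1, sigma**2 * (1 - 1 / n_agents))
    mf_post = GaussianPosterior([1.0, 0.5], 0.1, sigma**2 / n_agents)
    x = rng.normal(size=n_agents)
    total_cost = 0.0
    for t in range(horizon):
        if t % episode_len == 0:
            # Thompson step: sample each block, then act as if the sample were true.
            _, k_rel = riccati_gain(*clip_params(rel_post.sample(rng)), q, r)
            _, k_mf = riccati_gain(*clip_params(mf_post.sample(rng)), q + q_bar, r + r_bar)
        x_bar = x.mean()
        u = -k_rel * (x - x_bar) - k_mf * x_bar
        u_bar = u.mean()
        total_cost += np.mean(q * x**2 + r * u**2) + q_bar * x_bar**2 + r_bar * u_bar**2
        x_next = a * x + b * u + d * x_bar + e * u_bar + sigma * rng.normal(size=n_agents)
        # Two low-dimensional regressions per step, independent of n_agents:
        # the mean-field transition, and agent 0's deviation from the mean.
        mf_post.update(np.array([x_bar, u_bar]), x_next.mean())
        rel_post.update(np.array([x[0] - x_bar, u[0] - u_bar]), x_next[0] - x_next.mean())
        x = x_next
    return rel_post.mean(), mf_post.mean(), total_cost / horizon
```

For example, `simulate(n_agents=1000)` performs the same two 2-parameter regressions per step as `simulate(n_agents=10)`; only the noise variances given to the posteriors change. Because each sampled gain is a linear state feedback, the regressors within one episode are collinear, so it is the change of sampled gain from one episode to the next that makes both entries of each parameter block identifiable.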
