Q-Learning for Mean-Field Controls

by Haotian Gu, et al.

Multi-agent reinforcement learning (MARL) has been applied to many challenging problems, including two-team computer games, autonomous driving, and real-time bidding. Despite this empirical success, there is a conspicuous absence of theoretical study of different MARL algorithms, mainly because of the curse of dimensionality: the joint state-action space grows exponentially as the number of agents increases. Mean-field controls (MFC) with infinitely many agents and deterministic flows, meanwhile, provide good approximations to N-agent collaborative games in terms of both game values and optimal strategies. In this paper, we study collaborative MARL under an MFC approximation framework: we develop a model-free kernel-based Q-learning algorithm (CDD-Q) and show that its convergence rate and sample complexity are independent of the number of agents. Our empirical studies on MFC examples demonstrate the strong performance of CDD-Q. Moreover, the CDD-Q algorithm can be applied to a general class of Markov decision problems (MDPs) with deterministic dynamics and continuous state-action spaces.
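To make the idea of kernel-based Q-learning on a deterministic, continuous state-action problem concrete, here is a minimal sketch. It is not the CDD-Q algorithm from the paper: the toy dynamics in `step`, the reward, the grid of centers, the Gaussian kernel, the bandwidth, and every function name are assumptions chosen for illustration. Q-values are stored only at a finite net of centers, Q at any other point is a kernel-weighted average of those values, and because the dynamics are deterministic a single sample per center is enough for each update.

```python
import itertools
import math

GAMMA = 0.5                                  # discount factor (assumed)
BANDWIDTH = 0.03                             # kernel bandwidth (assumed)
STATES = [i / 10 for i in range(11)]         # net over the state interval [0, 1]
ACTIONS = [-0.1, 0.0, 0.1]                   # net over the action interval
CENTERS = list(itertools.product(STATES, ACTIONS))


def step(s, a):
    """Hypothetical deterministic dynamics: move by a, stay in [0, 1]."""
    s_next = min(1.0, max(0.0, s + a))
    reward = -(s_next - 0.5) ** 2            # best reward at s = 0.5
    return s_next, reward


def kernel_weights(s, a):
    """Normalized Gaussian kernel weights of (s, a) against all centers."""
    raw = [
        math.exp(-((s - cs) ** 2 + (a - ca) ** 2) / (2 * BANDWIDTH ** 2))
        for cs, ca in CENTERS
    ]
    total = sum(raw)
    return [w / total for w in raw]


def q_value(q, s, a):
    """Kernel-regression estimate of Q(s, a) from the values at the centers."""
    return sum(w * qi for w, qi in zip(kernel_weights(s, a), q))


def kernel_q_iteration(sweeps=50):
    """Repeated Bellman updates at the centers, one sample per center."""
    q = [0.0] * len(CENTERS)
    for _ in range(sweeps):
        new_q = []
        for s, a in CENTERS:
            s_next, reward = step(s, a)
            best_next = max(q_value(q, s_next, b) for b in ACTIONS)
            new_q.append(reward + GAMMA * best_next)
        q = new_q
    return q


def greedy_action(q, s):
    """Action on the net that maximizes the kernel-smoothed Q at state s."""
    return max(ACTIONS, key=lambda a: q_value(q, s, a))
```

Calling `q = kernel_q_iteration()` and then `greedy_action(q, s)` gives a policy that steers the state toward 0.5 in this toy problem. The work per sweep depends on the number of centers in the net, which is the only quantity the sketch scales with.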



Model-Free Mean-Field Reinforcement Learning: Mean-Field MDP and Mean-Field Q-Learning

We develop a general reinforcement learning framework for mean field con...

Natural Actor-Critic Converges Globally for Hierarchical Linear Quadratic Regulator

Multi-agent reinforcement learning has been successfully applied to a nu...

Reinforcement Learning for Mean Field Game

Stochastic games provide a framework for interactions among multi-agents...

Mean Field Behaviour of Collaborative Multi-Agent Foragers

Collaborative multi-agent robotic systems where agents coordinate by mod...

Game-theoretical control with continuous action sets

Motivated by the recent applications of game-theoretical learning techni...

Thompson sampling for linear quadratic mean-field teams

We consider optimal control of an unknown multi-agent linear quadratic (...

Mean Field Games of Controls: Finite Difference Approximations

We consider a class of mean field games in which the agents interact thr...