Linear-Quadratic Mean-Field Reinforcement Learning: Convergence of Policy Gradient Methods

10/09/2019
by René Carmona, et al.

We investigate reinforcement learning for mean field control problems in discrete time, which can be viewed as Markov decision processes for a large number of exchangeable agents interacting in a mean field manner. Such problems arise, for instance, when a large number of robots communicate through a central unit that dispatches the optimal policy computed by minimizing the overall social cost. An approximate solution is obtained by learning the optimal policy of a generic agent interacting with the statistical distribution of the states of the other agents. We rigorously prove the convergence of exact and model-free policy gradient methods in a mean-field linear-quadratic setting. We also provide graphical evidence of this convergence based on implementations of our algorithms.
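To make the idea of an exact policy gradient in a mean-field linear-quadratic setting concrete, here is a minimal sketch on a scalar toy problem. It is not the paper's algorithm: the abstract does not state the model, so the dynamics, the cost, the coefficient values, and every function name below are illustrative assumptions. The sketch assumes noise-free dynamics x' = a x + a_bar m + b u + b_bar v, where m and v are the population means of the state and the control, a quadratic cost with weights q, q_bar, r, r_bar, and a linear policy u = -K (x - m) - L m. Under these assumptions the problem splits into two independent scalar LQ problems, one for the deviation x - m and one for the mean m, and gradient descent on each gain can be checked against the gain obtained from the scalar Riccati equation.

```python
# Toy scalar mean-field LQ problem. All numbers and names are assumptions
# made for illustration; they are not taken from the paper.
# Dynamics: x' = a*x + a_bar*m + b*u + b_bar*v   (m, v: mean state, mean control)
# Policy:   u  = -K*(x - m) - L*m


def lq_cost(k, a, b, q, r, s0=1.0):
    """Infinite-horizon cost of gain k for y' = (a - b*k)*y with E[y_0^2] = s0."""
    rho = a - b * k  # closed-loop multiplier
    if abs(rho) >= 1.0:
        raise ValueError("gain is not stabilizing")
    return (q + r * k * k) * s0 / (1.0 - rho * rho)


def lq_grad(k, a, b, q, r, s0=1.0):
    """Exact derivative of lq_cost with respect to k (uses the model a, b, q, r)."""
    rho = a - b * k
    d = 1.0 - rho * rho
    return s0 * (2.0 * r * k / d - 2.0 * b * rho * (q + r * k * k) / d ** 2)


def riccati_gain(a, b, q, r, iters=1000):
    """Optimal gain from fixed-point iteration on the scalar Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def policy_gradient(a, b, q, r, k0=0.0, step=1e-3, iters=20000):
    """Plain gradient descent on the gain, started from a stabilizing k0."""
    k = k0
    for _ in range(iters):
        k -= step * lq_grad(k, a, b, q, r)
    return k


a, a_bar, b, b_bar = 0.9, 0.05, 0.5, 0.1
q, q_bar, r, r_bar = 1.0, 0.5, 1.0, 0.5

# Deviation process x - m: coefficients (a, b, q, r).
K = policy_gradient(a, b, q, r)
K_star = riccati_gain(a, b, q, r)

# Mean process m: coefficients (a + a_bar, b + b_bar, q + q_bar, r + r_bar).
L = policy_gradient(a + a_bar, b + b_bar, q + q_bar, r + r_bar)
L_star = riccati_gain(a + a_bar, b + b_bar, q + q_bar, r + r_bar)

print("K error:", abs(K - K_star))
print("L error:", abs(L - L_star))
```

A model-free variant would replace `lq_grad` with an estimate built from sampled costs, for example a finite-difference or random-perturbation estimate; that estimator is not shown here.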

research
08/16/2020

Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time

Reinforcement learning is a powerful tool to learn the optimal policy of...
research
04/30/2021

Discrete-Time Mean Field Control with Environment States

Multi-agent reinforcement learning methods have shown remarkable potenti...
research
10/28/2019

Model-Free Mean-Field Reinforcement Learning: Mean-Field MDP and Mean-Field Q-Learning

We develop a general reinforcement learning framework for mean field con...
research
11/30/2021

Global Convergence Using Policy Gradient Methods for Model-free Markovian Jump Linear Quadratic Control

Owing to the growth of interest in Reinforcement Learning in the last fe...
research
04/04/2023

Regularization of the policy updates for stabilizing Mean Field Games

This work studies non-cooperative Multi-Agent Reinforcement Learning (MA...
research
05/18/2023

On the Statistical Efficiency of Mean Field Reinforcement Learning with General Function Approximation

In this paper, we study the statistical efficiency of Reinforcement Lear...
research
11/08/2017

Deep Mean Field Games for Learning Optimal Behavior Policy of Large Populations

We consider the problem of representing a large population's behavior po...
