Actor-Critic learning for mean-field control in continuous time

03/13/2023
by Noufel Frikha, et al.

We study policy gradient for mean-field control in continuous time in a reinforcement learning setting. By considering randomised policies with entropy regularisation, we derive a gradient expectation representation of the value function that is amenable to actor-critic algorithms. In these algorithms, the value functions and the policies are learnt alternately from observed samples of the state and a model-free estimate of the population state distribution, by either offline or online learning. In the linear-quadratic mean-field framework, we obtain an exact parametrisation of the actor and critic functions defined on the Wasserstein space. Finally, we illustrate our algorithms with numerical experiments on concrete examples.
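
To make the ingredients named in the abstract more concrete, here is a minimal one-dimensional sketch of a randomised policy with an entropy bonus that depends on a model-free estimate of the population state. It is not the paper's algorithm: the Gaussian form of the policy, the linear dependence on the state and on the empirical mean, and the names `empirical_mean`, `gaussian_entropy`, `sample_action` and `theta` are all assumptions made for illustration.

```python
import math
import random


def empirical_mean(particles):
    """Model-free estimate of the population mean from observed state samples."""
    return sum(particles) / len(particles)


def gaussian_entropy(sigma):
    """Differential entropy of a one-dimensional Gaussian with std sigma."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def sample_action(x, mean_x, theta, sigma, rng):
    """Draw an action from a randomised (Gaussian) policy.

    Assumption: the policy mean is linear in the deviation (x - mean_x)
    and in the population mean mean_x, with hypothetical parameters
    theta = (theta_dev, theta_mean).
    """
    theta_dev, theta_mean = theta
    mu = theta_dev * (x - mean_x) + theta_mean * mean_x
    return rng.gauss(mu, sigma)


rng = random.Random(0)
# Observed state samples standing in for the population (synthetic data).
particles = [rng.gauss(1.0, 0.5) for _ in range(1000)]
m = empirical_mean(particles)
a = sample_action(particles[0], m, (-1.0, -0.5), 0.3, rng)
# Entropy term that a regularised objective would add, up to a temperature weight.
bonus = gaussian_entropy(0.3)
```

In an actor-critic loop, the policy parameters (here `theta` and `sigma`) would be updated alternately with a critic that approximates the value function; that update step is deliberately left out because the abstract does not specify it.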


research
09/08/2023

Actor critic learning algorithms for mean-field control with moment neural networks

We develop a new policy gradient and actor-critic algorithm for solving ...
research
11/22/2021

Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms

We study policy gradient (PG) for reinforcement learning in continuous t...
research
06/28/2023

Continuous-Time q-learning for McKean-Vlasov Control Problems

This paper studies the q-learning, recently coined as the continuous-tim...
research
10/28/2019

Model-Free Mean-Field Reinforcement Learning: Mean-Field MDP and Mean-Field Q-Learning

We develop a general reinforcement learning framework for mean field con...
research
02/22/2021

Actor-Critic Method for High Dimensional Static Hamilton–Jacobi–Bellman Partial Differential Equations based on Neural Networks

We propose a novel numerical method for high dimensional Hamilton–Jacobi...
research
06/09/2020

AR-DAE: Towards Unbiased Neural Entropy Gradient Estimation

Entropy is ubiquitous in machine learning, but it is in general intracta...
research
12/27/2021

Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic

Actor-critic (AC) algorithms, empowered by neural networks, have had sig...
