Continuous-Time q-learning for McKean-Vlasov Control Problems

06/28/2023
by Xiaoli Wei, et al.

This paper studies q-learning, recently coined as the continuous-time counterpart of Q-learning by Jia and Zhou (2022c), for continuous-time McKean-Vlasov control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single-agent control problem in Jia and Zhou (2022c), the mean-field interaction of agents renders the definition of the q-function more subtle, and we reveal that two distinct q-functions naturally arise: (i) the integrated q-function (denoted by q), the first-order approximation of the integrated Q-function introduced in Gu, Guo, Wei and Xu (2023), which can be learnt by a weak martingale condition involving test policies; and (ii) the essential q-function (denoted by q_e), which is employed in the policy improvement iterations. We show that the two q-functions are related via an integral representation under all test policies. Based on the weak martingale condition of the integrated q-function and our proposed method for searching over test policies, we devise model-free offline and online learning algorithms. In two financial applications, one within the LQ control framework and one beyond it, we obtain exact parameterizations of the value function and the two q-functions and illustrate our algorithms with simulation experiments.
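
As a rough illustration of how a q-function can drive policy improvement under entropy regularization, the sketch below turns q-values into a Gibbs (softmax) policy, the standard form of the improved policy in entropy-regularized reinforcement learning. The abstract states only that q_e is used in the policy improvement iterations; the exact update rule, the finite action grid, the quadratic `q_e_values`, and the names `gibbs_policy` and `temperature` are all assumptions made for illustration and are not taken from the paper.

```python
import math


def gibbs_policy(q_values, temperature):
    """Map q-values on a finite action grid to Gibbs (softmax) probabilities.

    Assumption: the improved policy is proportional to exp(q / temperature),
    the usual form in entropy-regularized RL; the paper may differ in detail.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [q / temperature for q in q_values]
    shift = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: five candidate actions and a made-up essential
# q-function evaluated at one fixed (time, state, distribution) triple.
actions = [-1.0, -0.5, 0.0, 0.5, 1.0]
q_e_values = [-(a - 0.5) ** 2 for a in actions]

policy = gibbs_policy(q_e_values, temperature=0.1)
```

A smaller `temperature` concentrates the policy on the action with the highest q-value, while a larger one spreads probability more evenly across actions, which reflects the exploration effect of the entropy term.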


research
03/13/2023

Actor-Critic learning for mean-field control in continuous time

We study policy gradient for mean-field control in continuous time in a ...
research
07/02/2022

q-Learning in Continuous Time

We study the continuous-time counterpart of Q-learning for reinforcement...
research
09/08/2023

Actor critic learning algorithms for mean-field control with moment neural networks

We develop a new policy gradient and actor-critic algorithm for solving ...
research
10/11/2014

Q-learning for Optimal Control of Continuous-time Systems

In this paper, two Q-learning (QL) methods are proposed and their conver...
research
06/20/2021

Optimal Strategies for Decision Theoretic Online Learning

We extend the drifting games analysis to continuous time and show that t...
research
08/15/2021

Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach

We propose a unified framework to study policy evaluation (PE) and the a...
research
10/30/2015

Learning Continuous Control Policies by Stochastic Value Gradients

We present a unified framework for learning continuous control policies ...
