
Linear-Quadratic Zero-Sum Mean-Field Type Games: Optimality Conditions and Policy Optimization
In this paper, zero-sum mean-field type games (ZSMFTG) with linear dynamics and quadratic cost are studied under an infinite-horizon discounted utility function. ZSMFTG are a class of games in which two decision makers, whose utilities sum to zero, compete to influence a large population of indistinguishable agents. In particular, the case in which the transition and utility functions depend on the state, the actions of the controllers, and the means of the state and the actions is investigated. The optimality conditions of the game are analysed for both open-loop and closed-loop controls, and explicit expressions for the Nash equilibrium strategies are derived. Moreover, two policy optimization methods that rely on policy gradient are proposed, one for the model-based and one for the sample-based framework. In the model-based case, the gradients are computed exactly using the model, whereas in the sample-based case they are estimated via Monte Carlo simulations. Numerical experiments demonstrate the convergence of the utility function as well as of the two players' controls.
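The sample-based procedure described above can be sketched on a toy scalar analogue. The following is only an illustration, not the paper's algorithm: every dynamics and cost parameter is invented, the mean-field coupling terms are dropped, and the policy gradients are approximated by finite differences of Monte Carlo cost estimates, with descent for the minimising player and ascent for the maximising player.

```python
import random

# Toy scalar zero-sum linear-quadratic game (all parameters hypothetical):
#   dynamics:   x' = a*x + b1*u1 + b2*u2 + noise
#   stage cost: q*x^2 + r1*u1^2 - r2*u2^2   (player 1 minimises, player 2 maximises)
a, b1, b2 = 0.8, 0.5, 0.3
q, r1, r2 = 1.0, 1.0, 5.0
gamma, horizon = 0.9, 50          # discount factor and rollout length

def discounted_cost(k1, k2, episodes=100):
    """Monte Carlo estimate of the discounted cost under linear feedback
    u1 = -k1*x, u2 = -k2*x."""
    rng = random.Random(0)        # common random numbers smooth the estimate
    total = 0.0
    for _ in range(episodes):
        x = rng.gauss(0.0, 1.0)
        for t in range(horizon):
            u1, u2 = -k1 * x, -k2 * x
            total += gamma**t * (q * x * x + r1 * u1 * u1 - r2 * u2 * u2)
            x = a * x + b1 * u1 + b2 * u2 + 0.1 * rng.gauss(0.0, 1.0)
    return total / episodes

def grad(cost_fn, k, delta=0.05):
    """Central-difference gradient estimate built from sampled costs only."""
    return (cost_fn(k + delta) - cost_fn(k - delta)) / (2 * delta)

# Gradient descent for the minimiser, gradient ascent for the maximiser.
k1, k2, lr = 0.0, 0.0, 0.01
for _ in range(20):
    k1 -= lr * grad(lambda k: discounted_cost(k, k2), k1)
    k2 += lr * grad(lambda k: discounted_cost(k1, k), k2)
```

In the model-based variant, `grad` would be replaced by an exact gradient computed from the known system matrices; the loop structure is otherwise unchanged.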