Reinforcement Learning in Nonzero-sum Linear Quadratic Deep Structured Games: Global Convergence of Policy Optimization

11/29/2020
by   Masoud Roudneshin, et al.

We study model-based and model-free policy optimization in a class of nonzero-sum stochastic dynamic games called linear quadratic (LQ) deep structured games. In such games, players interact with each other through a set of weighted averages (linear regressions) of the states and actions. In this paper, we focus on homogeneous weights; however, in the special case of an infinite population, the results extend to asymptotically vanishing weights, wherein the players learn the sequential weighted mean-field equilibrium. Despite the non-convexity of the optimization in policy space and the fact that policy optimization does not generally converge in game settings, we prove that the proposed model-based and model-free policy gradient descent and natural policy gradient descent algorithms converge globally to the subgame-perfect Nash equilibrium. To the best of our knowledge, this is the first result that provides a global convergence proof of policy optimization in a nonzero-sum LQ game. A salient feature of the proposed algorithms is that their parameter space is independent of the number of players; moreover, when the dimension of the state space is significantly larger than that of the action space, they are computationally more efficient than algorithms that plan and learn in the action space. Finally, simulations are provided to numerically verify the theoretical results.
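
As a rough illustration of the coupling described in the abstract, the sketch below computes a weighted average of the players' states. It is only a minimal example under stated assumptions, not the paper's algorithm. The function names `weighted_average` and `homogeneous_weights`, the choice of 1/n as the homogeneous weight, and the four-player data are all hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def weighted_average(vectors: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Return sum_i weights[i] * vectors[i], computed component-wise."""
    if len(vectors) != len(weights):
        raise ValueError("need exactly one weight per player")
    dim = len(vectors[0])
    return [sum(w * v[k] for v, w in zip(vectors, weights)) for k in range(dim)]


def homogeneous_weights(n: int) -> List[float]:
    """Assumed form of homogeneous weights: every player gets weight 1/n."""
    return [1.0 / n] * n


# Hypothetical data: 4 players, each with a 2-dimensional state.
states = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]

# With equal weights the aggregate is the empirical mean of the states.
mean_state = weighted_average(states, homogeneous_weights(len(states)))
```

The aggregate has the dimension of a single player's state no matter how many players contribute to it. This matches the abstract's remark that the parameter space of the algorithms does not depend on the number of players.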
