Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games

07/27/2021
by Ben Hambly, et al.

We consider a general-sum N-player linear-quadratic game with stochastic dynamics over a finite horizon and prove the global convergence of the natural policy gradient method to the Nash equilibrium. Proving convergence of the method requires a certain amount of noise in the system. We give a sufficient condition, essentially a lower bound on the noise covariance in terms of the model parameters, that guarantees convergence. We illustrate our results with numerical experiments showing that, even in situations where the policy gradient method may fail to converge in the deterministic setting, the addition of noise leads to convergence.
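
To make the setting concrete, the sketch below shows one way to run simultaneous natural policy gradient updates in a finite-horizon N-player linear-quadratic game with additive noise. It is a minimal illustration, not the authors' implementation: all model parameters (A, B_i, Q_i, R_i, the noise covariance W, the step size) are hypothetical choices, and the gradients are estimated by finite differences rather than the exact expressions used in the paper's analysis.

```python
# Minimal sketch: simultaneous natural policy gradient in an N-player
# finite-horizon linear-quadratic game with additive Gaussian noise.
# All parameters below are illustrative assumptions, not the paper's.
import numpy as np

n, T, N = 2, 5, 2                      # state dim, horizon, number of players
A = np.array([[1.0, 0.2], [0.0, 1.0]])
B = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]   # B_i per player
Q = [np.eye(n), 2.0 * np.eye(n)]       # state cost per player
R = [np.eye(1), np.eye(1)]             # control cost per player
W = 0.5 * np.eye(n)                    # noise covariance (plays the "enough noise" role)
Sigma0 = np.eye(n)                     # initial-state covariance

def state_covariances(K):
    """Propagate Sigma_t = Cov(x_t) under linear feedback u_t^i = -K[i][t] x_t."""
    Sig = [Sigma0]
    for t in range(T):
        Acl = A - sum(B[i] @ K[i][t] for i in range(N))
        Sig.append(Acl @ Sig[-1] @ Acl.T + W)
    return Sig

def cost(K, i):
    """Expected cost of player i given all players' feedback gains."""
    Sig = state_covariances(K)
    return sum(np.trace((Q[i] + K[i][t].T @ R[i] @ K[i][t]) @ Sig[t]) for t in range(T))

def natural_pg_step(K, i, lr=0.05, eps=1e-5):
    """Finite-difference gradient of J_i in K_i, preconditioned by Sigma_t^{-1}."""
    Sig = state_covariances(K)
    newK = [k.copy() for k in K[i]]
    for t in range(T):
        G = np.zeros_like(K[i][t])
        for idx in np.ndindex(*K[i][t].shape):
            Kp = [[k.copy() for k in K[j]] for j in range(N)]
            Kp[i][t][idx] += eps
            G[idx] = (cost(Kp, i) - cost(K, i)) / eps
        newK[t] = K[i][t] - lr * G @ np.linalg.inv(Sig[t])   # natural gradient step
    return newK

# All players update simultaneously from the zero policy.
K = [[np.zeros((1, n)) for _ in range(T)] for _ in range(N)]
for it in range(200):
    K = [natural_pg_step(K, i) for i in range(N)]
print("player costs at the computed stationary point:", [round(cost(K, i), 3) for i in range(N)])
```

The preconditioning by the inverse state covariance is what distinguishes the natural from the plain policy gradient here; with W = 0 the covariances can degenerate and the updates may stall, which is the behaviour the paper's noise condition is designed to rule out.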
