Provably Efficient Policy Gradient Methods for Two-Player Zero-Sum Markov Games

02/17/2021
by Yulai Zhao, et al.

Policy gradient methods are widely used to solve two-player zero-sum games and have achieved superhuman performance in practice. However, it remains unclear when they provably find a near-optimal solution and how many samples and iterations they need. The current paper studies natural extensions of the Natural Policy Gradient algorithm for solving two-player zero-sum games where function approximation is used for generalization across states. We thoroughly characterize the algorithms' performance in terms of the number of samples, the number of iterations, concentrability coefficients, and approximation error. To our knowledge, this is the first quantitative analysis of policy gradient methods with function approximation for two-player zero-sum Markov games.
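
For intuition only, here is a minimal sketch of the simplest relative of the setting described above. In the tabular softmax case, a natural policy gradient step is known to reduce to a multiplicative-weights update, pi'(a) proportional to pi(a) * exp(eta * Q(a)). The sketch applies that update to both players of a single-state zero-sum matrix game and averages the iterates. It is not the paper's algorithm, which uses function approximation over Markov games; the payoff matrix, step size, iteration count, and all function names are hypothetical choices made for illustration.

```python
import math

def softmax_npg_step(policy, q_values, eta):
    """Tabular softmax NPG step: pi'(a) is proportional to pi(a) * exp(eta * Q(a))."""
    weights = [p * math.exp(eta * q) for p, q in zip(policy, q_values)]
    total = sum(weights)
    return [w / total for w in weights]

def run_matrix_game(payoff, iters=5000, eta=0.05):
    """Both players run the update simultaneously; returns the averaged policies.

    payoff[i][j] is the reward to the row (max) player; the column (min)
    player receives its negation. All values here are illustrative.
    """
    n, m = len(payoff), len(payoff[0])
    x = [1.0 / n] * n  # row player's mixed strategy
    y = [1.0 / m] * m  # column player's mixed strategy
    x_avg, y_avg = [0.0] * n, [0.0] * m
    for _ in range(iters):
        # Accumulate the running average of the iterates.
        x_avg = [a + p / iters for a, p in zip(x_avg, x)]
        y_avg = [a + p / iters for a, p in zip(y_avg, y)]
        # Each player's action values against the opponent's current policy.
        q_row = [sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n)]
        q_col = [-sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m)]
        x = softmax_npg_step(x, q_row, eta)
        y = softmax_npg_step(y, q_col, eta)
    return x_avg, y_avg

def duality_gap(payoff, x, y):
    """Best-response value against y minus best-response value against x."""
    n, m = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n))
    best_col = min(sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m))
    return best_row - best_col

if __name__ == "__main__":
    game = [[1.0, -1.0], [-1.0, 0.5]]  # hypothetical payoff matrix
    x_avg, y_avg = run_matrix_game(game)
    print("averaged row policy:", x_avg)
    print("averaged column policy:", y_avg)
    print("duality gap:", duality_gap(game, x_avg, y_avg))
```

The duality gap of the averaged policies is a standard measure of distance from a Nash equilibrium in a zero-sum game, so a small value here indicates a near-optimal solution in the sense the abstract refers to. The quantities the paper actually analyzes (sample counts, concentrability coefficients, approximation error) do not appear in this single-state, exact-value toy.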
