Provably Efficient Policy Gradient Methods for Two-Player Zero-Sum Markov Games
Policy gradient methods are widely used for solving two-player zero-sum games and achieve superhuman performance in practice. However, it remains unclear when they can provably find a near-optimal solution, and how many samples and iterations are needed. The current paper studies natural extensions of the Natural Policy Gradient algorithm for solving two-player zero-sum games where function approximation is used for generalization across states. We thoroughly characterize the algorithms' performance in terms of the number of samples, number of iterations, concentrability coefficients, and approximation error. To our knowledge, this is the first quantitative analysis of policy gradient methods with function approximation for two-player zero-sum Markov games.
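To make the setting concrete, here is a minimal sketch of the idea behind independent policy-gradient learning in a zero-sum game. It is not the paper's algorithm (which handles Markov games with function approximation); it is the tabular matrix-game special case, where a Natural Policy Gradient step on a softmax policy reduces to a multiplicative-weights (Hedge) update. The payoff matrix, step size, and horizon below are illustrative choices; the averaged strategies converge toward a Nash equilibrium, with the duality gap serving as the error measure.

```python
import numpy as np

# Matching-pennies payoff for the row (max) player; the column player minimizes.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

eta, T = 0.1, 2000
x = np.array([0.9, 0.1])    # row player's mixed strategy (off-equilibrium start)
y = np.array([0.2, 0.8])    # column player's mixed strategy
x_avg = np.zeros(2)
y_avg = np.zeros(2)

for _ in range(T):
    x_avg += x / T
    y_avg += y / T
    # Simultaneous exponential-weights updates: each player ascends/descends
    # its own payoff gradient. For softmax policies this is exactly the
    # Natural Policy Gradient step in the matrix-game case.
    gx, gy = A @ y, A.T @ x
    x = x * np.exp(eta * gx)
    x /= x.sum()
    y = y * np.exp(-eta * gy)
    y /= y.sum()

# Duality gap (exploitability) of the averaged strategies; it is 0 at a Nash
# equilibrium and is bounded by the players' average regret.
gap = (A @ y_avg).max() - (A.T @ x_avg).min()
print(x_avg, y_avg, gap)
```

Note that the *last* iterates of such independent updates can cycle around the equilibrium; it is the *averaged* strategies whose exploitability shrinks, which is why convergence guarantees of this kind are typically stated for the average (or an optimistic variant of the update).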