Independent Policy Gradient Methods for Competitive Reinforcement Learning

01/11/2021
by Constantinos Daskalakis, et al.

We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent policy gradient methods in competitive RL; prior work has largely focused on centralized, coordinated procedures for equilibrium computation.
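The abstract describes each player independently running policy gradient with a two-timescale step-size rule. As a rough illustration only (not the authors' code), the sketch below has two players run REINFORCE-style policy gradient on a simple zero-sum matrix game, each observing only its own action and reward; the game matrix, step sizes, and horizon are placeholder choices.

```python
import numpy as np

# Minimal sketch, assuming a one-state zero-sum game (matching pennies):
# both players run independent REINFORCE-style policy gradient, with a
# two-timescale step-size rule (player 1 updates more slowly).
rng = np.random.default_rng(0)
A = np.array([[1.0, -1.0],    # payoff to player 1; player 2 receives -A
              [-1.0, 1.0]])

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

theta1 = np.zeros(2)          # player 1 (maximizer) policy parameters
theta2 = np.zeros(2)          # player 2 (minimizer) policy parameters
eta1, eta2 = 0.005, 0.05      # illustrative two-timescale learning rates

for t in range(200_000):
    p1, p2 = softmax(theta1), softmax(theta2)
    a1 = rng.choice(2, p=p1)
    a2 = rng.choice(2, p=p2)
    r1 = A[a1, a2]            # each player sees only its own reward
    r2 = -r1

    # REINFORCE estimate: reward * grad log pi(a) = reward * (e_a - p)
    g1 = -p1.copy(); g1[a1] += 1.0
    g2 = -p2.copy(); g2[a2] += 1.0
    theta1 += eta1 * r1 * g1  # independent gradient ascent on own reward
    theta2 += eta2 * r2 * g2

# With suitable step sizes, both policies should drift toward the
# min-max equilibrium (0.5, 0.5) of this game.
print("player 1 policy:", softmax(theta1))
print("player 2 policy:", softmax(theta2))
```

This is only meant to convey the independent-learning protocol and the asymmetric learning rates; the paper's guarantees concern episodic zero-sum stochastic games rather than this single-state example.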

