Independent Policy Gradient Methods for Competitive Reinforcement Learning

01/11/2021
by Constantinos Daskalakis, et al.

We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent policy gradient methods in competitive RL; prior work has largely focused on centralized, coordinated procedures for equilibrium computation.
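To make the two-timescale idea concrete, here is a minimal sketch under strong simplifying assumptions that are not taken from the paper: the game is a single-state zero-sum matrix game (matching pennies), each player sees its exact gradient rather than a sampled estimate, and the min player is arbitrarily chosen as the slow learner. The function names (`project_simplex`, `independent_pg`, `worst_case_payoff`) and the step sizes are hypothetical and chosen only for illustration; they are not the rates or the algorithm analyzed by the authors.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]                      # entries in descending order
    css = np.cumsum(u) - 1.0                  # shifted cumulative sums
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index kept positive
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def independent_pg(A, x0, y0, eta_x, eta_y, steps):
    """Simultaneous projected gradient descent (x) and ascent (y) on x^T A y.

    Each player updates using only its own gradient. Assumption of this
    sketch: eta_x << eta_y, i.e. the min player learns on the slow timescale.
    """
    x, y = x0.copy(), y0.copy()
    for _ in range(steps):
        grad_x = A @ y        # gradient of x^T A y with respect to x
        grad_y = A.T @ x      # gradient of x^T A y with respect to y
        x = project_simplex(x - eta_x * grad_x)   # min player descends
        y = project_simplex(y + eta_y * grad_y)   # max player ascends
    return x, y


def worst_case_payoff(A, x):
    """Payoff the max player obtains by best-responding to a fixed x."""
    return float(np.max(A.T @ x))


# Matching pennies: the game value is 0 and the equilibrium is uniform play.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, y = independent_pg(
    A,
    x0=np.array([0.9, 0.1]),
    y0=np.array([0.5, 0.5]),
    eta_x=0.001,   # slow player (illustrative value)
    eta_y=0.5,     # fast player (illustrative value)
    steps=5000,
)
print("min player's policy:", x)
print("worst-case payoff against it:", worst_case_payoff(A, x))
```

In this toy setting the fast player stays close to a best response, so the slow player is effectively descending on its own worst-case payoff and ends up near the uniform equilibrium strategy. Setting the two step sizes equal makes the slow player's oscillation around the equilibrium noticeably larger, which gives some intuition for why a separation of timescales matters.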

research
02/08/2022

Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence

We examine global non-asymptotic convergence properties of policy gradie...
research
10/17/2022

On the convergence of policy gradient methods to Nash equilibria in general stochastic games

Learning in stochastic games is a notoriously difficult problem because,...
research
07/27/2021

Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games

We consider a general-sum N-player linear-quadratic game with stochastic...
research
01/15/2022

Block Policy Mirror Descent

In this paper, we present a new class of policy gradient (PG) methods, n...
research
05/26/2023

A Policy Gradient Method for Confounded POMDPs

In this paper, we propose a policy gradient method for confounded partia...
research
11/16/2017

Hindsight policy gradients

Goal-conditional policies allow reinforcement learning agents to pursue ...
research
03/07/2020

Convergence of Q-value in case of Gaussian rewards

In this paper, as a study of reinforcement learning, we converge the Q f...
