Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms

09/25/2021
by   Liyuan Zheng, et al.
0

The hierarchical interaction between the actor and critic in actor-critic based reinforcement learning algorithms naturally lends itself to a game-theoretic interpretation. We adopt this viewpoint and model the actor and critic interaction as a two-player general-sum game with a leader-follower structure known as a Stackelberg game. Given this abstraction, we propose a meta-framework for Stackelberg actor-critic algorithms where the leader player follows the total derivative of its objective instead of the usual individual gradient. From a theoretical standpoint, we develop a policy gradient theorem for the refined update and provide a local convergence guarantee for the Stackelberg actor-critic algorithms to a local Stackelberg equilibrium. From an empirical standpoint, we demonstrate via simple examples that the learning dynamics we study mitigate cycling and accelerate convergence compared to the usual gradient dynamics given cost structures induced by actor-critic formulations. Finally, extensive experiments on OpenAI gym environments show that Stackelberg actor-critic algorithms always perform at least as well and often significantly outperform the standard actor-critic algorithm counterparts.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/13/2021

Characterizing the Gap Between Actor-Critic and Policy Gradient

Actor-critic (AC) methods are ubiquitous in reinforcement learning. Alth...
research
10/07/2022

Learning Stackelberg Equilibria and Applications to Economic Design Games

We study the use of reinforcement learning to learn the optimal leader's...
research
06/10/2023

A Single-Loop Deep Actor-Critic Algorithm for Constrained Reinforcement Learning with Provable Convergence

Abstract – Deep Actor-Critic algorithms, which combine Actor-Critic with...
research
07/19/2022

Actor-Critic based Improper Reinforcement Learning

We consider an improper reinforcement learning setting where a learner i...
research
05/07/2020

Curious Hierarchical Actor-Critic Reinforcement Learning

Hierarchical abstraction and curiosity-driven exploration are two common...
research
01/08/2014

Actor-Critic Algorithms for Learning Nash Equilibria in N-player General-Sum Games

We consider the problem of finding stationary Nash equilibria (NE) in a ...
research
05/18/2022

A2C is a special case of PPO

Advantage Actor-critic (A2C) and Proximal Policy Optimization (PPO) are ...

Please sign up or login with your details

Forgot password? Click here to reset