Actor-Critic Policy Optimization in Partially Observable Multiagent Environments

10/21/2018
by Sriram Srinivasan, et al.

Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function representing discounted return. In this paper, we examine the role of these policy gradient and actor-critic algorithms in partially observable multiagent environments. We show several candidate policy update rules and relate them to a foundation of regret minimization and multiagent learning techniques for the one-shot and tabular cases, leading to previously unknown convergence guarantees. We apply our method to model-free multiagent reinforcement learning in adversarial sequential decision problems (zero-sum imperfect-information games), using RL-style function approximation. We evaluate on commonly used benchmark poker domains, showing performance against fixed policies and empirical convergence to approximate Nash equilibria in self-play, with rates similar to or better than a baseline model-free algorithm for zero-sum games, without any domain-specific state space reductions.
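
To make the one-shot, tabular setting mentioned in the abstract concrete, here is a minimal sketch of regret minimization in self-play. It is not the paper's actor-critic update: it uses plain regret matching, a standard tabular regret minimizer, on rock-paper-scissors as a stand-in zero-sum game. All names (`PAYOFF`, `regret_matching_policy`, `self_play`, `nash_conv`) and the asymmetric starting regrets are assumptions made for illustration.

```python
"""Regret matching in self-play on rock-paper-scissors (illustrative only)."""

# Row player's payoffs, indexed [own action][opponent action].
# Actions: 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
]
# Column player's payoffs, also indexed [own action][opponent action].
# Zero-sum: the column player receives the negation of the row payoff.
COL_PAYOFF = [[-PAYOFF[a][b] for a in range(3)] for b in range(3)]


def regret_matching_policy(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def action_values(payoff, opponent_policy):
    """Expected payoff of each own action against a mixed opponent policy."""
    return [
        sum(row[b] * opponent_policy[b] for b in range(len(opponent_policy)))
        for row in payoff
    ]


def self_play(iterations=10000):
    """Run both players' regret matching and return their average policies."""
    # Assumption: asymmetric starting regrets. Starting from all zeros, both
    # players would sit at the uniform policy forever and nothing would move.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    policy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        policies = [regret_matching_policy(r) for r in regrets]
        values = [
            action_values(PAYOFF, policies[1]),
            action_values(COL_PAYOFF, policies[0]),
        ]
        for p in range(2):
            expected = sum(policies[p][a] * values[p][a] for a in range(3))
            for a in range(3):
                # Regret of not having played action a at this iteration.
                regrets[p][a] += values[p][a] - expected
                policy_sums[p][a] += policies[p][a]
    return [[s / iterations for s in sums] for sums in policy_sums]


def nash_conv(policies):
    """Sum of both players' best-response gains (the game value is 0)."""
    return (max(action_values(PAYOFF, policies[1]))
            + max(action_values(COL_PAYOFF, policies[0])))


if __name__ == "__main__":
    average = self_play()
    print("average policies:", average)
    print("NashConv:", nash_conv(average))
```

The quantity to watch is `nash_conv` of the average policies, not of the current ones: in a two-player zero-sum game it equals the sum of the two players' average regrets, so it shrinks as the regrets grow sublinearly. The paper's setting differs in that it is sequential, sample-based, and uses function approximation.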
