Actor-Critic Policy Optimization in Partially Observable Multiagent Environments

10/21/2018
by Sriram Srinivasan, et al.

Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function representing discounted return. In this paper, we examine the role of these policy gradient and actor-critic algorithms in partially observable multiagent environments. We present several candidate policy update rules and relate them to a foundation of regret minimization and multiagent learning techniques for the one-shot and tabular cases, leading to previously unknown convergence guarantees. We apply our method to model-free multiagent reinforcement learning in adversarial sequential decision problems (zero-sum imperfect-information games), using RL-style function approximation. We evaluate on commonly used benchmark poker domains, showing performance against fixed policies and empirical convergence to approximate Nash equilibria in self-play, with rates similar to or better than a baseline model-free algorithm for zero-sum games, without any domain-specific state space reductions.
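As a rough illustration of the regret-minimization foundation mentioned for the one-shot case, the sketch below runs regret matching in self-play on a small zero-sum matrix game and reports how far the average policies are from a Nash equilibrium. This is a standard regret-minimization procedure, not the paper's actor-critic update rules; the biased rock-paper-scissors payoffs and all names (`PAYOFF`, `policy_from_regrets`, `action_values`, `nash_conv`, `self_play`) are assumptions made up for this example.

```python
# Row player's payoffs in a biased rock-paper-scissors game (hypothetical
# example): rock beating scissors pays 2. The column player gets the negation.
PAYOFF = [[0, -1, 2],
          [1, 0, -1],
          [-2, 1, 0]]
N = 3  # number of actions per player


def policy_from_regrets(regrets):
    """Regret matching: play in proportion to positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / N] * N  # no positive regret yet: fall back to uniform
    return [p / total for p in positive]


def action_values(player, opponent_policy):
    """Expected payoff of each action of `player` against the opponent's policy."""
    if player == 0:
        return [sum(PAYOFF[a][b] * opponent_policy[b] for b in range(N))
                for a in range(N)]
    return [sum(-PAYOFF[a][b] * opponent_policy[a] for a in range(N))
            for b in range(N)]


def nash_conv(policy0, policy1):
    """Sum of both players' best-response gains (zero at a Nash equilibrium)."""
    return max(action_values(0, policy1)) + max(action_values(1, policy0))


def self_play(iterations):
    """Run simultaneous regret matching and return both average policies."""
    regrets = [[0.0] * N, [0.0] * N]
    policy_sums = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        # Both policies are fixed before either player's regrets are updated.
        policies = [policy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            values = action_values(player, policies[1 - player])
            expected = sum(p * v for p, v in zip(policies[player], values))
            for a in range(N):
                regrets[player][a] += values[a] - expected
                policy_sums[player][a] += policies[player][a]
    return [[s / iterations for s in sums] for sums in policy_sums]


average_policies = self_play(10000)
print("average policies:", average_policies)
print("NashConv:", nash_conv(*average_policies))
```

The quantity printed as NashConv shrinks as the number of iterations grows, which is the kind of empirical convergence to an approximate Nash equilibrium that the abstract refers to, here in a tabular one-shot setting with exact expected values rather than sampled, function-approximated ones.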

research
08/27/2020

The Advantage Regret-Matching Actor-Critic

Regret minimization has played a key role in online learning, equilibriu...
research
12/11/2020

OPAC: Opportunistic Actor-Critic

Actor-critic methods, a type of model-free reinforcement learning (RL), ...
research
05/25/2021

Unbiased Asymmetric Actor-Critic for Partially Observable Reinforcement Learning

In partially observable reinforcement learning, offline training gives a...
research
06/05/2022

ARC – Actor Residual Critic for Adversarial Imitation Learning

Adversarial Imitation Learning (AIL) is a class of popular state-of-the-...
research
01/08/2014

Actor-Critic Algorithms for Learning Nash Equilibria in N-player General-Sum Games

We consider the problem of finding stationary Nash equilibria (NE) in a ...
research
03/02/2020

Gaussian Process Policy Optimization

We propose a novel actor-critic, model-free reinforcement learning algor...
research
07/23/2018

Learning to Play Pong using Policy Gradient Learning

Activities in reinforcement learning (RL) revolve around learning the Ma...
