Latent Interactive A2C for Improved RL in Open Many-Agent Systems

05/09/2023
by Keyang He, et al.

Many multiagent reinforcement learning (MARL) methods rely on centralized training. However, these methods require obtaining various types of information from the other agents, which may not be feasible in competitive or adversarial settings. A recent method, the interactive advantage actor critic (IA2C), combines decentralized training with decentralized execution and aims to predict the other agents' actions from possibly noisy observations. In this paper, we present the latent IA2C, which uses an encoder-decoder architecture to learn a latent representation of the hidden state and other agents' actions. Our experiments in two domains, each populated by many agents, reveal that the latent IA2C significantly improves sample efficiency by reducing variance and converging faster. Additionally, we introduce open versions of these domains, in which the agent population may change over time, and evaluate on these instances as well.
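To make the idea of an encoder-decoder feeding an advantage actor-critic learner more concrete, here is a minimal sketch in plain Python. It is not the authors' architecture: the class `LatentModel`, the linear layers, the dimensions, and the helper names are all hypothetical choices made for illustration. The encoder maps a possibly noisy observation to a latent vector, the decoder turns that latent vector into a predicted distribution over another agent's actions, and `one_step_advantage` computes the standard one-step advantage estimate r + γV(s') − V(s) used by advantage actor-critic methods in general.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def init_matrix(rows, cols, rng):
    """Small random weights; an arbitrary initialization for the sketch."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LatentModel:
    """Hypothetical linear encoder-decoder (an assumption, not the paper's model)."""

    def __init__(self, obs_dim, latent_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.W_enc = init_matrix(latent_dim, obs_dim, rng)
        self.W_dec = init_matrix(n_actions, latent_dim, rng)

    def encode(self, noisy_obs):
        # Compress the noisy observation into a latent vector.
        return [math.tanh(h) for h in matvec(self.W_enc, noisy_obs)]

    def decode(self, z):
        # Predict a distribution over the other agent's actions.
        return softmax(matvec(self.W_dec, z))


def one_step_advantage(reward, value_s, value_next, gamma=0.99):
    """Standard one-step advantage estimate: r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_s


model = LatentModel(obs_dim=4, latent_dim=2, n_actions=3)
z = model.encode([1.0, 0.0, 0.0, 1.0])
action_probs = model.decode(z)
adv = one_step_advantage(reward=1.0, value_s=0.5, value_next=0.6, gamma=0.9)
```

In a full learner, the latent vector `z` would be passed to the actor and critic alongside the agent's own observation, and the weights would be trained rather than left at their random initial values; both steps are omitted here.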


