Consolidation via Policy Information Regularization in Deep RL for Multi-Agent Games

11/23/2020
by   Tyler Malloy, et al.
0

This paper introduces an information-theoretic constraint on learned policy complexity in the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) reinforcement learning algorithm. Previous research with a related approach in continuous control experiments suggests that this method favors learning policies that are more robust to changing environment dynamics. The multi-agent game setting naturally requires this type of robustness, as other agents' policies change throughout learning, introducing a nonstationary environment. For this reason, recent methods in continual learning are compared to our approach, termed Capacity-Limited MADDPG. Results from experimentation in multi-agent cooperative and competitive tasks demonstrate that the capacity-limited approach is a good candidate for improving learning performance in these environments.

READ FULL TEXT

page 7

page 8

page 9

12/17/2020

MAGNet: Multi-agent Graph Network for Deep Multi-agent Reinforcement Learning

Over recent years, deep reinforcement learning has shown strong successe...
05/01/2020

Smart Containers With Bidding Capacity: A Policy Gradient Algorithm for Semi-Cooperative Learning

Smart modular freight containers – as propagated in the Physical Interne...
03/29/2021

Deep reinforcement learning of event-triggered communication and control for multi-agent cooperative transport

In this paper, we explore a multi-agent reinforcement learning approach ...
01/20/2021

UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers

Recent advances in multi-agent reinforcement learning have been largely ...
09/17/2021

APIA: An Architecture for Policy-Aware Intentional Agents

This paper introduces the APIA architecture for policy-aware intentional...
03/22/2022

Is Vanilla Policy Gradient Overlooked? Analyzing Deep Reinforcement Learning for Hanabi

In pursuit of enhanced multi-agent collaboration, we analyze several on-...
11/02/2020

Multi-Agent Reinforcement Learning for Persistent Monitoring

The Persistent Monitoring (PM) problem seeks to find a set of trajectori...