Negative Update Intervals in Deep Multi-Agent Reinforcement Learning

09/13/2018
by Gregory Palmer, et al.

In Multi-Agent Reinforcement Learning, independent cooperative learners must overcome a number of pathologies in order to learn optimal joint policies. These pathologies include action shadowing, stochasticity, and the moving target and alter-exploration problems (Matignon, Laurent, and Le Fort-Piat 2012; Wei and Luke 2016). Numerous methods have been proposed to address these pathologies, but evaluations are predominantly conducted in repeated strategic-form games and stochastic games consisting of only a small number of state transitions. This raises the question of whether these methods scale to complex, temporally extended, partially observable domains with stochastic transitions and rewards. In this paper we study such complex settings, which require reasoning over long time horizons and confront agents with the curse of dimensionality. To deal with the dimensionality, we adopt a Multi-Agent Deep Reinforcement Learning (MA-DRL) approach. We find that when the agents have to make critical decisions in seclusion, existing methods succumb to a combination of relative overgeneralisation (a type of action shadowing), the alter-exploration problem, and stochasticity. To address these pathologies we introduce expanding negative update intervals that enable independent learners to establish near-optimal average utility values for higher-level strategies while largely discarding transitions from episodes that result in mis-coordination. We evaluate Negative Update Intervals Double-DQN (NUI-DDQN) within a temporally extended version of the Climb Game, a normal-form game that has frequently been used to study relative overgeneralisation and other pathologies. We show that NUI-DDQN can converge towards optimal joint policies in deterministic and stochastic reward settings, overcoming relative overgeneralisation and the alter-exploration problem while mitigating the moving target problem.
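
As a rough illustration of relative overgeneralisation, the sketch below uses the single-shot Climb Game rather than the temporally extended version studied in the paper. The payoff values are the ones commonly quoted for the Climb Game in the literature and are an assumption here, since the abstract does not state them. The helper names (`average_utilities`, `optimistic_utilities`, `greedy`) are hypothetical, and the code is not NUI-DDQN; it only shows why a learner that averages rewards over an exploring partner can be drawn away from the optimal joint action.

```python
# Climb Game shared payoffs as commonly quoted in the literature
# (an assumption; the abstract does not list them).
# Rows: this agent's action. Columns: the partner's action.
CLIMB = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]
ACTIONS = ["A", "B", "C"]


def average_utilities(payoffs):
    """Mean reward of each action against a uniformly random partner."""
    return [sum(row) / len(row) for row in payoffs]


def optimistic_utilities(payoffs):
    """Best-case reward of each action, assuming the partner cooperates."""
    return [max(row) for row in payoffs]


def greedy(values):
    """Label of the action with the highest value."""
    best_index = max(range(len(values)), key=values.__getitem__)
    return ACTIONS[best_index]


avg = average_utilities(CLIMB)
opt = optimistic_utilities(CLIMB)

print("average utilities:   ", [round(v, 2) for v in avg])
print("greedy on averages:  ", greedy(avg))
print("optimistic utilities:", opt)
print("greedy on best case: ", greedy(opt))
```

Under these assumed payoffs, the averaging learner prefers the "safe" action C even though the optimal joint action is (A, A), because the miscoordination penalties drag down the mean for A. A purely optimistic learner recovers A in this deterministic case, but maximum-based estimates are known to be fragile when rewards are stochastic, which is the kind of setting the abstract highlights.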

research
07/14/2017

Lenient Multi-Agent Deep Reinforcement Learning

A significant amount of research in recent years has been dedicated towa...
research
12/06/2020

Fever Basketball: A Complex, Flexible, and Asynchronized Sports Game Environment for Multi-agent Reinforcement Learning

The development of deep reinforcement learning (DRL) has benefited from ...
research
12/29/2019

Individual specialization in multi-task environments with multiagent reinforcement learners

There is a growing interest in Multi-Agent Reinforcement Learning (MARL)...
research
11/18/2022

Credit-cognisant reinforcement learning for multi-agent cooperation

Traditional multi-agent reinforcement learning (MARL) algorithms, such a...
research
09/14/2021

DSDF: An approach to handle stochastic agents in collaborative multi-agent reinforcement learning

Multi-Agent reinforcement learning has received lot of attention in rece...
research
10/09/2021

Satisficing Paths and Independent Multi-Agent Reinforcement Learning in Stochastic Games

In multi-agent reinforcement learning (MARL), independent learners are t...
research
10/08/2019

Tactical Reward Shaping: Bypassing Reinforcement Learning with Strategy-Based Goals

Deep Reinforcement Learning (DRL) has shown its promising capabilities t...
