Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning

02/06/2023
by Hadi Nekoei, et al.

Decentralized cooperative multi-agent deep reinforcement learning (MARL) can be a versatile learning framework, particularly in scenarios where centralized training is either not possible or not practical. One of the key challenges in decentralized deep MARL is the non-stationarity of the learning environment when multiple agents learn concurrently. A commonly used and efficient scheme for decentralized MARL is independent learning, in which agents concurrently update their policies independently of each other. We first show that independent learning does not always converge, whereas sequential learning, in which agents update their policies one after another in a fixed sequence, is guaranteed to converge to an agent-by-agent optimal solution. In sequential learning, when one agent updates its policy, all other agents' policies are kept fixed, which alleviates the non-stationarity caused by concurrent updates to the other agents' policies. However, sequential learning can be slow because only one agent learns at any time, so it may not always be practical. In this work, we propose a decentralized cooperative MARL algorithm based on multi-timescale learning. In multi-timescale learning, all agents learn concurrently, but at different learning rates. In our proposed method, when one agent updates its policy, the other agents may update their policies as well, but at a slower rate. This speeds up sequential learning while still minimizing the non-stationarity caused by other agents updating concurrently. Multi-timescale learning outperforms state-of-the-art decentralized learning methods on a set of challenging multi-agent cooperative tasks in the EPyMARL benchmark (Papoudakis et al., 2020). This can be seen as a first step towards more general decentralized cooperative deep MARL methods based on multi-timescale learning.
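The abstract does not state the exact learning-rate schedule, so the following is only a minimal Python sketch of the multi-timescale idea under explicit assumptions. The function names multi_timescale_lrs and concurrent_update, the round-robin rotation of the "fast" agent every period steps, and the fixed slow_ratio multiplier are all hypothetical illustrations, not the authors' method; each agent's policy is also reduced to a single scalar parameter purely for illustration. The sketch shows how the three schemes relate: slow_ratio = 0 freezes the other agents (sequential learning), slow_ratio = 1 gives every agent the same rate (independent learning with equal rates), and values in between let all agents learn concurrently but on different timescales.

```python
from typing import List


def multi_timescale_lrs(step: int, n_agents: int, fast_lr: float,
                        slow_ratio: float, period: int) -> List[float]:
    """Return one learning rate per agent for the given training step.

    Hypothetical round-robin schedule (an assumption, not the paper's rule):
    every `period` steps the "fast" role passes to the next agent, and every
    other agent learns at fast_lr * slow_ratio.
    """
    if not 0.0 <= slow_ratio <= 1.0:
        raise ValueError("slow_ratio must lie in [0, 1]")
    if n_agents < 1 or period < 1:
        raise ValueError("n_agents and period must be positive")
    fast_agent = (step // period) % n_agents  # agent currently on the fast timescale
    return [fast_lr if i == fast_agent else fast_lr * slow_ratio
            for i in range(n_agents)]


def concurrent_update(params: List[float], grads: List[float],
                      lrs: List[float]) -> List[float]:
    """Apply one simultaneous gradient-ascent step, each agent at its own rate."""
    return [p + lr * g for p, g, lr in zip(params, grads, lrs)]


if __name__ == "__main__":
    # slow_ratio=0.0 freezes the other agents (sequential learning);
    # slow_ratio=1.0 gives every agent the same rate (independent learning);
    # values in between give the multi-timescale regime.
    for ratio in (0.0, 0.1, 1.0):
        print(ratio, multi_timescale_lrs(step=12, n_agents=3, fast_lr=0.1,
                                         slow_ratio=ratio, period=10))
```

In a deep-RL implementation the same per-agent rates would presumably be applied to each agent's own optimizer rather than to scalar parameters; the single slow_ratio knob is just one simple way to interpolate between the sequential and independent extremes described above.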
