Asynchronous Decentralized Q-Learning: Two Timescale Analysis By Persistence

08/07/2023
by   Bora Yongacoglu, et al.
0

Non-stationarity is a fundamental challenge in multi-agent reinforcement learning (MARL), where agents update their behaviour as they learn. Many theoretical advances in MARL avoid the challenge of non-stationarity by coordinating the policy updates of agents in various ways, including synchronizing times at which agents are allowed to revise their policies. Synchronization enables analysis of many MARL algorithms via multi-timescale methods, but such synchrony is infeasible in many decentralized applications. In this paper, we study an asynchronous variant of the decentralized Q-learning algorithm, a recent MARL algorithm for stochastic games. We provide sufficient conditions under which the asynchronous algorithm drives play to equilibrium with high probability. Our solution utilizes constant learning rates in the Q-factor update, which we show to be critical for relaxing the synchrony assumptions of earlier work. Our analysis also applies to asynchronous generalizations of a number of other algorithms from the regret testing tradition, whose performance is analyzed by multi-timescale methods that study Markov chains obtained via policy update dynamics. This work extends the applicability of the decentralized Q-learning algorithm and its relatives to settings in which parameters are selected in an independent manner, and tames non-stationarity without imposing the coordination assumptions of prior work.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/06/2023

Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning

Decentralized cooperative multi-agent deep reinforcement learning (MARL)...
research
03/16/2023

Decentralized Multi-Agent Reinforcement Learning for Continuous-Space Stochastic Games

Stochastic games are a popular framework for studying multi-agent reinfo...
research
05/29/2022

Independent and Decentralized Learning in Markov Potential Games

We propose a multi-agent reinforcement learning dynamics, and analyze it...
research
06/03/2022

Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games

We study decentralized policy learning in Markov games where we control ...
research
04/10/2017

Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning

In reinforcement learning, agents learn by performing actions and observ...
research
05/31/2022

Decentralized Competing Bandits in Non-Stationary Matching Markets

Understanding complex dynamics of two-sided online matching markets, whe...
research
09/10/2020

Massively Parallel and Asynchronous Tsetlin Machine Architecture Supporting Almost Constant-Time Scaling

Using logical clauses to represent patterns, Tsetlin machines (TMs) have...

Please sign up or login with your details

Forgot password? Click here to reset