Decentralized Q-Learning in Zero-sum Markov Games

06/04/2021
by   Muhammed O. Sayin, et al.
We study multi-agent reinforcement learning (MARL) in infinite-horizon discounted zero-sum Markov games. We focus on the practical but challenging setting of decentralized MARL, where agents make decisions without coordination by a centralized controller, based only on their own payoffs and the local actions they execute. The agents need not observe the opponent's actions or payoffs, may even be oblivious to the presence of the opponent, and need not be aware of the zero-sum structure of the underlying game, a setting also referred to as radically uncoupled in the literature on learning in games. In this paper, we develop for the first time a radically uncoupled Q-learning dynamics that is both rational and convergent: the learning dynamics converges to the best response to the opponent's strategy when the opponent follows an asymptotically stationary strategy, and the value function estimates converge to the payoffs at a Nash equilibrium when both agents adopt the dynamics. The key challenge in this decentralized setting is the non-stationarity of the learning environment from an agent's perspective, since both her own payoffs and the system evolution depend on the actions of the other agents, and all agents adapt their policies simultaneously and independently. To address this issue, we develop a two-timescale learning dynamics in which each agent updates her local Q-function and value function estimates concurrently, with the latter updated at a slower timescale.
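The two-timescale idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's exact dynamics, which the abstract does not spell out. The softmax (smoothed best response) policy, the step-size schedules `alpha` and `beta`, the `DecentralizedAgent` class, and the single-state matching-pennies game are all assumptions made for illustration. The sketch only shows the structure described above: each agent sees its own action and payoff, keeps a Q-function over its own actions, and updates its value estimate with a step size that vanishes faster than the Q-function step size.

```python
import math
import random


def softmax(values, temperature):
    """Boltzmann distribution over a list of values (assumed policy form)."""
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


class DecentralizedAgent:
    """Hypothetical agent: local Q over its OWN actions plus a value estimate."""

    def __init__(self, n_states, n_actions, gamma=0.9, temperature=0.5):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.v = [0.0] * n_states
        self.visits = [0] * n_states
        self.gamma = gamma
        self.temperature = temperature

    def policy(self, state):
        return softmax(self.q[state], self.temperature)

    def act(self, state, rng):
        probs = self.policy(state)
        return rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, state, action, reward, next_state):
        # Only local information is used: own action, own payoff, states.
        self.visits[state] += 1
        k = self.visits[state]
        alpha = 1.0 / k ** 0.6  # faster timescale (assumed schedule)
        beta = 1.0 / k          # slower timescale: beta / alpha -> 0
        target = reward + self.gamma * self.v[next_state]
        self.q[state][action] += alpha * (target - self.q[state][action])
        probs = self.policy(state)
        expected_q = sum(p * q for p, q in zip(probs, self.q[state]))
        self.v[state] += beta * (expected_q - self.v[state])


def simulate(steps=2000, seed=0):
    """Both agents run the same dynamics on repeated matching pennies."""
    rng = random.Random(seed)
    payoff = [[1.0, -1.0], [-1.0, 1.0]]  # payoff to agent 1; agent 2 gets the negative
    agent1 = DecentralizedAgent(n_states=1, n_actions=2)
    agent2 = DecentralizedAgent(n_states=1, n_actions=2)
    state = 0
    for _ in range(steps):
        x = agent1.act(state, rng)
        y = agent2.act(state, rng)
        r = payoff[x][y]
        agent1.update(state, x, r, state)   # agent 1 never sees y
        agent2.update(state, y, -r, state)  # agent 2 never sees x
    return agent1, agent2
```

Calling `simulate()` returns the two agents, whose `v` and `q` attributes can then be inspected. Neither agent's update refers to the opponent's action or to the zero-sum structure, which is the sense of "radically uncoupled" used in the abstract. The sketch makes no claim about convergence, since that depends on the precise dynamics and conditions analyzed in the full text.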


research
05/29/2022

Independent and Decentralized Learning in Markov Potential Games

We propose a multi-agent reinforcement learning dynamics, and analyze it...
research
12/15/2021

Finite-Sample Analysis of Decentralized Q-Learning for Stochastic Games

Learning in stochastic games is arguably the most standard and fundament...
research
10/08/2020

Fictitious play in zero-sum stochastic games

We present fictitious play dynamics for the general class of stochastic ...
research
10/12/2021

Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games

This paper addresses the problem of learning an equilibrium efficiently ...
research
05/26/2022

Logit-Q Learning in Markov Games

We present new independent learning dynamics provably converging to an e...
research
04/07/2023

Markov Games with Decoupled Dynamics: Price of Anarchy and Sample Complexity

This paper studies the finite-time horizon Markov games where the agents...
research
02/20/2023

Efficient-Q Learning for Stochastic Games

We present the new efficient-Q learning dynamics for stochastic games be...
