Decentralized model-free reinforcement learning in stochastic games with average-reward objective

01/13/2023
by Romain Cravic, et al.

We propose the first model-free algorithm that achieves low regret for decentralized learning in two-player zero-sum tabular stochastic games with an infinite-horizon average-reward objective. In decentralized learning, the learning agent controls only one player and tries to achieve low regret against an arbitrary opponent. This contrasts with centralized learning, where the agent tries to approximate the Nash equilibrium by controlling both players. In our infinite-horizon undiscounted setting, additional structural assumptions are needed to guarantee that the learning process behaves well: here we assume that, for every strategy of the opponent, the agent has a way to go from any state to any other. This assumption is the analogue of the "communicating" assumption in the MDP setting. We show that our Decentralized Optimistic Nash Q-Learning (DONQ-learning) algorithm achieves both sublinear high-probability regret of order T^{3/4} and sublinear expected regret of order T^{2/3}. Moreover, our algorithm enjoys low computational complexity and low memory requirements compared to the previous works of (Wei et al. 2017) and (Jafarnia-Jahromi et al. 2021) in the same setting.
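To make the two rates concrete, the following short calculation shows why regret of these orders counts as sublinear. It is only an illustrative sketch: the symbol R_T for the agent's cumulative regret after T steps is notation assumed here, not taken from the paper, and any constants or logarithmic factors hidden by "of order" are ignored.

```latex
% Assumed notation: R_T = cumulative regret of the learning agent after T steps.
\[
R_T = O\!\left(T^{3/4}\right)
\;\Longrightarrow\;
\frac{R_T}{T} = O\!\left(T^{-1/4}\right) \xrightarrow[T \to \infty]{} 0,
\qquad
\mathbb{E}[R_T] = O\!\left(T^{2/3}\right)
\;\Longrightarrow\;
\frac{\mathbb{E}[R_T]}{T} = O\!\left(T^{-1/3}\right) \xrightarrow[T \to \infty]{} 0.
\]
```

In both cases the average regret per step vanishes as T grows, and it does so faster for the expected-regret bound than for the high-probability bound.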

