Learning Zero-sum Stochastic Games with Posterior Sampling

09/08/2021
by   Mehdi Jafarnia-Jahromi, et al.
0

In this paper, we propose Posterior Sampling Reinforcement Learning for Zero-sum Stochastic Games (PSRL-ZSG), the first online learning algorithm that achieves Bayesian regret bound of O(HS√(AT)) in the infinite-horizon zero-sum stochastic games with average-reward criterion. Here H is an upper bound on the span of the bias function, S is the number of states, A is the number of joint actions and T is the horizon. We consider the online setting where the opponent can not be controlled and can take any arbitrary time-adaptive history-dependent strategy. This improves the best existing regret bound of O(√(DS^2AT^2)) by Wei et. al., 2017 under the same assumption and matches the theoretical lower bound in A and T.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/13/2023

Decentralized model-free reinforcement learning in stochastic games with average-reward objective

We propose the first model-free algorithm that achieves low regret perfo...
research
07/01/2016

Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

Computational results demonstrate that posterior sampling for reinforcem...
research
10/28/2020

Provably Efficient Online Agnostic Learning in Markov Games

We study online agnostic learning, a problem that arises in episodic mul...
research
07/01/2021

Gap-Dependent Bounds for Two-Player Markov Games

As one of the most popular methods in the field of reinforcement learnin...
research
05/11/2019

Fast and Furious Learning in Zero-Sum Games: Vanishing Regret with Non-Vanishing Step Sizes

We show for the first time, to our knowledge, that it is possible to rec...
research
08/19/2021

A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems

We revisit the Thompson sampling algorithm to control an unknown linear ...
research
09/12/2021

Concave Utility Reinforcement Learning with Zero-Constraint Violations

We consider the problem of tabular infinite horizon concave utility rein...

Please sign up or login with your details

Forgot password? Click here to reset