Smooth Fictitious Play in Stochastic Games with Perturbed Payoffs and Unknown Transitions

07/07/2022
by   Lucas Baudin, et al.

Recent extensions to dynamic games of the well-known fictitious play learning procedure for static games were proved to converge globally to stationary Nash equilibria in two important classes of dynamic games: zero-sum and identical-interest discounted stochastic games. However, these decentralized algorithms require the players to know the model exactly (the transition probabilities and their payoffs at every stage). To overcome these strong assumptions, our paper introduces regularizations of the systems in (Leslie 2020; Baudin 2022) to construct a family of new decentralized learning algorithms that are model-free (players do not know the transitions, and their payoffs are perturbed at every stage). Our procedures can be seen as extensions to stochastic games of the classical smooth fictitious play procedures for static games, in which the players' best responses are regularized by a smooth, strictly concave perturbation of their payoff functions. We prove that our family of procedures converges to stationary regularized Nash equilibria in zero-sum and identical-interest discounted stochastic games. The proof uses the continuous-time smooth best-response dynamics counterparts together with stochastic approximation methods. When there is only one player, our problem is an instance of reinforcement learning, and our procedures provably converge globally to the optimal stationary policy of the regularized MDP. In that sense, they can be seen as an alternative to the well-known Q-learning procedure.
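To illustrate the smooth (logit) best-response idea in the simplest static setting, here is a minimal sketch of smooth fictitious play in a two-player zero-sum matrix game (matching pennies). The entropy temperature `eta`, the 1/(n+1) averaging step, and the starting beliefs are illustrative choices for this toy example, not the paper's exact procedure for stochastic games.

```python
import numpy as np

# Matching pennies: payoff matrix for player 1 (player 2 receives -A).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def logit_response(payoffs, eta):
    """Smooth best response: softmax of expected payoffs.

    This is the best response to payoffs perturbed by an entropy
    regularizer with temperature eta (a smooth, strictly concave
    perturbation of the payoff function).
    """
    z = payoffs / eta
    z -= z.max()  # shift for numerical stability; softmax is unchanged
    e = np.exp(z)
    return e / e.sum()

def smooth_fictitious_play(A, eta=0.5, steps=20000):
    # Empirical (time-averaged) strategies; deliberately non-uniform start.
    x = np.array([0.8, 0.2])  # player 1's average play
    y = np.array([0.3, 0.7])  # player 2's average play
    for n in range(1, steps + 1):
        br1 = logit_response(A @ y, eta)     # player 1 responds to belief y
        br2 = logit_response(-A.T @ x, eta)  # player 2 responds to belief x
        x = x + (br1 - x) / (n + 1)          # update empirical averages
        y = y + (br2 - y) / (n + 1)
    return x, y

x, y = smooth_fictitious_play(A)
print(x, y)  # both approach the regularized equilibrium (uniform here)
```

Because matching pennies is symmetric, the regularized equilibrium coincides with the uniform strategy for every temperature; the averaged play spirals into it, which is the static analogue of the convergence result the paper establishes for discounted stochastic games.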


Related research

11/08/2021
Best-Response Dynamics and Fictitious Play in Identical Interest Stochastic Games
This paper combines ideas from Q-learning and fictitious play to define ...

05/23/2022
Fictitious Play in Markov Games with Single Controller
Certain but important classes of strategic-form games, including zero-su...

02/08/2021
Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games
We study infinite-horizon discounted two-player zero-sum Markov games, a...

06/26/2011
Learning to Coordinate Efficiently: A Model-based Approach
In common-interest stochastic games all players receive an identical pay...

04/04/2022
On the Global Convergence of Stochastic Fictitious Play in Stochastic Games with Turn-based Controllers
This paper presents a learning dynamic with almost sure convergence guar...

04/23/2019
Deep Q-Learning for Nash Equilibria: Nash-DQN
Model-free learning for multi-agent stochastic games is an active area o...

03/07/2018
Aspiration-based Perturbed Learning Automata
This paper introduces a novel payoff-based learning scheme for distribut...
