A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

03/03/2023
by   Zaiwei Chen, et al.

We study two-player zero-sum stochastic games, and propose a form of independent learning dynamics called Doubly Smoothed Best-Response dynamics, which integrates a discrete and doubly smoothed variant of the best-response dynamics into temporal-difference (TD)-learning and minimax value iteration. The resulting dynamics are payoff-based, convergent, rational, and symmetric among players. Our main results provide finite-sample guarantees. In particular, we prove the first-known 𝒪̃(1/ϵ^2) sample complexity bound for payoff-based independent learning dynamics, up to a smoothing bias. In the special case where the stochastic game has only one state (i.e., matrix games), we provide a sharper 𝒪̃(1/ϵ) sample complexity. Our analysis uses a novel coupled Lyapunov drift approach to capture the evolution of multiple sets of coupled and stochastic iterates, which might be of independent interest.
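For the one-state (matrix game) case mentioned above, the general idea of payoff-based smoothed best-response learning can be illustrated with a short sketch. This is not the paper's algorithm as published; it is a minimal illustration under stated assumptions. Each player sees only its own realized payoff, keeps a per-action payoff estimate, and moves its policy a small step toward a softmax of that estimate. The function names, the temperature `tau`, the constant stepsizes `alpha` and `beta`, and the matching-pennies payoff matrix are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(q, tau):
    """Smoothed best response: softmax of q with temperature tau (assumed form)."""
    m = max(q)  # subtract the max for numerical stability
    w = [math.exp((x - m) / tau) for x in q]
    s = sum(w)
    return [x / s for x in w]


def smoothed_br_matrix_game(payoff, steps=20000, tau=0.1,
                            alpha=0.05, beta=0.01, seed=0):
    """Illustrative payoff-based dynamics for a zero-sum matrix game.

    payoff[a][b] is the reward to player 1; player 2 receives its negative.
    Neither player observes the other's action or policy, only its own reward.
    """
    rng = random.Random(seed)
    n1, n2 = len(payoff), len(payoff[0])
    pi1 = [1.0 / n1] * n1   # player 1 policy, starts uniform
    pi2 = [1.0 / n2] * n2   # player 2 policy, starts uniform
    q1 = [0.0] * n1         # player 1 per-action payoff estimates
    q2 = [0.0] * n2         # player 2 per-action payoff estimates
    for _ in range(steps):
        # Policy step: move slowly toward the softmax of the current estimate.
        br1, br2 = softmax(q1, tau), softmax(q2, tau)
        pi1 = [p + beta * (b - p) for p, b in zip(pi1, br1)]
        pi2 = [p + beta * (b - p) for p, b in zip(pi2, br2)]
        # Both players sample actions independently from their own policies.
        a = rng.choices(range(n1), weights=pi1)[0]
        b = rng.choices(range(n2), weights=pi2)[0]
        r = payoff[a][b]
        # Estimate step: update only the entry of the action actually played.
        q1[a] += alpha * (r - q1[a])
        q2[b] += alpha * (-r - q2[b])
    return pi1, pi2, q1, q2


if __name__ == "__main__":
    matching_pennies = [[1.0, -1.0], [-1.0, 1.0]]
    pi1, pi2, _, _ = smoothed_br_matrix_game(matching_pennies)
    print("player 1 policy:", pi1)
    print("player 2 policy:", pi2)
```

The two updates are symmetric across players, and the softmax temperature trades off exploration against the bias it introduces relative to an exact best response, which is the kind of smoothing bias the abstract refers to.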


research
02/06/2019

Finite-Sample Analysis for SARSA and Q-Learning with Linear Function Approximation

Though the convergence of major reinforcement learning algorithms has be...
research
03/19/2023

Instance-dependent Sample Complexity Bounds for Zero-sum Matrix Games

We study the sample complexity of identifying an approximate equilibrium...
research
10/05/2021

Robustness and sample complexity of model-based MARL for general-sum Markov games

Multi-agent reinforcement learning (MARL) is often modeled using the fram...
research
06/09/2023

Finite-Time Analysis of Minimax Q-Learning for Two-Player Zero-Sum Markov Games: Switching System Approach

The objective of this paper is to investigate the finite-time analysis o...
research
05/28/2021

Discretization Drift in Two-Player Games

Gradient-based methods for two-player games produce rich dynamics that c...
research
12/12/2021

On the Heterogeneity of Independent Learning Dynamics in Zero-sum Stochastic Games

We analyze the convergence properties of the two-timescale fictitious pl...
research
03/24/2022

Learning the Dynamics of Autonomous Linear Systems From Multiple Trajectories

We consider the problem of learning the dynamics of autonomous linear sy...
