Finite-Time Analysis of Minimax Q-Learning for Two-Player Zero-Sum Markov Games: Switching System Approach

06/09/2023
by   Donghwan Lee, et al.

This paper presents a finite-time analysis of Q-learning applied to two-player zero-sum Markov games. Specifically, we establish a finite-time analysis of both the minimax Q-learning algorithm and the corresponding value iteration method. To this end, we employ switching system models of minimax Q-learning and of the associated value iteration. This approach provides further insight into minimax Q-learning and leads to a simpler, more transparent convergence analysis. We anticipate that these insights can uncover novel connections between control theory and reinforcement learning and foster collaboration between the two communities.
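For readers unfamiliar with the two algorithms named in the abstract, the following is a minimal tabular sketch, not the paper's formulation. It assumes a pure-strategy max-min backup (the first player maximizes, the second minimizes after observing the first player's action), a uniformly random behavior policy, and a constant step size; simultaneous-move games would generally require solving a matrix game at each backup instead. All names (`maxmin_value`, `minimax_value_iteration`, `minimax_q_learning`, and the array layouts of `P` and `R`) are hypothetical.

```python
import numpy as np

def maxmin_value(Q):
    """Pure-strategy max-min value per state. Q has shape (S, A, B)."""
    # Inner min over the second player's action, outer max over the first's.
    return Q.min(axis=2).max(axis=1)

def minimax_value_iteration(P, R, gamma, iters=500):
    """Model-based backup: Q <- R + gamma * E[maxmin Q(s', ., .)].

    P has shape (S, A, B, S) with P[s, a, b] a distribution over next states;
    R has shape (S, A, B). Both layouts are assumptions of this sketch.
    """
    Q = np.zeros(R.shape, dtype=float)
    for _ in range(iters):
        # Matmul with a 1-D vector contracts the last (next-state) axis of P.
        Q = R + gamma * (P @ maxmin_value(Q))
    return Q

def minimax_q_learning(P, R, gamma, steps=20000, alpha=0.1, seed=0):
    """Sample-based counterpart: one stochastic max-min backup per step."""
    rng = np.random.default_rng(seed)
    S, A, B = R.shape
    Q = np.zeros((S, A, B))
    for _ in range(steps):
        # Assumed behavior policy: sample (s, a, b) uniformly at random.
        s, a, b = rng.integers(S), rng.integers(A), rng.integers(B)
        s_next = rng.choice(S, p=P[s, a, b])
        # Max-min value of the sampled next state.
        v_next = Q[s_next].min(axis=1).max()
        target = R[s, a, b] + gamma * v_next
        Q[s, a, b] += alpha * (target - Q[s, a, b])
    return Q
```

Under these assumptions the max-min backup is a contraction in the sup norm with factor `gamma`, so the value iteration loop converges geometrically, while the Q-learning loop with a constant step size can only be expected to settle in a neighborhood of the fixed point when transitions are stochastic.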


research
07/22/2021

Distributed Asynchronous Policy Iteration for Sequential Zero-Sum Games and Minimax Control

We introduce a contractive abstract dynamic programming framework and re...
research
06/16/2019

Solution of Two-Player Zero-Sum Game by Successive Relaxation

We consider the problem of two-player zero-sum game. In this setting, th...
research
03/03/2023

A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

We study two-player zero-sum stochastic games, and propose a form of ind...
research
06/08/2022

Model-Based Reinforcement Learning Is Minimax-Optimal for Offline Zero-Sum Markov Games

This paper makes progress towards learning Nash equilibria in two-player...
research
07/25/2022

Finite-Time Analysis of Asynchronous Q-learning under Diminishing Step-Size from Control-Theoretic View

Q-learning has long been one of the most popular reinforcement learning ...
research
03/17/2023

A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games

Many model-based reinforcement learning (RL) algorithms can be viewed as...
research
07/11/2019

Minimax Theorems for Finite Blocklength Lossy Joint Source-Channel Coding over an AVC

Motivated by applications in the security of cyber-physical systems, we ...
