Finite-sample Guarantees for Nash Q-learning with Linear Function Approximation

03/01/2023
by   Pedro Cisneros-Velarde, et al.
0

Nash Q-learning may be considered one of the first and most known algorithms in multi-agent reinforcement learning (MARL) for learning policies that constitute a Nash equilibrium of an underlying general-sum Markov game. Its original proof provided asymptotic guarantees and was for the tabular case. Recently, finite-sample guarantees have been provided using more modern RL techniques for the tabular case. Our work analyzes Nash Q-learning using linear function approximation – a representation regime introduced when the state space is large or continuous – and provides finite-sample guarantees that indicate its sample efficiency. We find that the obtained performance nearly matches an existing efficient result for single-agent RL under the same representation and has a polynomial gap when compared to the best-known result for the tabular case.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/15/2021

Finite-Sample Analysis of Decentralized Q-Learning for Stochastic Games

Learning in stochastic games is arguably the most standard and fundament...
research
02/06/2019

Finite-Sample Analysis for SARSA and Q-Learning with Linear Function Approximation

Though the convergence of major reinforcement learning algorithms has be...
research
06/07/2021

The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces

Modern reinforcement learning (RL) commonly engages practical problems w...
research
11/29/2020

Minimax Sample Complexity for Turn-based Stochastic Game

The empirical success of Multi-agent reinforcement learning is encouragi...
research
12/06/2018

Finite-Sample Analyses for Fully Decentralized Multi-Agent Reinforcement Learning

Despite the increasing interest in multi-agent reinforcement learning (M...
research
02/02/2021

A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants

This paper develops an unified framework to study finite-sample converge...
research
02/05/2021

Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency

We offer a theoretical characterization of off-policy evaluation (OPE) i...

Please sign up or login with your details

Forgot password? Click here to reset