Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation

02/15/2021
by   Zixiang Chen, et al.
4

We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves in the finite-horizon setting, where the transition kernel of the underlying Markov games can be parameterized by a linear function over the current state, both players' actions and the next state. In particular, we assume that we can control both players and aim to find the Nash Equilibrium by minimizing the duality gap. We propose an algorithm Nash-UCRL-VTR based on the principle "Optimism-in-Face-of-Uncertainty". Our algorithm only needs to find a Coarse Correlated Equilibrium (CCE), which is computationally very efficient. Specifically, we show that Nash-UCRL-VTR can provably achieve an Õ(dH√(T)) regret, where d is the linear function dimension, H is the length of the game and T is the total number of steps in the game. To access the optimality of our algorithm, we also prove an Ω̃( dH√(T)) lower bound on the regret. Our upper bound matches the lower bound up to logarithmic factors, which suggests the optimality of our algorithm.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/17/2020

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

We develop provably efficient reinforcement learning algorithms for two-...
research
08/10/2022

Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium

We consider learning Nash equilibria in two-player zero-sum Markov Games...
research
07/01/2021

Gap-Dependent Bounds for Two-Player Markov Games

As one of the most popular methods in the field of reinforcement learnin...
research
07/30/2021

Towards General Function Approximation in Zero-Sum Markov Games

This paper considers two-player zero-sum finite-horizon Markov games wit...
research
06/19/2023

Taming the Exponential Action Set: Sublinear Regret and Fast Convergence to Nash Equilibrium in Online Congestion Games

The congestion game is a powerful model that encompasses a range of engi...
research
07/25/2022

Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

While single-agent policy optimization in a fixed environment has attrac...
research
06/22/2020

Near-Optimal Reinforcement Learning with Self-Play

This paper considers the problem of designing optimal algorithms for rei...

Please sign up or login with your details

Forgot password? Click here to reset