
Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation

by   Zixiang Chen, et al.

We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves in the finite-horizon setting, where the transition kernel of the underlying Markov game can be parameterized by a linear function of the current state, both players' actions, and the next state. In particular, we assume that we control both players and aim to find the Nash equilibrium by minimizing the duality gap. We propose an algorithm, Nash-UCRL-VTR, based on the principle of "Optimism-in-the-Face-of-Uncertainty". Our algorithm only needs to find a Coarse Correlated Equilibrium (CCE), which is computationally very efficient. Specifically, we show that Nash-UCRL-VTR provably achieves an Õ(dH√(T)) regret, where d is the dimension of the linear function, H is the length of the game, and T is the total number of steps in the game. To assess the optimality of our algorithm, we also prove an Ω̃(dH√(T)) lower bound on the regret. Our upper bound matches the lower bound up to logarithmic factors, which suggests the optimality of our algorithm.
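The abstract notes that a CCE suffices in place of a Nash equilibrium at each stage. In a single zero-sum matrix game the two notions coincide and the equilibrium can be computed by a small linear program (the classical max-min LP); this is not the paper's algorithm, just a minimal sketch of the stage-game computation, assuming SciPy's `linprog` and a hypothetical helper name `zero_sum_equilibrium`:

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_equilibrium(A):
    """Solve max_x min_y x^T A y for the row player of a zero-sum
    matrix game via linear programming.  In zero-sum games the Nash
    equilibrium is also a coarse correlated equilibrium, so this is
    one concrete way to obtain a stage-game CCE (hypothetical helper,
    not from the paper)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Decision variables z = (x_1, ..., x_m, v); linprog minimizes,
    # so we minimize -v to maximize the game value v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For every opponent column j: v - (A^T x)_j <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # The mixed strategy x must sum to one.
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * m + [(None, None)]  # v is unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds)
    x, v = res.x[:m], res.x[-1]
    return x, v

# Matching pennies: value 0, uniform mixing is the unique equilibrium.
x, v = zero_sum_equilibrium([[1, -1], [-1, 1]])
```

Nash-UCRL-VTR works with optimistic value estimates over a whole episode rather than a single payoff matrix, but the per-stage equilibrium subroutine it relies on has this flavor.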




Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium


Towards General Function Approximation in Zero-Sum Markov Games


Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium


Gap-Dependent Bounds for Two-Player Markov Games


Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions


Empirical Policy Optimization for n-Player Markov Games


Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall
