
Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation
We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves in the finite-horizon setting, where the transition kernel of the underlying Markov game can be parameterized by a linear function over the current state, both players' actions, and the next state. In particular, we assume that we can control both players and aim to find the Nash equilibrium by minimizing the duality gap. We propose an algorithm, Nash-UCRL-VTR, based on the principle of "Optimism-in-Face-of-Uncertainty". Our algorithm only needs to find a Coarse Correlated Equilibrium (CCE), which is computationally efficient. Specifically, we show that Nash-UCRL-VTR provably achieves an Õ(dH√(T)) regret, where d is the dimension of the linear function, H is the length of the game, and T is the total number of steps in the game. To assess the optimality of our algorithm, we also prove an Ω̃(dH√(T)) lower bound on the regret. Our upper bound matches the lower bound up to logarithmic factors, which suggests the optimality of our algorithm.
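The computational advantage of targeting a CCE can be illustrated on a one-shot matrix game: a CCE is a joint distribution over action pairs from which neither player gains by committing to a single fixed action, and its defining constraints are linear, so one can be found by a feasibility linear program. Below is a minimal sketch (assuming SciPy is available; the 2x2 payoff matrix is a made-up example, not taken from the paper):

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative 2x2 zero-sum payoff matrix for the row player (matching pennies).
# The column player's payoff is -A.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
n, m = A.shape
num_vars = n * m  # joint distribution p[a, b], flattened row-major

A_ub, b_ub = [], []

# Row player CCE constraints: for every fixed deviation a',
#   E_p[A[a', b]] <= E_p[A[a, b]]
for a_dev in range(n):
    row = np.zeros(num_vars)
    for a in range(n):
        for b in range(m):
            row[a * m + b] = A[a_dev, b] - A[a, b]
    A_ub.append(row)
    b_ub.append(0.0)

# Column player CCE constraints (payoff -A): for every fixed deviation b',
#   E_p[-A[a, b']] <= E_p[-A[a, b]]
for b_dev in range(m):
    row = np.zeros(num_vars)
    for a in range(n):
        for b in range(m):
            row[a * m + b] = -A[a, b_dev] + A[a, b]
    A_ub.append(row)
    b_ub.append(0.0)

# Probabilities sum to one; non-negativity via variable bounds.
res = linprog(c=np.zeros(num_vars),
              A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=np.ones((1, num_vars)), b_eq=[1.0],
              bounds=[(0.0, 1.0)] * num_vars, method="highs")
p = res.x.reshape(n, m)  # some feasible CCE of this matrix game
```

Because every constraint is linear in p, finding a CCE scales polynomially with the joint action space; the paper exploits this by solving such a subproblem on optimistic value estimates at each step rather than computing an exact Nash equilibrium.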