Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

07/25/2022
by Shuang Qiu, et al.

While single-agent policy optimization in a fixed environment has recently attracted significant attention in the reinforcement learning community, much less is known theoretically when multiple agents play in a potentially competitive environment. We take a step forward by proposing and analyzing new fictitious play policy optimization algorithms for two-player zero-sum Markov games with structured but unknown transitions. We consider two classes of transition structures: factored independent transitions and single-controller transitions. For both scenarios, we prove tight 𝒪(√K) regret bounds after K episodes in a two-agent competitive game scenario. The regret of each agent is measured against a potentially adversarial opponent who can choose a single best policy in hindsight after observing the full policy sequence. Our algorithms combine Upper Confidence Bound (UCB)-type optimism with fictitious play for simultaneous policy optimization in a non-stationary environment. When both players adopt the proposed algorithms, their overall optimality gap is 𝒪(√K).
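The fictitious-play component can be illustrated on a plain zero-sum matrix game, which is the single-state special case of a Markov game: each player repeatedly best-responds to the opponent's empirical distribution of past actions. The sketch below shows only this classic dynamic (Robinson, 1951) on rock-paper-scissors; it is illustrative, not the paper's algorithm, which additionally uses UCB-type optimism to handle the unknown structured transitions.

```python
import numpy as np

# Rock-paper-scissors payoff matrix for the row (maximizing) player;
# the column player receives the negative of this payoff (zero-sum).
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def fictitious_play(A, n_iters=20000):
    """Classic fictitious play: each round, every player plays a best
    response to the opponent's empirical mixture of past actions."""
    n, m = A.shape
    row_counts = np.zeros(n)
    col_counts = np.zeros(m)
    row_counts[0] += 1.0  # arbitrary first actions to seed the mixtures
    col_counts[0] += 1.0
    for _ in range(n_iters):
        p = row_counts / row_counts.sum()  # row player's empirical mixture
        q = col_counts / col_counts.sum()  # column player's empirical mixture
        row_counts[np.argmax(A @ q)] += 1.0  # maximize expected payoff vs. q
        col_counts[np.argmin(p @ A)] += 1.0  # minimize row's payoff vs. p
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

p, q = fictitious_play(A)
# In zero-sum games the empirical mixtures converge to a Nash equilibrium;
# for rock-paper-scissors that is the uniform mixture (1/3, 1/3, 1/3).
```

In the Markov-game setting of the paper, the same best-response idea is applied to policies across episodes, with optimistic (UCB-style) value estimates standing in for the exact payoffs used here.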


Related research:

- Provable Self-Play Algorithms for Competitive Reinforcement Learning (02/10/2020)
- Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games (06/03/2022)
- Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games (10/03/2022)
- Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation (02/15/2021)
- Efficient Competitive Self-Play Policy Optimization (09/13/2020)
- Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs (10/18/2021)
- Efficient Use of Heuristics for Accelerating XCS-based Policy Learning in Markov Games (05/26/2020)
