Provable Self-Play Algorithms for Competitive Reinforcement Learning

02/10/2020
by   Yu Bai, et al.
0

Self-play, where the algorithm learns by playing against itself without requiring any direct supervision, has become the new weapon in modern Reinforcement Learning (RL) for achieving superhuman performance in practice. However, the majority of exisiting theory in reinforcement learning only applies to the setting where the agent plays against a fixed environment. It remains largely open whether self-play algorithms can be provably effective, especially when it is necessary to manage the exploration/exploitation tradeoff. We study self-play in competitive reinforcement learning under the setting of Markov games, a generalization of Markov decision processes to the two-player case. We introduce a self-play algorithm—Value Iteration with Upper/Lower Confidence Bound (VI-ULCB), and show that it achieves regret Õ(√(T)) after playing T steps of the game. The regret is measured by the agent's performance against a fully adversarial opponent who can exploit the agent's strategy at any step. We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret of Õ(T^2/3), but is guaranteed to run in polynomial time even in the worst case. To the best of our knowledge, our work presents the first line of provably sample-efficient self-play algorithms for competitive reinforcement learning.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/21/2023

Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret

While quantum reinforcement learning (RL) has attracted a surge of atten...
research
07/25/2022

Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

While single-agent policy optimization in a fixed environment has attrac...
research
05/19/2023

Learning Diverse Risk Preferences in Population-based Self-play

Among the great successes of Reinforcement Learning (RL), self-play algo...
research
05/28/2018

Memory Augmented Self-Play

Self-play is an unsupervised training procedure which enables the reinfo...
research
06/02/2018

Deep Pepper: Expert Iteration based Chess agent in the Reinforcement Learning Setting

An almost-perfect chess playing agent has been a long standing challenge...
research
01/06/2023

Provable Reset-free Reinforcement Learning by No-Regret Reduction

Real-world reinforcement learning (RL) is often severely limited since t...
research
02/23/2023

Targeted Search Control in AlphaZero for Effective Policy Improvement

AlphaZero is a self-play reinforcement learning algorithm that achieves ...

Please sign up or login with your details

Forgot password? Click here to reset