Near-Optimal Reinforcement Learning with Self-Play

06/22/2020 ∙ by Yu Bai, et al.

This paper considers the problem of designing optimal algorithms for reinforcement learning in two-player zero-sum games. We focus on self-play algorithms which learn the optimal policy by playing against themselves without any direct supervision. In a tabular episodic Markov game with S states, A max-player actions, and B min-player actions, the best existing algorithm for finding an approximate Nash equilibrium requires Õ(S^2AB) steps of game playing, when only highlighting the dependency on (S, A, B). In contrast, the best existing lower bound scales as Ω(S(A+B)), leaving a significant gap from the upper bound. This paper closes this gap for the first time: we propose an optimistic variant of the Nash Q-learning algorithm with sample complexity Õ(SAB), and a new Nash V-learning algorithm with sample complexity Õ(S(A+B)). The latter result matches the information-theoretic lower bound in all problem-dependent parameters except for a polynomial factor of the length of each episode. Towards understanding learning objectives in Markov games other than finding the Nash equilibrium, we present a computational hardness result for learning the best responses against a fixed opponent. This also implies the computational hardness of achieving sublinear regret when playing against adversarial opponents.
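To make the optimistic Nash Q-learning idea concrete, here is a minimal illustrative sketch for a tabular episodic zero-sum Markov game. It is not the paper's exact pseudocode: the simulator interface (env.reset, env.step), the helper zero_sum_nash, and the bonus constant c_bonus are assumptions made for illustration, and log factors in the exploration bonus are omitted. Each stage game is solved for its Nash value with a standard linear program via scipy.optimize.linprog.

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_nash(M):
    """Nash value and maximizing-player strategy of the zero-sum matrix
    game M (row player maximizes, column player minimizes), via an LP."""
    n_rows, n_cols = M.shape
    # Decision variables: row strategy x (length n_rows) and game value v.
    c = np.zeros(n_rows + 1)
    c[-1] = -1.0                                    # linprog minimizes, so maximize v
    A_ub = np.hstack([-M.T, np.ones((n_cols, 1))])  # v <= sum_a x_a M[a, b] for every b
    b_ub = np.zeros(n_cols)
    A_eq = np.zeros((1, n_rows + 1))
    A_eq[0, :n_rows] = 1.0                          # x is a probability distribution
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n_rows + [(None, None)])
    x = np.clip(res.x[:n_rows], 0.0, None)
    return res.x[-1], x / x.sum()

def optimistic_nash_q(env, S, A, B, H, num_episodes, c_bonus=1.0):
    """Optimistic Nash Q-learning sketch: Q-learning-style updates with an
    exploration bonus, where the value of a state is the Nash value of the
    estimated stage game rather than a max over actions."""
    Q = np.full((H, S, A, B), float(H))   # optimistic initialization
    V = np.zeros((H + 1, S))
    V[:H] = H
    N = np.zeros((H, S, A, B), dtype=int) # visit counts
    for _ in range(num_episodes):
        s = env.reset()                   # hypothetical simulator interface
        for h in range(H):
            _, x = zero_sum_nash(Q[h, s])      # max-player equilibrium strategy
            _, y = zero_sum_nash(-Q[h, s].T)   # min-player equilibrium strategy
            a = np.random.choice(A, p=x)
            b = np.random.choice(B, p=y)
            r, s_next = env.step(a, b)         # hypothetical: reward, next state
            N[h, s, a, b] += 1
            t = N[h, s, a, b]
            alpha = (H + 1) / (H + t)              # learning rate from the paper
            bonus = c_bonus * np.sqrt(H ** 3 / t)  # optimism bonus, log factors omitted
            Q[h, s, a, b] = ((1 - alpha) * Q[h, s, a, b]
                             + alpha * (r + V[h + 1, s_next] + bonus))
            V[h, s] = min(H, zero_sum_nash(Q[h, s])[0])
            s = s_next
    return Q, V
```

Note the design choice this sketch makes explicit: every step solves an A×B matrix game, which is what drives the Õ(SAB) sample complexity. The paper's Nash V-learning variant avoids this per-step matrix-game solve by running adversarial-bandit (EXP3-style) updates for each player separately, which is how the dependence improves to Õ(S(A+B)).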

