Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis

02/12/2021, by Gen Li, et al.

Q-learning, which seeks to learn the optimal Q-function of a Markov decision process (MDP) in a model-free fashion, lies at the heart of reinforcement learning. In the synchronous setting (where, in each iteration, independent samples for all state-action pairs are drawn from a generative model), substantial progress has been made recently towards understanding the sample efficiency of Q-learning. Consider a γ-discounted infinite-horizon MDP with state space 𝒮 and action space 𝒜: to yield an entrywise ε-accurate estimate of the optimal Q-function, state-of-the-art theory for Q-learning proves that a sample size on the order of |𝒮||𝒜|/((1-γ)^5 ε^2) is sufficient, which, however, fails to match the existing minimax lower bound. This gives rise to natural questions: what is the sharp sample complexity of Q-learning? Is Q-learning provably sub-optimal? In this work, we settle these questions by (1) demonstrating that the sample complexity of Q-learning is at most on the order of |𝒮||𝒜|/((1-γ)^4 ε^2) (up to some log factor) for any 0 < ε < 1, and (2) developing a matching lower bound to confirm the sharpness of our result. Our findings unveil both the effectiveness and the limitation of Q-learning: its sample complexity matches that of speedy Q-learning without requiring extra computation and storage, albeit still considerably higher than the minimax lower bound.
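To make the synchronous setting concrete, the sketch below illustrates synchronous Q-learning with a generative model in plain Python: in each round, every state-action pair receives one fresh next-state sample, and the Q-estimate is updated via the standard temporal-difference rule. This is a minimal sketch, not the construction analyzed in the paper; the transition/reward representation, the linearly rescaled learning-rate schedule, and the small random MDP in the usage example are assumptions made purely for illustration.

```python
import numpy as np


def synchronous_q_learning(P, R, gamma, num_iters, seed=0):
    """Synchronous Q-learning with a generative model (illustrative sketch).

    P: transition tensor of shape (S, A, S); P[s, a] is a distribution over next states.
    R: reward matrix of shape (S, A); deterministic rewards for simplicity.
    gamma: discount factor in (0, 1).
    num_iters: number of synchronous rounds; each round draws one independent
               next-state sample for every (s, a) pair.
    """
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    for t in range(1, num_iters + 1):
        # Assumed learning-rate schedule (linearly rescaled); other choices are common.
        eta = 1.0 / (1.0 + (1.0 - gamma) * t)
        Q_new = np.empty_like(Q)
        for s in range(S):
            for a in range(A):
                # Generative model: one independent next-state sample for (s, a).
                s_next = rng.choice(S, p=P[s, a])
                target = R[s, a] + gamma * Q[s_next].max()
                Q_new[s, a] = (1 - eta) * Q[s, a] + eta * target
        Q = Q_new  # synchronous update: all entries refreshed together
    return Q


if __name__ == "__main__":
    # Tiny random MDP, purely for illustration.
    rng = np.random.default_rng(1)
    S, A, gamma = 5, 3, 0.9
    P = rng.random((S, A, S))
    P /= P.sum(axis=-1, keepdims=True)
    R = rng.random((S, A))
    Q_hat = synchronous_q_learning(P, R, gamma, num_iters=20000)
    print(Q_hat)
```

Each round consumes |𝒮||𝒜| samples, so the paper's bounds can be read as a bound on the number of such rounds needed (up to log factors) to reach entrywise ε-accuracy.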


research · 05/28/2021
Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model
The curse of dimensionality is a widely known issue in reinforcement lea...

research · 06/04/2020
Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction
Asynchronous Q-learning aims to learn the optimal action-value function ...

research · 06/06/2020
Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity
In this paper we consider the problem of learning an ϵ-optimal policy fo...

research · 05/26/2020
Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model
We investigate the sample efficiency of reinforcement learning in a γ-di...

research · 07/01/2020
Sequential Transfer in Reinforcement Learning with a Generative Model
We are interested in how to design reinforcement learning agents that pr...

research · 06/11/2019
Variance-reduced Q-learning is minimax optimal
We introduce and analyze a form of variance-reduced Q-learning. For γ-di...

research · 07/25/2023
Settling the Sample Complexity of Online Reinforcement Learning
A central issue lying at the heart of online reinforcement learning (RL)...
