Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP

01/27/2019
by Kefan Dong, et al.

A fundamental question in reinforcement learning is whether model-free algorithms are sample efficient. Recently, Jin et al. (2018) proposed a Q-learning algorithm with a UCB exploration policy and proved that it achieves a nearly optimal regret bound for finite-horizon episodic MDPs. In this paper, we adapt Q-learning with a UCB exploration bonus to infinite-horizon MDPs with discounted rewards, without access to a generative model. We show that the sample complexity of exploration of our algorithm is bounded by Õ(SA/(ϵ^2(1-γ)^7)). This improves on the previously best known bound of Õ(SA/(ϵ^4(1-γ)^8)) in this setting, achieved by delayed Q-learning (Strehl et al., 2006), and matches the lower bound in terms of ϵ, S, and A up to logarithmic factors.
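To make the algorithmic idea concrete, below is a minimal sketch of tabular Q-learning with a UCB-style exploration bonus in the discounted setting. The environment interface (env.reset, env.step), the specific bonus form, and the constant c are assumptions for illustration; the paper's exact learning rates and bonus constants differ.

```python
import numpy as np

def q_learning_ucb(env, S, A, gamma=0.99, eps=0.1, T=100_000, c=1.0):
    """Tabular Q-learning with a UCB-style exploration bonus for a
    discounted infinite-horizon MDP.

    A simplified sketch: `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward) with rewards in [0, 1]; the
    bonus form and the constant c are illustrative, not the paper's
    exact choices.
    """
    vmax = 1.0 / (1.0 - gamma)            # range of discounted returns
    # Effective horizon: returns beyond ~H steps contribute at most eps.
    H = int(np.ceil(np.log(1.0 / ((1.0 - gamma) * eps)) / (1.0 - gamma)))
    Q = np.full((S, A), vmax)             # optimistic initialization
    N = np.zeros((S, A), dtype=np.int64)  # visit counts
    s = env.reset()
    for t in range(1, T + 1):
        a = int(np.argmax(Q[s]))          # act greedily w.r.t. optimistic Q
        s_next, r = env.step(a)
        N[s, a] += 1
        k = N[s, a]
        alpha = (H + 1) / (H + k)         # learning rate of Jin et al. (2018)
        bonus = c * vmax * np.sqrt(np.log(S * A * t) / k)  # UCB bonus
        target = r + bonus + gamma * np.max(Q[s_next])
        Q[s, a] = min(vmax, (1 - alpha) * Q[s, a] + alpha * target)
        s = s_next
    return Q
```

Because the bonus shrinks like √(1/N(s,a)), rarely visited state-action pairs retain inflated values and keep being selected, which is what drives exploration; the 1/(1-γ) factors in the sample complexity reflect the value range and the effective horizon H.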


