On the Sample Complexity of Reinforcement Learning with a Generative Model

06/27/2012
by   Mohammad Gheshlaghi Azar, et al.
0

We consider the problem of learning the optimal action-value function in the discounted-reward Markov decision processes (MDPs). We prove a new PAC bound on the sample-complexity of model-based value iteration algorithm in the presence of the generative model, which indicates that for an MDP with N state-action pairs and the discount factor γ∈[0,1) only O(N(N/δ)/((1-γ)^3ϵ^2)) samples are required to find an ϵ-optimal estimation of the action-value function with the probability 1-δ. We also prove a matching lower bound of Θ (N(N/δ)/((1-γ)^3ϵ^2)) on the sample complexity of estimating the optimal action-value function by every RL algorithm. To the best of our knowledge, this is the first matching result on the sample complexity of estimating the optimal (action-) value function in which the upper bound matches the lower bound of RL in terms of N, ϵ, δ and 1/(1-γ). Also, both our lower bound and our upper bound significantly improve on the state-of-the-art in terms of 1/(1-γ).

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/28/2020

Best Policy Identification in discounted MDPs: Problem-specific Sample Complexity

We investigate the problem of best-policy identification in discounted M...
research
06/10/2020

Planning in Markov Decision Processes with Gap-Dependent Sample Complexity

We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algo...
research
03/15/2023

On the Benefits of Leveraging Structural Information in Planning Over the Learned Model

Model-based Reinforcement Learning (RL) integrates learning and planning...
research
03/02/2021

Sample Complexity and Overparameterization Bounds for Projection-Free Neural TD Learning

We study the dynamics of temporal-difference learning with neural networ...
research
07/01/2020

Sequential Transfer in Reinforcement Learning with a Generative Model

We are interested in how to design reinforcement learning agents that pr...
research
06/04/2020

Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction

Asynchronous Q-learning aims to learn the optimal action-value function ...
research
11/14/2022

Linear Reinforcement Learning with Ball Structure Action Space

We study the problem of Reinforcement Learning (RL) with linear function...

Please sign up or login with your details

Forgot password? Click here to reset