
Best Policy Identification in discounted MDPs: Problem-specific Sample Complexity
We investigate the problem of best-policy identification in discounted M...

Planning in Markov Decision Processes with Gap-Dependent Sample Complexity
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algo...

Sample Complexity and Overparameterization Bounds for Projection-Free Neural TD Learning
We study the dynamics of temporal-difference learning with neural networ...

Sequential Transfer in Reinforcement Learning with a Generative Model
We are interested in how to design reinforcement learning agents that pr...

Active Learning for Contextual Search with Binary Feedbacks
In this paper, we study the learning problem in contextual search, which...

Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction
Asynchronous Q-learning aims to learn the optimal action-value function ...

Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity
The current paper studies the problem of agnostic Q-learning with functi...
On the Sample Complexity of Reinforcement Learning with a Generative Model
We consider the problem of learning the optimal action-value function in discounted-reward Markov decision processes (MDPs). We prove a new PAC bound on the sample complexity of the model-based value iteration algorithm in the presence of a generative model, which indicates that for an MDP with N state-action pairs and discount factor γ∈[0,1), only O(N log(N/δ)/((1-γ)^3 ϵ^2)) samples are required to find an ϵ-optimal estimate of the action-value function with probability 1-δ. We also prove a matching lower bound of Θ(N log(N/δ)/((1-γ)^3 ϵ^2)) on the sample complexity of estimating the optimal action-value function by any RL algorithm. To the best of our knowledge, this is the first matching result on the sample complexity of estimating the optimal (action-)value function, in which the upper bound matches the lower bound of RL in terms of N, ϵ, δ, and 1/(1-γ). Both our lower bound and our upper bound significantly improve on the state of the art in terms of 1/(1-γ).
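The procedure the abstract analyzes can be illustrated with a small sketch: draw a fixed number of next-state samples per state-action pair from the generative model, form the empirical transition kernel, and run value iteration on the resulting empirical MDP. The toy MDP below (its size, rewards, and sample budget) is entirely hypothetical and chosen only for illustration; it is a minimal sketch of model-based value iteration with a generative model, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP: S states, A actions, known rewards, unknown transitions P.
S, A, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # true transition kernel P(s'|s,a)
R = rng.uniform(size=(S, A))                 # known rewards in [0, 1]

def generative_model(s, a):
    """Draw one next state from P(.|s, a), as the generative model allows."""
    return rng.choice(S, p=P[s, a])

# Step 1: draw n samples per state-action pair and build the empirical model.
n = 2000
P_hat = np.zeros((S, A, S))
for s in range(S):
    for a in range(A):
        for _ in range(n):
            P_hat[s, a, generative_model(s, a)] += 1
P_hat /= n

# Step 2: run value iteration on the empirical MDP to estimate Q*.
Q = np.zeros((S, A))
for _ in range(500):
    V = Q.max(axis=1)          # greedy value of each state
    Q = R + gamma * P_hat @ V  # Bellman optimality backup under P_hat

# Q now approximates the optimal action-value function of the true MDP.
```

The abstract's bound concerns how large n must be (as a function of N = S·A, ϵ, δ, and 1/(1-γ)) for the resulting Q to be ϵ-accurate with probability 1-δ.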