On Gap-dependent Bounds for Offline Reinforcement Learning

06/01/2022
by Xinqi Wang, et al.

This paper presents a systematic study of gap-dependent sample complexity in offline reinforcement learning. Prior work showed that when the density ratio between an optimal policy and the behavior policy is upper bounded (the optimal policy coverage assumption), the agent can achieve an O(1/ϵ^2) rate, which is also minimax optimal. We show that, under the optimal policy coverage assumption, the rate can be improved to O(1/ϵ) when there is a positive sub-optimality gap in the optimal Q-function. Furthermore, we show that when the visitation probabilities of the behavior policy are uniformly lower bounded for states where an optimal policy's visitation probabilities are positive (the uniform optimal policy coverage assumption), the sample complexity of identifying an optimal policy is independent of 1/ϵ. Lastly, we present nearly matching lower bounds to complement our gap-dependent upper bounds.
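
The three quantities the abstract relies on (the sub-optimality gap, the density ratio, and the uniform lower bound on behavior-policy visitation) can be made concrete on a toy tabular problem. The sketch below is illustrative only and is not taken from the paper: the function names, the two-state example, and the choice to index visitation probabilities by state-action pairs are all assumptions, and the gap is taken to be V*(s) - Q*(s, a) with V*(s) = max_a Q*(s, a), which is the usual convention.

```python
from typing import Dict, Tuple

State = str
Action = str
QTable = Dict[State, Dict[Action, float]]
Visitation = Dict[Tuple[State, Action], float]


def min_positive_gap(q_star: QTable) -> float:
    """Smallest strictly positive gap V*(s) - Q*(s, a), where V*(s) = max_a Q*(s, a)."""
    gaps = []
    for action_values in q_star.values():
        v_star = max(action_values.values())
        gaps.extend(v_star - q for q in action_values.values() if v_star - q > 0)
    # If every action is optimal everywhere, there is no positive gap.
    return min(gaps) if gaps else float("inf")


def max_density_ratio(d_opt: Visitation, d_behavior: Visitation) -> float:
    """Largest ratio d_opt / d_behavior over pairs the optimal policy visits."""
    ratio = 0.0
    for pair, p_opt in d_opt.items():
        if p_opt <= 0:
            continue
        p_beh = d_behavior.get(pair, 0.0)
        if p_beh <= 0:
            # The data never covers a pair the optimal policy needs.
            return float("inf")
        ratio = max(ratio, p_opt / p_beh)
    return ratio


def min_behavior_coverage(d_opt: Visitation, d_behavior: Visitation) -> float:
    """Smallest behavior visitation probability over pairs the optimal policy visits."""
    covered = (d_behavior.get(pair, 0.0) for pair, p in d_opt.items() if p > 0)
    return min(covered, default=0.0)


# Hypothetical two-state, two-action example (values chosen for illustration).
q_star: QTable = {
    "s0": {"left": 1.0, "right": 0.5},
    "s1": {"left": 0.25, "right": 1.0},
}
d_opt: Visitation = {("s0", "left"): 0.5, ("s1", "right"): 0.5}
d_behavior: Visitation = {
    ("s0", "left"): 0.25,
    ("s0", "right"): 0.25,
    ("s1", "left"): 0.25,
    ("s1", "right"): 0.25,
}

gap = min_positive_gap(q_star)                        # 0.5
ratio = max_density_ratio(d_opt, d_behavior)          # 2.0
coverage = min_behavior_coverage(d_opt, d_behavior)   # 0.25
```

In this toy instance the density ratio is finite, so the optimal policy coverage assumption holds, and the behavior policy's visitation probability is bounded below by 0.25 wherever the optimal policy has positive visitation, which is the flavor of condition the uniform optimal policy coverage assumption describes.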


