Nearly Horizon-Free Offline Reinforcement Learning

03/25/2021
by   Tongzheng Ren, et al.

We revisit offline reinforcement learning on episodic time-homogeneous tabular Markov Decision Processes with S states, A actions, and planning horizon H. Given N episodes of collected data with minimum cumulative reaching probability d_m, we obtain the first set of nearly H-free sample complexity bounds for evaluation and planning using the empirical MDPs:
1. For offline evaluation, we obtain an Õ(√(1/(N d_m))) error rate, which matches the lower bound and, unlike previous works <cit.>, has no additional dependency on (S, A) in the higher-order term.
2. For offline policy optimization, we obtain an Õ(√(1/(N d_m)) + S/(N d_m)) error rate, improving upon the best known result by <cit.>, which has additional H and S factors in the main term. Furthermore, this bound approaches the Ω(√(1/(N d_m))) lower bound up to logarithmic factors and a higher-order term.
To the best of our knowledge, these are the first nearly horizon-free bounds in offline reinforcement learning.
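To make "evaluation using the empirical MDPs" concrete, the following is a minimal sketch of plug-in evaluation on a time-homogeneous tabular MDP. It is an illustration, not the paper's algorithm or analysis. The function names `empirical_mdp` and `evaluate`, the episode format (a list of `(s, a, r, s_next)` tuples), and the restriction to a deterministic stationary policy are all assumptions made for this example.

```python
import numpy as np

def empirical_mdp(episodes, S, A):
    """Estimate transitions and rewards by pooling counts over all time steps.

    Pooling across steps is what time-homogeneity permits. The episode
    format (s, a, r, s_next) is an assumption of this sketch.
    """
    counts = np.zeros((S, A, S))
    reward_sum = np.zeros((S, A))
    for episode in episodes:
        for s, a, r, s_next in episode:
            counts[s, a, s_next] += 1
            reward_sum[s, a] += r
    n = counts.sum(axis=2)               # visit count of each (s, a)
    safe_n = np.maximum(n, 1)            # avoid division by zero
    P_hat = counts / safe_n[:, :, None]  # unvisited pairs keep all-zero rows
    r_hat = reward_sum / safe_n
    return P_hat, r_hat

def evaluate(P_hat, r_hat, policy, H):
    """Plug-in H-step value of a deterministic stationary policy.

    `policy` is an integer array of length S giving the action per state.
    """
    S = P_hat.shape[0]
    idx = np.arange(S)
    P_pi = P_hat[idx, policy]            # (S, S) transition matrix under the policy
    r_pi = r_hat[idx, policy]            # (S,) expected reward under the policy
    V = np.zeros(S)
    for _ in range(H):                   # backward induction over the horizon
        V = r_pi + P_pi @ V
    return V

# Hypothetical toy data: 2 states, 1 action, one episode of length 2.
episodes = [[(0, 0, 1.0, 1), (1, 0, 0.0, 1)]]
P_hat, r_hat = empirical_mdp(episodes, S=2, A=1)
V = evaluate(P_hat, r_hat, policy=np.array([0, 0]), H=2)
```

Unvisited state-action pairs are given zero reward and zero transition mass here, which is one simple convention among several. The abstract's dependence on d_m reflects how well the data covers the pairs that matter.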


