Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings

05/13/2021
by   Ming Yin, et al.

This work studies the statistical limits of uniform convergence for offline policy evaluation (OPE) with model-based methods in finite-horizon MDPs, and provides a unified view of optimal learning for several well-motivated offline tasks. Uniform OPE, sup_Π |Q^π - Q̂^π| < ϵ (a notion initiated in prior work), is a stronger guarantee than point-wise (fixed-policy) OPE, and it ensures offline policy learning when Π contains all policies (the global policy class). In this paper, we establish an Ω(H^2 S/(d_m ϵ^2)) lower bound (over the model-based family) for global uniform OPE, where d_m is the minimal state-action probability induced by the behavior policy. Our main result then establishes an episode complexity of Õ(H^2/(d_m ϵ^2)) for local uniform convergence, which applies to all near-empirically optimal policies in MDPs with stationary transitions. This result implies the optimal sample complexity for offline learning and separates local uniform OPE from the global case, which carries an extra factor of S. Most importantly, the model-based method, combined with our new analysis technique (the singleton absorbing MDP), can be adapted to two new settings: offline task-agnostic learning, with optimal complexity Õ(H^2 log(K)/(d_m ϵ^2)) where K is the number of tasks, and offline reward-free learning, with optimal complexity Õ(H^2 S/(d_m ϵ^2)). Together these provide a unified framework for simultaneously solving different offline RL problems.
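The model-based (plug-in) approach referred to in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' implementation: the function names, the data layout (episodes as lists of (state, action, reward, next_state) tuples), the restriction to deterministic policies, and the choice to assign zero reward and zero transition mass to unvisited state-action pairs are all assumptions made for illustration. The sketch pools counts over all time steps, which is what a stationary (time-homogeneous) transition allows, and then computes Q̂^π by backward induction on the estimated model.

```python
def estimate_model(episodes, n_states, n_actions):
    """Empirical transition probabilities and mean rewards.

    Counts are pooled across time steps, which assumes the transition
    kernel and reward do not depend on the step index h.
    """
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    reward_sum = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    for episode in episodes:
        for s, a, r, s_next in episode:
            counts[s][a][s_next] += 1
            reward_sum[s][a] += r
            visits[s][a] += 1

    p_hat = [[[0.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    r_hat = [[0.0] * n_actions for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            n = visits[s][a]
            if n > 0:
                r_hat[s][a] = reward_sum[s][a] / n
                p_hat[s][a] = [c / n for c in counts[s][a]]
            # Assumption: unvisited pairs keep zero reward and zero mass.
    return p_hat, r_hat


def evaluate_policy(p_hat, r_hat, policy, horizon):
    """Plug-in estimate of Q^pi by backward induction.

    policy[h][s] is the (deterministic) action taken in state s at step h.
    Returns q with q[h][s][a] as the estimated value at step h.
    """
    n_states = len(p_hat)
    n_actions = len(p_hat[0])
    v_next = [0.0] * n_states  # value after the final step is zero
    q = [None] * horizon
    for h in reversed(range(horizon)):
        q_h = [
            [
                r_hat[s][a]
                + sum(p_hat[s][a][s2] * v_next[s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
        q[h] = q_h
        v_next = [q_h[s][policy[h][s]] for s in range(n_states)]
    return q
```

Uniform OPE asks that estimates of this kind be accurate simultaneously for every policy in Π, not only for one fixed `policy`; the sketch only shows how a single Q̂^π is computed from the estimated model.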


