Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings

05/13/2021
by Ming Yin, et al.

This work studies the statistical limits of uniform convergence for offline policy evaluation (OPE) with model-based methods in finite-horizon MDPs and provides a unified view of optimal learning for several well-motivated offline tasks. Uniform OPE, sup_Π |Q^π − Q̂^π| < ϵ (initiated by <cit.>), is a stronger guarantee than point-wise (fixed-policy) OPE and ensures offline policy learning when Π contains all policies (the global policy class). We establish an Ω(H^2 S/(d_m ϵ^2)) lower bound (over the model-based family) for global uniform OPE, where d_m is the minimal state-action probability induced by the behavior policy. Our main result then establishes an episode complexity of Õ(H^2/(d_m ϵ^2)) for local uniform convergence, which applies to all near-empirically optimal policies for MDPs with stationary transitions. This result implies the optimal sample complexity for offline learning and separates local uniform OPE from the global case, which carries an extra factor of S. Most importantly, the model-based method combined with our new analysis technique (the singleton absorbing MDP) can be adapted to two new settings, offline task-agnostic and offline reward-free learning, with optimal complexities Õ(H^2 log(K)/(d_m ϵ^2)) (where K is the number of tasks) and Õ(H^2 S/(d_m ϵ^2)) respectively, which provides a unified framework for simultaneously solving different offline RL problems.
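
To make the model-based estimator Q̂^π concrete, here is a minimal sketch for a tabular, finite-horizon MDP with stationary transitions. It is an illustration under stated assumptions, not the paper's implementation: the function names (`estimate_model`, `evaluate_policy`), the trajectory format of (s, a, r, s_next) tuples, and the choice to set unvisited state-action pairs to zero are all hypothetical. The sketch builds an empirical model from pooled offline counts, then evaluates a fixed deterministic policy by backward induction.

```python
import numpy as np

def estimate_model(episodes, S, A):
    """Plug-in model from offline data (assumed format: list of trajectories,
    each a list of (s, a, r, s_next) tuples). Counts are pooled over time
    steps because the transition kernel is assumed stationary."""
    counts = np.zeros((S, A, S))
    reward_sum = np.zeros((S, A))
    for traj in episodes:
        for s, a, r, s_next in traj:
            counts[s, a, s_next] += 1
            reward_sum[s, a] += r
    n = counts.sum(axis=2)               # visit counts n(s, a)
    visited = n > 0
    P_hat = np.zeros((S, A, S))          # unvisited pairs stay at zero (an assumption)
    r_hat = np.zeros((S, A))
    P_hat[visited] = counts[visited] / n[visited][:, None]
    r_hat[visited] = reward_sum[visited] / n[visited]
    return P_hat, r_hat, n

def evaluate_policy(P, r, pi, H):
    """Backward induction for a deterministic policy pi of shape (H, S).
    Returns Q with shape (H, S, A)."""
    S, A = r.shape
    Q = np.zeros((H, S, A))
    V_next = np.zeros(S)                 # value after the final step is zero
    for h in reversed(range(H)):
        Q[h] = r + P @ V_next            # (S, A, S) @ (S,) -> (S, A)
        V_next = Q[h][np.arange(S), pi[h]]
    return Q

# Toy usage: two states, one action, horizon 2, a single logged episode.
episodes = [[(0, 0, 1.0, 1), (1, 0, 0.0, 0)]]
P_hat, r_hat, n = estimate_model(episodes, S=2, A=1)
pi = np.zeros((2, 2), dtype=int)
Q_hat = evaluate_policy(P_hat, r_hat, pi, H=2)
```

In this picture, the uniform OPE error is the largest gap between `Q_hat` and the same quantity computed under the true model, taken over every policy in Π rather than over one fixed `pi`.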

07/07/2020

Near Optimal Provable Uniform Convergence in Off-Policy Evaluation for Reinforcement Learning

The Off-Policy Evaluation aims at estimating the performance of target p...
10/17/2021

Towards Instance-Optimal Offline Reinforcement Learning with Pessimism

We study the offline reinforcement learning (offline RL) problem, where ...
04/11/2022

Settling the Sample Complexity of Model-Based Offline Reinforcement Learning

This paper is concerned with offline reinforcement learning (RL), which ...
01/06/2021

Learn Dynamic-Aware State Embedding for Transfer Learning

Transfer reinforcement learning aims to improve the sample efficiency of...
03/25/2021

Nearly Horizon-Free Offline Reinforcement Learning

We revisit offline reinforcement learning on episodic time-homogeneous t...
01/07/2022

Offline Reinforcement Learning for Road Traffic Control

Traffic signal control is an important problem in urban mobility with a ...
04/28/2021

Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization

Standard dynamics models for continuous control make use of feedforward ...