On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples

03/07/2023
by Mustafa O. Karabag et al.

Offline reinforcement learning (offline RL) considers problems where learning is performed using only previously collected samples; it is helpful in settings where collecting new data is costly or risky. In model-based offline RL, the learner performs estimation (or optimization) using a model constructed from the empirical transition frequencies. We analyze the sample complexity of vanilla model-based offline RL with dependent samples in the infinite-horizon discounted-reward setting. In our setting, the samples obey the dynamics of the Markov decision process and, consequently, may have interdependencies. Without assuming independent samples, we provide a high-probability, polynomial sample complexity bound for vanilla model-based off-policy evaluation that requires partial or uniform coverage. We extend this result to off-policy optimization under uniform coverage. As a comparison to the model-based approach, we analyze the sample complexity of off-policy evaluation with vanilla importance sampling in the infinite-horizon setting. Finally, we provide an estimator that outperforms the sample-mean estimator for almost deterministic dynamics, which are prevalent in reinforcement learning.
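The "vanilla model-based" pipeline the abstract describes — build a model from the empirical transition frequencies of the offline data, then evaluate a target policy in that estimated model — can be sketched as follows. This is a minimal toy illustration, not the paper's algorithm or notation: the small MDP, the uniform behavior policy, the single dependent trajectory, and the uniform fallback for unvisited state-action pairs are all assumptions made for the sketch.

```python
import numpy as np

# Hypothetical toy MDP: S states, A actions, discount gamma (illustrative values).
rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9

# True dynamics and rewards, used only to generate an offline dataset.
P_true = rng.dirichlet(np.ones(S), size=(S, A))  # P_true[s, a] is a distribution over next states
R = rng.random((S, A))                           # rewards in [0, 1]

# A uniform behavior policy collects ONE long trajectory, so the samples obey
# the MDP dynamics and are dependent (not i.i.d.), as in the abstract's setting.
behavior = np.full((S, A), 1.0 / A)
T, s, data = 5000, 0, []
for _ in range(T):
    a = rng.choice(A, p=behavior[s])
    s_next = rng.choice(S, p=P_true[s, a])
    data.append((s, a, s_next))
    s = s_next

# Vanilla model-based step: estimate transitions by empirical frequencies.
counts = np.zeros((S, A, S))
for (s, a, s_next) in data:
    counts[s, a, s_next] += 1.0
n_sa = counts.sum(axis=2, keepdims=True)
# Uniform fallback for unvisited (s, a) pairs -- an assumption of this sketch.
P_hat = np.where(n_sa > 0, counts / np.maximum(n_sa, 1.0), 1.0 / S)

# Off-policy evaluation of a target policy under the estimated model:
# solve the Bellman equation V = R_pi + gamma * P_pi V.
target = np.zeros((S, A))
target[:, 0] = 1.0                               # deterministic target: always action 0
P_pi = np.einsum('sa,sat->st', target, P_hat)    # state-to-state kernel under the target policy
R_pi = (target * R).sum(axis=1)
V_hat = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
print(V_hat.shape)  # -> (4,)
```

Since the trajectory is a single dependent sample path, the per-pair visit counts `n_sa` are themselves random and correlated across state-action pairs; this is precisely why the independence-free analysis the abstract advertises is needed to bound the error of `V_hat`.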


Related research

- 04/11/2022 · Settling the Sample Complexity of Model-Based Offline Reinforcement Learning — "This paper is concerned with offline reinforcement learning (RL), which ..."
- 12/31/2021 · Importance of Empirical Sample Complexity Analysis for Offline Reinforcement Learning — "We hypothesize that empirically studying the sample complexity of offlin..."
- 07/13/2021 · Pessimistic Model-based Offline RL: PAC Bounds and Posterior Sampling under Partial Coverage — "We study model-based offline Reinforcement Learning with general functio..."
- 06/24/2023 · Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data — "Developing theoretical guarantees on the sample complexity of offline RL..."
- 07/07/2020 · Near Optimal Provable Uniform Convergence in Off-Policy Evaluation for Reinforcement Learning — "The Off-Policy Evaluation aims at estimating the performance of target p..."
- 05/25/2023 · Sample Efficient Reinforcement Learning in Mixed Systems through Augmented Samples and Its Applications to Queueing Networks — "This paper considers a class of reinforcement learning problems, which i..."
- 11/14/2022 · Offline Estimation of Controlled Markov Chains: Minimax Nonparametric Estimators and Sample Efficiency — "Controlled Markov chains (CMCs) form the bedrock for model-based reinfor..."
