Settling the Sample Complexity of Model-Based Offline Reinforcement Learning

04/11/2022
by Gen Li, et al.

This paper is concerned with offline reinforcement learning (RL), which learns from pre-collected data without further exploration. Effective offline RL should be able to accommodate distribution shift and limited data coverage. However, prior algorithms or analyses either suffer from suboptimal sample complexities or incur a high burn-in cost to reach sample optimality, which impedes efficient offline RL in sample-starved applications. We demonstrate that the model-based (or "plug-in") approach achieves minimax-optimal sample complexity without any burn-in cost for tabular Markov decision processes (MDPs). Concretely, consider a finite-horizon (resp. $\gamma$-discounted infinite-horizon) MDP with $S$ states and horizon $H$ (resp. effective horizon $\frac{1}{1-\gamma}$), and suppose the distribution shift of the data is reflected by some single-policy clipped concentrability coefficient $C^{\star}_{\mathsf{clipped}}$. We prove that model-based offline RL yields $\varepsilon$-accuracy with a sample complexity of $\frac{H^4 S C^{\star}_{\mathsf{clipped}}}{\varepsilon^2}$ (finite-horizon MDPs) or $\frac{S C^{\star}_{\mathsf{clipped}}}{(1-\gamma)^3 \varepsilon^2}$ (infinite-horizon MDPs), up to logarithmic factors, which is minimax optimal for the entire $\varepsilon$-range. Our algorithms are "pessimistic" variants of value iteration with Bernstein-style penalties, and they do not require sophisticated variance reduction.
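As a rough illustration of the "pessimistic" value iteration mentioned above, the sketch below builds the empirical (plug-in) model of a finite-horizon tabular MDP from offline transitions, then runs backward induction while subtracting a Bernstein-style penalty from each Q-estimate. It is a minimal sketch under stated assumptions, not the authors' algorithm: the function name `pessimistic_vi`, the data layout `(h, s, a, r, s_next)` with 0-indexed steps, rewards in [0, 1], the constant `c_b`, the log term, and the penalty form `sqrt(c_b * L * Var / n) + c_b * (H - h) * L / n` (clipped to the remaining horizon) are all hypothetical choices; the paper's exact penalty constants and log factors differ.

```python
import math
from collections import defaultdict


def pessimistic_vi(data, S, A, H, delta=0.01, c_b=1.0):
    """Hypothetical sketch: plug-in model + value iteration with a Bernstein-style penalty.

    data: iterable of (h, s, a, r, s_next) with h in 0..H-1 and r in [0, 1] (assumed layout).
    Returns (policy, values) where policy[h][s] is the chosen action and values[h][s]
    is the pessimistic value estimate.
    """
    # Empirical model: visit counts, reward sums and next-state counts per step h.
    counts = [defaultdict(int) for _ in range(H)]
    reward_sum = [defaultdict(float) for _ in range(H)]
    next_counts = [defaultdict(lambda: defaultdict(int)) for _ in range(H)]
    for h, s, a, r, s_next in data:
        counts[h][(s, a)] += 1
        reward_sum[h][(s, a)] += r
        next_counts[h][(s, a)][s_next] += 1

    # Assumed log factor; the paper's log term is different.
    log_term = math.log(max(2.0, S * A * H / delta))
    V_next = [0.0] * S  # terminal values after the last step are zero
    policy = [[0] * S for _ in range(H)]
    values = [[0.0] * S for _ in range(H)]

    for h in reversed(range(H)):
        horizon_left = H - h  # values at step h lie in [0, H - h]
        V_cur = [0.0] * S
        for s in range(S):
            best_q, best_a = -1.0, 0
            for a in range(A):
                n = counts[h][(s, a)]
                if n == 0:
                    q = 0.0  # no data: most pessimistic value when rewards are in [0, 1]
                else:
                    r_hat = reward_sum[h][(s, a)] / n
                    probs = {sp: c / n for sp, c in next_counts[h][(s, a)].items()}
                    mean = sum(p * V_next[sp] for sp, p in probs.items())
                    second = sum(p * V_next[sp] ** 2 for sp, p in probs.items())
                    var = max(0.0, second - mean ** 2)  # empirical variance of V_next
                    # Bernstein-style penalty: variance term plus a lower-order 1/n term.
                    bonus = min(
                        math.sqrt(c_b * log_term * var / n) + c_b * horizon_left * log_term / n,
                        horizon_left,
                    )
                    q = max(r_hat + mean - bonus, 0.0)  # lower confidence bound, clipped at 0
                if q > best_q:
                    best_q, best_a = q, a
            V_cur[s] = best_q
            policy[h][s] = best_a
        values[h] = V_cur
        V_next = V_cur
    return policy, values


if __name__ == "__main__":
    # One well-covered action (reward 0.6, 1000 samples) vs. one seen once (reward 1.0).
    data = [(0, 0, 0, 0.6, 0)] * 1000 + [(0, 0, 1, 1.0, 0)]
    print(pessimistic_vi(data, S=1, A=2, H=1))
```

In the usage example, the action observed only once has the higher empirical reward, but its penalty is large enough to drive its lower confidence bound to zero, so the sketch prefers the well-covered action. This preference for actions the data actually supports is the intuition behind using pessimism to cope with partial coverage.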


Related research

Near-Optimal Offline Reinforcement Learning via Double Variance Reduction (02/02/2021)
We consider the problem of offline reinforcement learning (RL) – a well-...

On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples (03/07/2023)
Offline reinforcement learning (offline RL) considers problems where lea...

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies (05/27/2019)
State-of-the-art efficient model-based Reinforcement Learning (RL) algor...

Bridging RL Theory and Practice with the Effective Horizon (04/19/2023)
Deep reinforcement learning (RL) works impressively in some environments...

Settling the Sample Complexity of Online Reinforcement Learning (07/25/2023)
A central issue lying at the heart of online reinforcement learning (RL)...

Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning (04/14/2023)
This paper studies reward-agnostic exploration in reinforcement learning...

Feedback-Based Tree Search for Reinforcement Learning (05/15/2018)
Inspired by recent successes of Monte-Carlo tree search (MCTS) in a numb...
