Is Pessimism Provably Efficient for Offline RL?

12/30/2020
by Ying Jin, et al.

We study offline reinforcement learning (RL), which aims to learn an optimal policy from a dataset collected a priori. Due to the lack of further interactions with the environment, offline RL suffers from insufficient coverage of the dataset, which eludes most existing theoretical analyses. In this paper, we propose a pessimistic variant of the value iteration algorithm (PEVI), which incorporates an uncertainty quantifier as a penalty function. This penalty simply flips the sign of the bonus function used to promote exploration in online RL, which makes PEVI easy to implement and compatible with general function approximators. Without assuming sufficient coverage of the dataset, we establish a data-dependent upper bound on the suboptimality of PEVI for general Markov decision processes (MDPs). When specialized to linear MDPs, this bound matches the information-theoretic lower bound up to multiplicative factors of the dimension and horizon. In other words, pessimism is not only provably efficient but also minimax optimal. In particular, given the dataset, the learned policy serves as the “best effort” among all policies: no other policy can do better. Our theoretical analysis identifies the critical role of pessimism in eliminating a notion of spurious correlation, which emerges from “irrelevant” trajectories that are poorly covered by the dataset and uninformative about the optimal policy.
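To make the recipe concrete, below is a minimal NumPy sketch of pessimistic value iteration in the linear-MDP setting: at each step, fit the Q-function by regularized least squares, subtract a penalty shaped like the usual elliptical exploration bonus, Γ_h(s, a) = β √(φ(s, a)ᵀ Λ_h⁻¹ φ(s, a)), truncate to the valid value range, and act greedily. The function names, dataset layout, and the value of β here are illustrative assumptions rather than the paper's exact pseudocode.

```python
import numpy as np

def pevi_linear(dataset, phi, actions, H, d, beta, lam=1.0):
    """Sketch of pessimistic value iteration (PEVI) for a linear MDP.

    dataset: list indexed by step h = 0..H-1; dataset[h] is a list of
             (s, a, r, s_next) transitions collected at step h.
    phi:     feature map phi(s, a) -> np.ndarray of shape (d,).
    actions: finite action set to maximize over.
    beta:    scale of the uncertainty penalty (treated here as a tuning
             knob; the paper derives its order from d, H, and log factors).
    """
    V_next = lambda s: 0.0  # value at step H is zero
    policy = [None] * H

    for h in reversed(range(H)):
        feats = np.array([phi(s, a) for (s, a, r, s2) in dataset[h]])
        targets = np.array([r + V_next(s2) for (s, a, r, s2) in dataset[h]])

        # Regularized least squares: Lambda_h = sum phi phi^T + lam * I.
        Lam = feats.T @ feats + lam * np.eye(d)
        Lam_inv = np.linalg.inv(Lam)
        w = Lam_inv @ feats.T @ targets

        def Q(s, a, w=w, Lam_inv=Lam_inv, h=h):
            f = phi(s, a)
            # Penalty: same form as the online exploration bonus,
            # but subtracted instead of added.
            penalty = beta * np.sqrt(f @ Lam_inv @ f)
            # Truncate to the valid value range [0, H - h].
            return float(np.clip(f @ w - penalty, 0.0, H - h))

        policy[h] = lambda s, Q=Q: max(actions, key=lambda a: Q(s, a))
        V_next = lambda s, Q=Q: max(Q(s, a) for a in actions)

    return policy
```

In the paper's analysis, the penalty must be a valid uncertainty quantifier (an upper bound on the Bellman-update error with high probability), and β scales on the order of dH up to logarithmic factors; the constant here should be read as a tunable stand-in for that choice.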


research
06/09/2021

Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning

Recent theoretical work studies sample-efficient reinforcement learning ...
research
02/02/2021

Near-Optimal Offline Reinforcement Learning via Double Variance Reduction

We consider the problem of offline reinforcement learning (RL) – a well-...
research
10/17/2021

Towards Instance-Optimal Offline Reinforcement Learning with Pessimism

We study the offline reinforcement learning (offline RL) problem, where ...
research
09/18/2022

Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes

We study the offline reinforcement learning (RL) in the face of unmeasur...
research
06/14/2023

Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources

Existing theoretical studies on offline reinforcement learning (RL) most...
research
12/19/2022

Policy learning "without” overlap: Pessimism and generalized empirical Bernstein's inequality

This paper studies offline policy learning, which aims at utilizing obse...
research
03/22/2021

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

Offline (or batch) reinforcement learning (RL) algorithms seek to learn ...
