Distributionally Robust Optimization Efficiently Solves Offline Reinforcement Learning

05/22/2023
by Yue Wang, et al.

Offline reinforcement learning aims to find the optimal policy from a pre-collected dataset without active exploration. The problem faces two major challenges: a limited amount of data and distribution shift. Existing studies apply the principle of pessimism in the face of uncertainty and penalize the rewards of less-visited state-action pairs. In this paper, we instead model the uncertainty in the transition kernel directly with an uncertainty set, and then apply distributionally robust optimization, which optimizes the worst-case performance over that set. We first design a Hoeffding-style uncertainty set, which guarantees that the true transition kernel lies in the uncertainty set with high probability. We prove that this approach achieves ϵ-accuracy with a sample complexity of 𝒪((1-γ)^{-4} ϵ^{-2} S C^{π^*}), where γ is the discount factor, C^{π^*} is the single-policy concentrability for any comparator policy π^*, and S is the number of states. We further design a Bernstein-style uncertainty set, which does not necessarily guarantee that the true transition kernel lies in the uncertainty set. For this set we show an improved, near-optimal sample complexity of 𝒪((1-γ)^{-3} ϵ^{-2} (S C^{π^*} + (μ_min)^{-1})), where μ_min denotes the minimal non-zero entry of the behavior distribution. In addition, the computational complexity of our algorithms is the same as that of the LCB-based methods in the literature. Our results demonstrate that distributionally robust optimization can also solve offline reinforcement learning efficiently.
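To make the worst-case idea in the abstract concrete, here is a minimal Python sketch of robust value iteration on a tabular MDP. It is not the paper's algorithm: the abstract does not specify the shape of the uncertainty set, so the sketch assumes an L1 ball around the empirical transition kernel, and the radius formula is only a Hoeffding-flavored placeholder. The names worst_case_expectation, robust_value_iteration, counts, and delta are hypothetical and chosen for illustration.

```python
import numpy as np

def worst_case_expectation(p_hat, v, radius):
    """Minimize q @ v over distributions q with ||q - p_hat||_1 <= radius.

    Assumption: an L1-ball uncertainty set (not taken from the paper).
    The minimizer moves up to radius/2 of probability mass onto the
    lowest-value state, taking it from the highest-value states first.
    """
    q = np.asarray(p_hat, dtype=float).copy()
    lo = int(np.argmin(v))
    budget = min(radius / 2.0, 1.0 - q[lo])  # mass moved onto the worst state
    q[lo] += budget
    for s in np.argsort(v)[::-1]:  # highest-value states first
        if budget <= 0:
            break
        if s == lo:
            continue
        take = min(q[s], budget)
        q[s] -= take
        budget -= take
    return float(q @ v)

def robust_value_iteration(P_hat, R, counts, gamma=0.9, delta=0.1, iters=200):
    """Robust value iteration against per-(s, a) L1 uncertainty sets.

    P_hat:  empirical kernel, shape (S, A, S)
    R:      rewards, shape (S, A)
    counts: visit counts N(s, a) in the offline dataset, shape (S, A)
    """
    S, A = R.shape
    # Placeholder Hoeffding-flavored radius (an assumption): shrinks like
    # 1/sqrt(N(s, a)); unvisited pairs get the whole simplex (radius 2).
    radius = np.where(
        counts > 0,
        np.minimum(2.0, np.sqrt(np.log(S * A / delta) / np.maximum(counts, 1))),
        2.0,
    )
    V = np.zeros(S)
    Q = np.zeros((S, A))
    for _ in range(iters):
        for s in range(S):
            for a in range(A):
                Q[s, a] = R[s, a] + gamma * worst_case_expectation(
                    P_hat[s, a], V, radius[s, a]
                )
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)
```

In this sketch, rarely visited state-action pairs receive a larger radius and therefore a more pessimistic backup, which mirrors how the worst-case formulation can play the role that an explicit reward penalty plays in LCB-style methods.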


research
08/11/2022

Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity

This paper concerns the central issues of model robustness and sample ef...
research
09/15/2023

Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits

Off-policy evaluation and learning are concerned with assessing a given ...
research
09/14/2022

Robust Constrained Reinforcement Learning

Constrained reinforcement learning is to maximize the expected reward su...
research
02/28/2022

Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity

Offline or batch reinforcement learning seeks to learn a near-optimal po...
research
05/09/2021

Non-asymptotic Performances of Robust Markov Decision Processes

In this paper, we study the non-asymptotic performance of optimal policy...
research
11/20/2019

Corruption Robust Exploration in Episodic Reinforcement Learning

We initiate the study of multi-stage episodic reinforcement learning und...
research
09/29/2021

Online Robust Reinforcement Learning with Model Uncertainty

Robust reinforcement learning (RL) is to find a policy that optimizes th...
