Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability

02/07/2023
by   Hanlin Zhu, et al.
0

Goal-conditioned reinforcement learning (GCRL) refers to learning general-purpose skills which aim to reach diverse goals. In particular, offline GCRL only requires purely pre-collected datasets to perform training tasks without additional interactions with the environment. Although offline GCRL has become increasingly prevalent and many previous works have demonstrated its empirical success, the theoretical understanding of efficient offline GCRL algorithms is not well established, especially when the state space is huge and the offline dataset only covers the policy we aim to learn. In this paper, we propose a novel provably efficient algorithm (the sample complexity is Õ( poly(1/ϵ)) where ϵ is the desired suboptimality of the learned policy) with general function approximation. Our algorithm only requires nearly minimal assumptions of the dataset (single-policy concentrability) and the function class (realizability). Moreover, our algorithm consists of two uninterleaved optimization steps, which we refer to as V-learning and policy learning, and is computationally stable since it does not involve minimax optimization. To the best of our knowledge, this is the first algorithm with general function approximation and single-policy concentrability that is both statistically efficient and computationally stable.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/07/2022

How Far I'll Go: Offline Goal-Conditioned Reinforcement Learning via f-Advantage Regression

Offline goal-conditioned reinforcement learning (GCRL) promises general-...
research
02/17/2023

Swapped goal-conditioned offline reinforcement learning

Offline goal-conditioned reinforcement learning (GCRL) can be challengin...
research
11/21/2021

Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation

We consider the offline reinforcement learning problem, where the aim is...
research
02/22/2023

Provably Efficient Reinforcement Learning via Surprise Bound

Value function approximation is important in modern reinforcement learni...
research
10/03/2022

Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient

Offline reinforcement learning, which aims at optimizing sequential deci...
research
11/01/2022

Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian

Offline reinforcement learning (RL), which refers to decision-making fro...
research
08/23/2022

Strategic Decision-Making in the Presence of Information Asymmetry: Provably Efficient RL with Algorithmic Instruments

We study offline reinforcement learning under a novel model called strat...

Please sign up or login with your details

Forgot password? Click here to reset