Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient

10/03/2022
by   Ming Yin, et al.
0

Offline reinforcement learning, which aims at optimizing sequential decision-making strategies with historical data, has been extensively applied in real-life applications. State-Of-The-Art algorithms usually leverage powerful function approximators (e.g. neural networks) to alleviate the sample complexity hurdle for better empirical performances. Despite the successes, a more systematic understanding of the statistical complexity for function approximation remains lacking. Towards bridging the gap, we take a step by considering offline reinforcement learning with differentiable function class approximation (DFA). This function class naturally incorporates a wide range of models with nonlinear/nonconvex structures. Most importantly, we show offline RL with differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning (PFQL) algorithm, and our results provide the theoretical basis for understanding a variety of practical heuristics that rely on Fitted Q-Iteration style design. In addition, we further improve our guarantee with a tighter instance-dependent characterization. We hope our work could draw interest in studying reinforcement learning with differentiable function approximation beyond the scope of current research.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/11/2022

Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism

Offline reinforcement learning, which seeks to utilize offline/historica...
research
12/31/2021

Importance of Empirical Sample Complexity Analysis for Offline Reinforcement Learning

We hypothesize that empirically studying the sample complexity of offlin...
research
10/22/2020

What are the Statistical Limits of Offline RL with Linear Function Approximation?

Offline reinforcement learning seeks to utilize offline (observational) ...
research
02/07/2023

Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability

Goal-conditioned reinforcement learning (GCRL) refers to learning genera...
research
03/08/2022

A Sharp Characterization of Linear Estimators for Offline Policy Evaluation

Offline policy evaluation is a fundamental statistical problem in reinfo...
research
07/14/2021

Going Beyond Linear RL: Sample Efficient Neural Function Approximation

Deep Reinforcement Learning (RL) powered by neural net approximation of ...
research
05/24/2023

Replicable Reinforcement Learning

The replicability crisis in the social, behavioral, and data sciences ha...

Please sign up or login with your details

Forgot password? Click here to reset