What are the Statistical Limits of Offline RL with Linear Function Approximation?

10/22/2020
by Ruosong Wang, et al.

Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of (causal) sequential decision-making strategies. The hope is that offline reinforcement learning, coupled with function approximation methods (to deal with the curse of dimensionality), can help alleviate the excessive sample complexity burden of modern sequential decision-making problems. However, the extent to which this broader approach can be effective is not well understood, as the literature largely consists of sufficient conditions. This work focuses on the basic question of what necessary representational and distributional conditions permit provably sample-efficient offline reinforcement learning. Perhaps surprisingly, our main result shows that even if (i) we have realizability, in that the true value function of every policy is linear in a given set of features, and (ii) our off-policy data has good coverage over all features (under a strong spectral condition), then any algorithm still (information-theoretically) requires a number of offline samples that is exponential in the problem horizon in order to non-trivially estimate the value of any given policy. Our results highlight that sample-efficient offline policy evaluation is simply not possible unless significantly stronger conditions hold; such conditions include either low distribution shift (where the offline data distribution is close to the distribution of the policy to be evaluated) or significantly stronger representational conditions (beyond realizability).
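For concreteness, the two conditions in the main result can be stated roughly as follows. This is a paraphrase of the abstract's assumptions, not a verbatim statement from the paper:

```latex
% Realizability: for every policy \pi, its value function is linear in a
% given feature map \phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d,
\forall \pi \; \exists\, \theta_\pi \in \mathbb{R}^d :\quad
    Q^\pi(s, a) = \langle \theta_\pi, \phi(s, a) \rangle
    \quad \text{for all } (s, a).

% Coverage (spectral condition): the offline data distribution \mu
% excites every feature direction,
\sigma_{\min}\!\left( \mathbb{E}_{(s,a) \sim \mu}\!\left[ \phi(s,a)\, \phi(s,a)^{\top} \right] \right) \;\ge\; \lambda \;>\; 0.
```

The object being analyzed is offline (off-policy) evaluation with linear features. The sketch below is a minimal illustration of that setting, not the paper's construction: least-squares fitted Q-evaluation over a horizon H, together with a check of the empirical spectral coverage condition. The feature map `phi`, the dataset layout, and all function names here are hypothetical.

```python
import numpy as np

def min_eig_coverage(Phi):
    """Smallest eigenvalue of the empirical feature covariance.

    The spectral coverage condition asks this quantity to be bounded away
    from zero, i.e. the offline data excites every feature direction.
    Phi: (n, d) array whose rows are phi(s, a) over the offline dataset.
    """
    Sigma = Phi.T @ Phi / Phi.shape[0]
    return np.linalg.eigvalsh(Sigma)[0]  # eigenvalues come back in ascending order

def fitted_q_evaluation(phi, data, pi, H, d, reg=1e-6):
    """Least-squares fitted Q-evaluation with linear features (illustrative).

    phi(s, a) -> (d,) feature vector; data: list of (s, a, r, s_next)
    transitions drawn from the offline distribution mu; pi(s) -> a is the
    policy to evaluate. Returns theta_0..theta_{H-1} with
    Q_h(s, a) ~ theta_h @ phi(s, a).
    """
    Phi = np.array([phi(s, a) for (s, a, _, _) in data])   # (n, d) feature matrix
    rewards = np.array([r for (_, _, r, _) in data])       # (n,) rewards
    A = Phi.T @ Phi + reg * np.eye(d)                      # regularized Gram matrix
    theta = np.zeros(d)                                    # terminal value Q_H = 0
    thetas = []
    for _ in range(H):
        # Regression target: r + Q_{h+1}(s', pi(s')). Estimation error in
        # theta propagates through this backup at every step, which is one
        # intuition for why, absent stronger conditions, the required sample
        # size can grow exponentially in the horizon.
        y = rewards + np.array([theta @ phi(s2, pi(s2)) for (_, _, _, s2) in data])
        theta = np.linalg.solve(A, Phi.T @ y)
        thetas.append(theta)
    return thetas[::-1]  # thetas[h] approximates Q_h
```

Roughly, the lower bound says that even when `min_eig_coverage` is bounded away from zero and realizability holds exactly, no procedure built from polynomially many offline samples, this one included, can non-trivially estimate the value of a given policy.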


Related research

11/21/2021 · Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation
We consider the offline reinforcement learning problem, where the aim is...

03/17/2021 · Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm
In this paper, we investigate the sample complexity of policy evaluation...

03/08/2022 · A Sharp Characterization of Linear Estimators for Offline Policy Evaluation
Offline policy evaluation is a fundamental statistical problem in reinfo...

03/08/2021 · Instabilities of Offline RL with Pre-Trained Neural Representation
In offline reinforcement learning (RL), we seek to utilize offline data ...

03/11/2022 · Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism
Offline reinforcement learning, which seeks to utilize offline/historica...

10/09/2022 · The Role of Coverage in Online Reinforcement Learning
Coverage conditions – which assert that the data logging distribution ad...

10/03/2022 · Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient
Offline reinforcement learning, which aims at optimizing sequential deci...
