A Sharp Characterization of Linear Estimators for Offline Policy Evaluation

03/08/2022
by Juan C. Perdomo et al.

Offline policy evaluation is a fundamental statistical problem in reinforcement learning that involves estimating the value function of some decision-making policy given data collected by a potentially different policy. In order to tackle problems with complex, high-dimensional observations, there has been significant interest from theoreticians and practitioners alike in understanding the possibility of function approximation in reinforcement learning. Despite extensive study, a sharp characterization of when we might expect offline policy evaluation to be tractable, even in the simplest setting of linear function approximation, has so far remained elusive, with a surprising number of strong negative results recently appearing in the literature. In this work, we identify simple control-theoretic and linear-algebraic conditions that are necessary and sufficient for classical methods, in particular fitted Q-iteration (FQI) and least-squares temporal difference learning (LSTD), to succeed at offline policy evaluation. Using this characterization, we establish a precise hierarchy of regimes under which these estimators succeed. We prove that LSTD works under strictly weaker conditions than FQI. Furthermore, we establish that if a problem is not solvable via LSTD, then it cannot be solved by a broad class of linear estimators, even in the limit of infinite data. Taken together, our results provide a complete picture of the behavior of linear estimators for offline policy evaluation (OPE), unify previously disparate analyses of canonical algorithms, and provide significantly sharper notions of the underlying statistical complexity of OPE.
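As context for the two estimators named in the abstract, the sketch below shows textbook LSTD and FQI for offline policy evaluation with linear features on a small synthetic problem. It is not taken from the paper: the random feature map, the synthetic MDP, the deterministic target policy, and all problem sizes are illustrative assumptions made here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes (all assumed for this sketch).
n_states, n_actions, d, n_samples, gamma = 20, 3, 5, 2000, 0.9

phi = rng.normal(size=(n_states, n_actions, d))     # linear feature map phi(s, a)
pi = rng.integers(n_actions, size=n_states)         # deterministic target policy pi(s)

# Synthetic MDP and offline data collected by a uniform behavior policy.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transition kernel
R = rng.normal(size=(n_states, n_actions))                        # reward table
s = rng.integers(n_states, size=n_samples)
a = rng.integers(n_actions, size=n_samples)
r = R[s, a]
s_next = np.array([rng.choice(n_states, p=P[si, ai]) for si, ai in zip(s, a)])

X = phi[s, a]                      # phi(s_i, a_i)
X_next = phi[s_next, pi[s_next]]   # phi(s'_i, pi(s'_i))

# LSTD: a single linear solve for the empirical Bellman fixed point,
#   E_n[phi (phi - gamma * phi')^T] theta = E_n[phi r].
A = X.T @ (X - gamma * X_next) / n_samples
b = X.T @ r / n_samples
theta_lstd = np.linalg.solve(A, b)

# FQI: iterated least-squares regression onto bootstrapped targets,
#   theta_{k+1} = argmin_theta sum_i (phi_i^T theta - r_i - gamma * phi'_i^T theta_k)^2.
# Unlike LSTD, this iteration need not converge when its stability conditions fail.
cov_inv = np.linalg.pinv(X.T @ X / n_samples)
theta_fqi = np.zeros(d)
for _ in range(200):
    theta_fqi = cov_inv @ (X.T @ (r + gamma * X_next @ theta_fqi) / n_samples)

print("||theta_LSTD - theta_FQI|| =", np.linalg.norm(theta_lstd - theta_fqi))
```

The contrast in the code loosely mirrors the hierarchy described in the abstract: LSTD only requires solving one linear system (its matrix must be invertible), whereas FQI additionally requires its fixed-point iteration to be stable.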

