Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation

02/21/2020
by Yaqi Duan, et al.

This paper studies the statistical theory of batch-data reinforcement learning with function approximation. We consider the off-policy evaluation problem: estimating the cumulative value of a new target policy from logged history generated by unknown behavioral policies. We study a regression-based fitted Q iteration method and show that it is equivalent to a model-based method that estimates a conditional mean embedding of the transition operator. We prove that this method is information-theoretically optimal, with nearly minimal estimation error. In particular, by leveraging the contraction property of Markov processes and martingale concentration, we establish a finite-sample, instance-dependent error upper bound and a nearly matching minimax lower bound. The policy evaluation error depends sharply on a restricted χ^2-divergence over the function class between the long-term distribution of the target policy and the distribution of past data. This restricted χ^2-divergence is both instance-dependent and function-class-dependent, and it characterizes the statistical limit of off-policy evaluation. Further, we provide an easily computable confidence bound for the policy evaluator, which may be useful for optimistic planning and safe policy improvement.
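The equivalence between regression-based fitted Q iteration and a model-based estimate can be illustrated with a minimal sketch. This is not the paper's code. The function names, the ridge term, the undiscounted finite-horizon setup, the one-hot features, and the synthetic MDP are all assumptions made for illustration. With linear features, each regression step solves the same normal equations as applying an estimated reward vector and an estimated feature-transition matrix, so the two procedures should return the same weights up to rounding.

```python
import numpy as np


def fitted_q_evaluation(phi, phi_next_pi, rewards, horizon, ridge=1e-6):
    """Regression view: repeatedly regress r + Q(s', pi) onto phi(s, a).

    phi:         (n, d) features of logged state-action pairs
    phi_next_pi: (n, d) features of the next state, averaged over the
                 target policy's action distribution
    rewards:     (n,) observed rewards
    """
    d = phi.shape[1]
    gram = phi.T @ phi + ridge * np.eye(d)  # ridge term is an assumption
    w = np.zeros(d)
    for _ in range(horizon):
        targets = rewards + phi_next_pi @ w
        w = np.linalg.solve(gram, phi.T @ targets)
    return w


def model_based_evaluation(phi, phi_next_pi, rewards, horizon, ridge=1e-6):
    """Model view: estimate reward weights and a d x d feature-transition
    matrix once, then run the Bellman recursion in feature space."""
    d = phi.shape[1]
    gram = phi.T @ phi + ridge * np.eye(d)
    r_hat = np.linalg.solve(gram, phi.T @ rewards)
    m_hat = np.linalg.solve(gram, phi.T @ phi_next_pi)
    w = np.zeros(d)
    for _ in range(horizon):
        w = r_hat + m_hat @ w
    return w


# Hypothetical synthetic MDP with one-hot (tabular) features.
rng = np.random.default_rng(0)
n_states, n_actions, n_samples, horizon = 3, 2, 500, 10
d = n_states * n_actions
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a distribution over s'
R = rng.uniform(size=(n_states, n_actions))                       # rewards in [0, 1]
target_pi = rng.dirichlet(np.ones(n_actions), size=n_states)      # target_pi[s] is a distribution over a


def one_hot(s, a):
    v = np.zeros(d)
    v[s * n_actions + a] = 1.0
    return v


def policy_feature(s):
    # Expected feature of state s under the target policy.
    return sum(target_pi[s, a] * one_hot(s, a) for a in range(n_actions))


# Logged data from a uniform behavior policy (an assumption of this sketch).
states = rng.integers(n_states, size=n_samples)
actions = rng.integers(n_actions, size=n_samples)
next_states = np.array(
    [rng.choice(n_states, p=P[s, a]) for s, a in zip(states, actions)]
)
rewards = R[states, actions]

phi = np.array([one_hot(s, a) for s, a in zip(states, actions)])
phi_next_pi = np.array([policy_feature(s2) for s2 in next_states])

w_fqe = fitted_q_evaluation(phi, phi_next_pi, rewards, horizon)
w_mb = model_based_evaluation(phi, phi_next_pi, rewards, horizon)

# Estimated value of the target policy when starting from state 0.
v0 = policy_feature(0) @ w_fqe
print("weights agree:", np.allclose(w_fqe, w_mb))
print("estimated value from state 0:", v0)
```

The agreement is purely algebraic: substituting the regression targets into the least-squares solution gives the model-based recursion term by term, so it does not depend on the particular synthetic MDP chosen here.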

research
11/08/2020

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

This paper provides a statistical analysis of high-dimensional batch Rei...
research
02/10/2022

Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory

Off-Policy Evaluation (OPE) serves as one of the cornerstones in Reinfor...
research
03/11/2021

On Finite-Sample Analysis of Offline Reinforcement Learning with Deep ReLU Networks

This paper studies the statistical theory of offline reinforcement learn...
research
06/06/2022

Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks

We consider the off-policy evaluation problem of reinforcement learning ...
research
02/06/2021

Bootstrapping Statistical Inference for Off-Policy Evaluation

Bootstrapping provides a flexible and effective approach for assessing t...
research
05/30/2023

Sharp high-probability sample complexities for policy evaluation with linear function approximation

This paper is concerned with the problem of policy evaluation with linea...
research
12/09/2020

Optimal oracle inequalities for solving projected fixed-point equations

Linear fixed point equations in Hilbert spaces arise in a variety of set...