Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency

02/05/2021
by Masatoshi Uehara, et al.

We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning using function approximation for marginal importance weights and Q-functions when these are estimated using recent minimax methods. Under various combinations of realizability and completeness assumptions, we show that the minimax approach enables us to achieve a fast rate of convergence for weights and Q-functions, characterized by the critical inequality <cit.>. Based on this result, we analyze convergence rates for OPE. In particular, we introduce novel alternative completeness conditions under which OPE is feasible, and we present the first finite-sample result with first-order efficiency in non-tabular environments, i.e., having the minimal coefficient in the leading term.
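As a rough illustration of the estimator family the abstract refers to, the sketch below computes a weight/Q-function (doubly-robust style) OPE estimate in a tabular toy setting. All quantities here are synthetic stand-ins: the weights `w_hat` and Q-function `q_hat` are assumed to have already been produced by some minimax estimation procedure, and the normalization convention is one common choice, not necessarily the paper's.

```python
import numpy as np

# Toy sketch of a weight/Q-function OPE estimate (doubly-robust form):
#   J ≈ (1 - gamma) * E_{s0}[q(s0, pi)]
#       + E_data[ w(s, a) * (r + gamma * q(s', pi) - q(s, a)) ]
# All arrays below are synthetic placeholders, not the paper's method.

rng = np.random.default_rng(0)
n, gamma = 1000, 0.9
n_states, n_actions = 5, 2

# Logged transitions (s, a, r, s') as indices into tabular arrays.
s = rng.integers(n_states, size=n)
a = rng.integers(n_actions, size=n)
r = rng.normal(size=n)
s_next = rng.integers(n_states, size=n)

# Stand-ins for minimax-estimated quantities (assumed given here).
w_hat = rng.uniform(0.5, 1.5, size=(n_states, n_actions))  # marginal weights
q_hat = rng.normal(size=(n_states, n_actions))             # Q-function
pi = np.full((n_states, n_actions), 1.0 / n_actions)       # target policy

v_next = (pi[s_next] * q_hat[s_next]).sum(axis=1)  # E_pi[q(s', .)]
td = r + gamma * v_next - q_hat[s, a]              # Bellman residual
correction = np.mean(w_hat[s, a] * td)             # weighted residual term

d0 = np.full(n_states, 1.0 / n_states)             # initial state dist.
j0 = (1 - gamma) * np.sum(d0[:, None] * pi * q_hat)

j_hat = j0 + correction
print(j_hat)
```

The weighted-residual term corrects the bias of the plug-in Q-function estimate, which is why errors in `w_hat` and `q_hat` enter only through their product; this is the structure that makes fast rates and first-order efficiency possible.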


