Triply Robust Off-Policy Evaluation

11/13/2019
by   Anqi Liu, et al.

We propose a robust regression approach to off-policy evaluation (OPE) for contextual bandits. We frame OPE as a covariate-shift problem and leverage modern robust regression tools. Ours is a general approach that can be used to augment any existing OPE method that utilizes the direct method. When augmenting doubly robust methods, we call the resulting method Triply Robust. We prove upper bounds on the resulting bias and variance, as well as derive novel minimax bounds based on robust minimax analysis for covariate shift. Our robust regression method is compatible with deep learning, and is thus applicable to complex OPE settings that require powerful function approximators. Finally, we demonstrate superior empirical performance across the standard OPE benchmarks, especially in the case where the logging policy is unknown and must be estimated from data.
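To make the "triply robust" construction concrete: the standard doubly robust (DR) estimator combines a direct-method reward model with an importance-weighted correction, and the paper's contribution is to fit that reward model with robust regression under covariate shift. Below is a minimal sketch of the plain DR estimator it builds on, with an ordinary placeholder reward model `q_hat`; the function and variable names are ours for illustration, not the paper's.

```python
import numpy as np

def dr_estimate(rewards, actions, q_hat, propensities, target_probs):
    """Doubly robust off-policy value estimate for a contextual bandit.

    rewards[i]      : observed reward for the logged action a_i
    actions[i]      : index of the logged action
    q_hat[i, a]     : estimated reward for action a in context x_i
                      (the direct-method model; the paper fits this
                      component with robust regression)
    propensities[i] : logging-policy probability mu(a_i | x_i)
    target_probs[i, a] : target-policy probability pi(a | x_i)
    """
    n = len(rewards)
    # Direct-method term: expected model reward under the target policy.
    dm = np.sum(target_probs * q_hat, axis=1)
    # Importance weight pi(a_i | x_i) / mu(a_i | x_i) for the logged action.
    w = target_probs[np.arange(n), actions] / propensities
    # Correction term: reweighted residual of the reward model.
    q_logged = q_hat[np.arange(n), actions]
    return float(np.mean(dm + w * (rewards - q_logged)))
```

For example, with three logged samples, two actions, uniform logging (mu = 0.5 everywhere), and a deterministic target policy that always picks action 0:

```python
rewards = np.array([1.0, 0.0, 1.0])
actions = np.array([0, 1, 0])
q_hat = np.array([[0.8, 0.2], [0.5, 0.5], [0.9, 0.1]])
propensities = np.array([0.5, 0.5, 0.5])
target_probs = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
dr_estimate(rewards, actions, q_hat, propensities, target_probs)  # ≈ 0.9333
```

The estimate is unbiased if either the reward model or the propensities are correct; the paper's robust-regression fit for `q_hat` targets the regime where neither is exact and the logging policy must itself be estimated.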

Related research:

02/26/2020: Off-Policy Evaluation and Learning for External Validity under a Covariate Shift
05/06/2022: Optimally Tackling Covariate Shift in RKHS-based Nonparametric Regression
12/28/2017: Robust Covariate Shift Prediction with General Losses and Feature Views
02/10/2022: Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory
06/10/2020: Distributional Robust Batch Contextual Bandits
07/16/2020: Self-Tuning Bandits over Unknown Covariate-Shifts
10/16/2012: Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits
