Doubly robust off-policy evaluation with shrinkage

07/22/2019
by   Yi Su, et al.
0

We design a new family of estimators for off-policy evaluation in contextual bandits. Our estimators are based on the asymptotically optimal approach of doubly robust estimation, but they shrink importance weights to obtain a better bias-variance tradeoff in finite samples. Our approach adapts importance weights to the quality of a reward predictor, interpolating between doubly robust estimation and direct modeling. When the reward predictor is poor, we recover previously studied weight clipping, but when the reward predictor is good, we obtain a new form of shrinkage. To navigate between these regimes and tune the shrinkage coefficient, we design a model selection procedure, which we prove is never worse than the doubly robust estimator. Extensive experiments on bandit benchmark problems show that our estimators are highly adaptive and typically outperform state-of-the-art methods.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/07/2023

Doubly Robust Estimator for Off-Policy Evaluation with Large Action Spaces

We study Off-Policy Evaluation (OPE) in contextual bandit settings with ...
research
06/03/2021

Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits

It has become increasingly common for data to be collected adaptively, f...
research
02/23/2023

Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments

In this work, we consider the off-policy policy evaluation problem for c...
research
05/14/2023

Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling

We study off-policy evaluation (OPE) of contextual bandit policies for l...
research
07/31/2021

Debiasing Samples from Online Learning Using Bootstrap

It has been recently shown in the literature that the sample averages fr...
research
10/16/2012

Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits

We present and prove properties of a new offline policy evaluator for an...
research
11/06/2018

Optimal singular value shrinkage with noise homogenization

We derive the optimal singular values for prediction in the spiked model...

Please sign up or login with your details

Forgot password? Click here to reset