Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting

06/18/2020
by   Ilja Kuzborskij, et al.

We consider off-policy evaluation in the contextual bandit setting, with the goal of obtaining a robust off-policy selection strategy; the selection strategy is evaluated by the value of the policy it chooses from a set of proposal (target) policies. We propose a new method to compute a lower bound on the value of an arbitrary target policy from logged contextual bandit data, at a desired coverage level. The lower bound is built around the so-called Self-normalized Importance Weighting (SN) estimator. It combines a semi-empirical Efron-Stein tail inequality to control the concentration with Harris' inequality to control the bias. The new approach is evaluated on a number of synthetic and real datasets and is found to be superior to its main competitors, both in the tightness of its confidence intervals and in the quality of the policies it selects.
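To make the SN estimator concrete, here is a minimal sketch contrasting it with the standard importance-weighted (IPS) estimate on synthetic logged bandit data. All names and the data-generating setup are illustrative, not taken from the paper; the SN estimate simply normalizes by the sum of the importance weights rather than by the sample size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic logged bandit data (illustrative setup, not from the paper).
n, n_actions = 10_000, 5
contexts = rng.normal(size=(n, 3))

# Logging policy mu: uniform over actions.
logging_probs = np.full((n, n_actions), 1.0 / n_actions)

# Target policy pi: softmax over a fixed linear score of the context.
scores = contexts @ rng.normal(size=(3, n_actions))
target_probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# Actions drawn from the logging policy; rewards favor the best-scoring action.
actions = rng.integers(0, n_actions, size=n)
rewards = rng.binomial(1, 0.2 + 0.6 * (actions == scores.argmax(axis=1)))

# Importance weights w_i = pi(a_i | x_i) / mu(a_i | x_i).
idx = np.arange(n)
w = target_probs[idx, actions] / logging_probs[idx, actions]

ips = np.mean(w * rewards)            # standard importance-weighted (IPS) estimate
sn = np.sum(w * rewards) / np.sum(w)  # self-normalized (SN) estimate

print(f"IPS estimate: {ips:.3f}")
print(f"SN  estimate: {sn:.3f}")
```

Because the SN estimate is a weighted average of the observed rewards, it always lies in the reward range (here [0, 1]), whereas the plain IPS estimate can fall outside it when the weights are heavy-tailed; this boundedness is part of what makes SN a convenient base for confidence lower bounds.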


Related research

06/07/2019 — Empirical Likelihood for Contextual Bandits
We apply empirical likelihood techniques to contextual bandit policy val...

06/03/2021 — Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits
It has become increasingly common for data to be collected adaptively, f...

10/16/2012 — Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits
We present and prove properties of a new offline policy evaluator for an...

12/12/2020 — Offline Policy Selection under Uncertainty
The presence of uncertainty in policy evaluation significantly complicat...

08/06/2019 — Policy Evaluation with Latent Confounders via Optimal Balance
Evaluating novel contextual bandit policies using logged data is crucial...

02/26/2023 — Kernel Conditional Moment Constraints for Confounding Robust Inference
We study policy evaluation of offline contextual bandits subject to unob...

05/19/2023 — Off-policy evaluation beyond overlap: partial identification through smoothness
Off-policy evaluation (OPE) is the problem of estimating the value of a ...
