Off-policy Confidence Sequences

02/18/2021
by Nikos Karampatziakis, et al.

We develop confidence bounds that hold uniformly over time for off-policy evaluation in the contextual bandit setting. These confidence sequences are based on recent ideas from martingale analysis and are non-asymptotic, non-parametric, and valid at arbitrary stopping times. We provide algorithms for computing these confidence sequences that strike a good balance between computational and statistical efficiency. We empirically demonstrate the tightness of our approach in terms of failure probability and width and apply it to the "gated deployment" problem of safely upgrading a production contextual bandit system.
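
To make the object being estimated concrete, here is a minimal sketch of importance-weighted (IPS) off-policy value estimation from logged contextual bandit data, paired with a crude time-uniform Hoeffding bound obtained by a union bound over rounds. This is not the martingale-based construction of the paper; the function name, arguments, and the assumptions of i.i.d. logged rounds, rewards in [0, 1], known behavior propensities, and a known cap on the importance weights are all illustrative.

```python
# Sketch only: IPS off-policy estimation with a loose time-uniform bound.
# This is NOT the paper's martingale-based confidence sequence.
import numpy as np

def ips_confidence_sequence(rewards, target_probs, behavior_probs,
                            alpha=0.05, w_max=None):
    """Running IPS estimates of a target policy's value, with lower/upper
    bounds intended to hold simultaneously over all rounds with probability
    at least 1 - alpha (under the assumptions stated above)."""
    rewards = np.asarray(rewards, dtype=float)            # assumed in [0, 1]
    weights = (np.asarray(target_probs, dtype=float)
               / np.asarray(behavior_probs, dtype=float))
    if w_max is None:
        # Heuristic fallback; a valid bound needs an a-priori known weight cap.
        w_max = weights.max()

    terms = weights * rewards                             # each term lies in [0, w_max]
    t = np.arange(1, len(terms) + 1)
    means = np.cumsum(terms) / t                          # running IPS estimates

    # Spend the error budget over rounds: sum_t alpha_t = alpha, so per-round
    # Hoeffding intervals at level alpha_t hold uniformly in t by a union bound.
    alpha_t = 6.0 * alpha / (np.pi ** 2 * t ** 2)
    radius = w_max * np.sqrt(np.log(2.0 / alpha_t) / (2.0 * t))

    lower = np.clip(means - radius, 0.0, None)            # policy value is >= 0
    upper = np.minimum(means + radius, 1.0)               # and <= 1 for rewards in [0, 1]
    return means, lower, upper
```

For example, `ips_confidence_sequence(r, pi_probs, mu_probs, alpha=0.05, w_max=10.0)` can be monitored as data arrive and stopped at any round while retaining coverage (again, only under the stated assumptions); the approach in the paper is aimed at much tighter bounds than this naive union-bound sketch.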

Related research

- Empirical Likelihood for Contextual Bandits (06/07/2019)
  We apply empirical likelihood techniques to contextual bandit policy val...

- Anytime-valid off-policy inference for contextual bandits (10/19/2022)
  Contextual bandit algorithms are ubiquitous tools for active sequential ...

- Huber-Robust Confidence Sequences (01/23/2023)
  Confidence sequences are confidence intervals that can be sequentially t...

- Catoni-style Confidence Sequences under Infinite Variance (08/05/2022)
  In this paper, we provide an extension of confidence sequences for setti...

- Uniform, nonparametric, non-asymptotic confidence sequences (10/18/2018)
  A confidence sequence is a sequence of confidence intervals that is unif...

- Sequential estimation of quantiles with applications to A/B-testing and best-arm identification (06/24/2019)
  Consider the problem of sequentially estimating quantiles of any distrib...

- Near-Optimal Non-Parametric Sequential Tests and Confidence Sequences with Possibly Dependent Observations (12/29/2022)
  Sequential testing, always-valid p-values, and confidence sequences prom...
