Empirical Likelihood for Contextual Bandits

06/07/2019
by Nikos Karampatziakis, et al.

We apply empirical likelihood techniques to contextual bandit policy value estimation, confidence intervals, and learning. We propose a tighter estimator for off-policy evaluation with improved statistical performance over previous proposals, coupled with a confidence interval that likewise improves on prior work. We then harness these tools to improve learning from contextual bandit data. Each contribution is evaluated empirically and shows good performance against strong baselines in finite-sample regimes.
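The abstract describes estimating a policy's value from logged bandit data using empirical likelihood: instead of averaging importance-weighted rewards directly, reweight the logged samples by the most likely distribution satisfying a moment constraint on the importance weights. As a minimal sketch (the function name is hypothetical, and this is the generic empirical-likelihood profile construction; the paper's exact estimator may differ in its constraints and tuning):

```python
import numpy as np

def el_offpolicy_value(rewards, weights, iters=200):
    """Empirical-likelihood (EL) style off-policy value estimate.

    Maximizes sum_i log p_i over distributions p subject to
    sum_i p_i = 1 and sum_i p_i * w_i = 1 (the importance weights
    must average to one under p), then returns the reweighted value
    sum_i p_i * w_i * r_i.  Generic EL profile sketch, not
    necessarily the estimator proposed in the paper.
    """
    w = np.asarray(weights, dtype=float)  # importance weights pi(a|x)/mu(a|x)
    r = np.asarray(rewards, dtype=float)  # logged rewards
    a = w - 1.0
    if np.all(a >= 0.0) or np.all(a <= 0.0):
        # The constraint sum_i p_i * w_i = 1 has no interior solution
        # unless the weights straddle 1; fall back to self-normalized IPS.
        return float(np.sum(w * r) / np.sum(w))
    # The EL solution is p_i = 1 / (n * (1 + lam * a_i)), where lam solves
    # g(lam) = sum_i a_i / (1 + lam * a_i) = 0.  g is strictly decreasing
    # on the feasible interval, so simple bisection finds the root.
    lo = -1.0 / a.max() + 1e-9
    hi = -1.0 / a.min() - 1e-9

    def g(lam):
        return np.sum(a / (1.0 + lam * a))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    p = 1.0 / (len(w) * (1.0 + lam * a))
    p /= p.sum()  # absorb residual bisection error
    return float(np.sum(p * w * r))
```

Because the products p_i * w_i are nonnegative and sum to one under the constraint, the estimate is always a convex combination of observed rewards; vanilla inverse-propensity scoring has no such guarantee and can leave the reward range entirely.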


Related research

CoinDICE: Off-Policy Confidence Interval Estimation (10/22/2020)
We study high-confidence behavior-agnostic off-policy evaluation in rein...

Off-policy Confidence Sequences (02/18/2021)
We develop confidence bounds that hold uniformly over time for off-polic...

Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting (06/18/2020)
We consider off-policy evaluation in the contextual bandit setting for t...

Conformal Off-Policy Prediction in Contextual Bandits (06/09/2022)
Most off-policy evaluation methods for contextual bandits have focused o...

Interval estimation in three-class ROC analysis: a fairly general approach based on the empirical likelihood (05/25/2023)
The empirical likelihood is a powerful nonparametric tool, that emulates...

Post-Contextual-Bandit Inference (06/01/2021)
Contextual bandit algorithms are increasingly replacing non-adaptive A/B...

Adaptive Estimator Selection for Off-Policy Evaluation (02/18/2020)
We develop a generic data-driven method for estimator selection in off-p...
