Inference for Batched Bandits

02/08/2020
by Kelly W. Zhang, et al.

As bandit algorithms are increasingly used in scientific studies, there is a growing need for reliable inference methods for the adaptively collected data they produce. In this work, we develop methods for inference on the treatment effect from data collected in batches using a bandit algorithm. We focus on the setting in which the total number of batches is fixed, and we develop approximate inference methods based on the asymptotic distribution as the batch size goes to infinity. First, we prove that the ordinary least squares (OLS) estimator, which is asymptotically normal on independently sampled data, is not asymptotically normal on data collected using standard bandit algorithms when the treatment effect is zero. This asymptotic non-normality implies that naively assuming the OLS estimator is approximately normal can lead to Type-1 error inflation and confidence intervals with below-nominal coverage probabilities. Second, we introduce the Batched OLS (BOLS) estimator, which we prove is asymptotically normal, even in the zero treatment effect case, on data collected from both multi-arm and contextual bandits. Moreover, BOLS is robust to changes in the baseline reward and can be used to obtain simultaneous confidence intervals for the treatment effect from all batches in non-stationary bandits. We demonstrate in simulations that BOLS can be used reliably for hypothesis testing and for obtaining a confidence interval for the treatment effect, even in small-sample settings.
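The contrast between pooled OLS and a batch-wise statistic can be illustrated with a small simulation. The sketch below is not the authors' implementation, and the abstract does not give the BOLS formula. It assumes a two-arm bandit with Gaussian rewards of known unit variance, an epsilon-greedy rule updated once per batch, a per-batch standardized difference in means, and a combination of the per-batch statistics by summing and dividing by the square root of the number of batches. All function names (`z_stat`, `simulate`, `rejection_rates`) and parameter values are hypothetical.

```python
import numpy as np


def z_stat(actions, rewards, sigma=1.0):
    """Standardized difference in arm means, assuming a known noise sd `sigma`."""
    n1 = actions.sum()
    n0 = actions.size - n1
    delta_hat = rewards[actions == 1].mean() - rewards[actions == 0].mean()
    # Under a fixed design, Var(delta_hat) = sigma^2 * (1/n0 + 1/n1).
    return np.sqrt(n0 * n1 / (n0 + n1)) * delta_hat / sigma


def simulate(rng, n_batches, batch_size, effect, eps):
    """Two-arm epsilon-greedy bandit whose policy is updated once per batch."""
    acts, rews = [], []
    p_arm1 = 0.5  # assumption: the first batch is uniformly randomized
    for _ in range(n_batches):
        a = rng.binomial(1, p_arm1, size=batch_size)
        r = effect * a + rng.standard_normal(batch_size)
        acts.append(a)
        rews.append(r)
        A, R = np.concatenate(acts), np.concatenate(rews)
        # Favour the arm with the larger running mean; keep eps exploration so
        # both arms are sampled in every batch with overwhelming probability.
        p_arm1 = 1.0 - eps if R[A == 1].mean() > R[A == 0].mean() else eps
    return acts, rews


def rejection_rates(n_reps=2000, n_batches=2, batch_size=200, eps=0.1, seed=0):
    """Monte Carlo Type-1 error of two-sided 5% z-tests when the effect is zero."""
    rng = np.random.default_rng(seed)
    ols_rej = bols_rej = 0
    for _ in range(n_reps):
        acts, rews = simulate(rng, n_batches, batch_size, 0.0, eps)
        # Pooled OLS: treat all batches as one independently sampled data set.
        z_ols = z_stat(np.concatenate(acts), np.concatenate(rews))
        # Batch-wise statistic (assumed form): standardize within each batch,
        # then combine the per-batch z-values.
        z_bols = sum(z_stat(a, r) for a, r in zip(acts, rews)) / np.sqrt(n_batches)
        ols_rej += abs(z_ols) > 1.96
        bols_rej += abs(z_bols) > 1.96
    return ols_rej / n_reps, bols_rej / n_reps


if __name__ == "__main__":
    ols_rate, bols_rate = rejection_rates()
    print(f"pooled OLS rejection rate: {ols_rate:.3f}")
    print(f"batch-wise rejection rate: {bols_rate:.3f}")
```

Under these assumptions, each per-batch statistic is standard normal given the history, so the batch-wise rejection rate should sit close to the nominal 0.05. The pooled rate carries no such guarantee, and how far it departs from 0.05 depends on the bandit rule, the number of batches, and the batch size.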

research
04/29/2021

Statistical Inference with M-Estimators on Bandit Data

Bandit algorithms are increasingly used in real world sequential decisio...
research
06/01/2021

Post-Contextual-Bandit Inference

Contextual bandit algorithms are increasingly replacing non-adaptive A/B...
research
03/05/2023

Semi-parametric inference based on adaptively collected data

Many standard estimators, when applied to adaptively collected data, fai...
research
04/09/2023

Asymptotic expansion for batched bandits

In bandit algorithms, the randomly time-varying adaptive experimental de...
research
02/27/2023

Design-Based Inference for Multi-arm Bandits

Multi-arm bandits are gaining popularity as they enable real-world seque...
research
11/07/2019

Confidence Intervals for Policy Evaluation in Adaptive Experiments

Adaptive experiments can result in considerable cost savings in multi-ar...
research
02/17/2023

Post-Episodic Reinforcement Learning Inference

We consider estimation and inference with data collected from episodic r...
