Adaptive Experimentation at Scale: Bayesian Algorithms for Flexible Batches

03/21/2023
by Ethan Che, et al.

Standard bandit algorithms that assume continual reallocation of measurement effort are challenging to implement due to delayed feedback and infrastructural and organizational difficulties. Motivated by practical instances involving a handful of reallocation epochs in which outcomes are measured in batches, we develop a new adaptive experimentation framework that can flexibly handle any batch size. Our main observation is that normal approximations, which are universal in statistical inference, can also guide the construction of scalable adaptive designs. By deriving an asymptotic sequential experiment, we formulate a dynamic program that can leverage prior information on average rewards. We propose a simple iterative planning method, Residual Horizon Optimization, which selects sampling allocations by optimizing a planning objective with stochastic gradient descent. Our method significantly improves statistical power over standard adaptive policies, even when compared to Bayesian bandit algorithms (e.g., Thompson sampling) that require full distributional knowledge of individual rewards. Overall, we expand the scope of adaptive experimentation to settings that are difficult for standard adaptive policies, including problems with a small number of reallocation epochs, low signal-to-noise ratio, and unknown reward distributions.
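To make the planning idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes independent Gaussian priors on each arm's average reward, known per-sample measurement variances, and batch means treated as normal. The planning objective used here (the expected maximum posterior mean after spending the remaining budget under one static allocation) is an assumption, since the abstract does not spell out the objective. The names `posterior_update`, `softmax`, and `plan_allocation`, and the parameters `steps`, `lr`, and `seed`, are hypothetical.

```python
import math
import random


def posterior_update(mu, var, xbar, n, s2):
    """Conjugate Gaussian update of one arm's mean from a batch sample mean.

    mu, var: prior mean and variance of the arm's average reward
    xbar:    observed batch mean (treated as normal, per the approximation)
    n:       number of samples the arm received in the batch
    s2:      per-sample measurement variance (assumed known here)
    """
    if n <= 0:
        return mu, var
    precision = 1.0 / var + n / s2
    new_var = 1.0 / precision
    new_mu = new_var * (mu / var + n * xbar / s2)
    return new_mu, new_var


def softmax(w):
    """Map unconstrained weights to an allocation on the simplex."""
    m = max(w)
    e = [math.exp(x - m) for x in w]
    total = sum(e)
    return [x / total for x in e]


def plan_allocation(mu, var, s2, budget, steps=2000, lr=0.5, seed=0):
    """Stochastic gradient ascent on E[max_i posterior mean_i] over a static
    allocation p of the remaining budget (assumed objective, see lead-in)."""
    rng = random.Random(seed)
    k = len(mu)
    w = [0.0] * k  # softmax weights; start from the uniform allocation
    for _ in range(steps):
        p = softmax(w)
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # sd[i]: std of arm i's posterior mean after budget * p[i] samples,
        # i.e. sqrt(prior variance minus posterior variance).
        denom = [s2[i] + var[i] * budget * p[i] for i in range(k)]
        sd = [math.sqrt(var[i] ** 2 * budget * p[i] / denom[i]) for i in range(k)]
        best = max(range(k), key=lambda i: mu[i] + sd[i] * z[i])
        if sd[best] == 0.0:
            continue  # guard against numerical underflow
        # Pathwise gradient: only the maximizing arm contributes for this draw.
        dsd_dp = var[best] ** 2 * budget * s2[best] / (denom[best] ** 2 * 2.0 * sd[best])
        g = z[best] * dsd_dp
        for j in range(k):
            indicator = 1.0 if j == best else 0.0
            # Chain rule through softmax: dp_best/dw_j = p_best * (1[j=best] - p_j)
            w[j] += lr * g * p[best] * (indicator - p[j])
    return softmax(w)
```

One way to use this sketch in an iterative fashion would be to call `plan_allocation` at the start of each epoch with the current posterior and the remaining budget, apply the returned allocation to the next batch only, update each arm with `posterior_update`, and replan.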
