Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization

by Jacob Nogas, et al.

Multi-armed bandit algorithms like Thompson Sampling can be used to conduct adaptive experiments, in which maximizing reward means that data is used to progressively assign more participants to more effective arms. Such assignment strategies increase the risk that statistical hypothesis tests will identify a difference between arms when there is not one, and fail to conclude there is a difference between arms when there truly is one. We present simulations for 2-arm experiments that explore two algorithms combining the benefits of uniform randomization for statistical analysis with the benefits of reward maximization achieved by Thompson Sampling (TS). First, Top-Two Thompson Sampling adds a fixed amount of uniform random (UR) allocation spread evenly over time. Second, we introduce a novel heuristic algorithm called TS PostDiff (Posterior Probability of Difference). TS PostDiff takes a Bayesian approach to mixing TS and UR: the probability that a participant is assigned using UR allocation is the posterior probability that the difference between the two arms is 'small' (below a certain threshold), allowing for more UR exploration when there is little or no reward to be gained. We find that the TS PostDiff method performs well across multiple effect sizes, and thus does not require tuning based on a guess for the true effect size.
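The mixing rule described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes Bernoulli rewards with Beta(1, 1) priors, a hypothetical threshold value of 0.1, and a Monte Carlo estimate of the posterior probability that the two arms' success rates are within that threshold of each other.

```python
import random

def ts_postdiff_choose_arm(successes, failures, threshold=0.1, n_samples=2000):
    """Pick an arm (0 or 1) using a TS PostDiff-style heuristic (sketch).

    With probability equal to the posterior probability that the arms'
    success rates differ by less than `threshold`, assign by uniform
    random (UR) allocation; otherwise use Thompson Sampling (TS).
    `threshold` and the Beta(1, 1) priors are illustrative assumptions.
    """
    # Beta(1 + successes, 1 + failures) posterior draw for one arm.
    def draw(arm):
        return random.betavariate(1 + successes[arm], 1 + failures[arm])

    # Monte Carlo estimate of P(|p0 - p1| < threshold) under the posteriors.
    close = sum(abs(draw(0) - draw(1)) < threshold for _ in range(n_samples))
    p_small_diff = close / n_samples

    if random.random() < p_small_diff:
        return random.randrange(2)  # UR allocation: each arm with prob 1/2
    # TS allocation: assign the arm with the larger posterior sample.
    return 0 if draw(0) > draw(1) else 1
```

Early in an experiment the posteriors overlap heavily, so `p_small_diff` is large and assignment is mostly uniform; as evidence of a real difference accumulates, `p_small_diff` shrinks and assignment shifts toward reward-maximizing TS.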


