Efficient Inference Without Trading-off Regret in Bandits: An Allocation Probability Test for Thompson Sampling

10/30/2021
by   Nina Deliu, et al.

Using bandit algorithms to conduct adaptive randomised experiments can minimise regret, but it poses major challenges for statistical inference (e.g., biased estimators, inflated type-I error and reduced power). Recent attempts to address these challenges typically impose restrictions on the exploitative nature of the bandit algorithm (trading off regret) and require large sample sizes to ensure asymptotic guarantees. However, large experiments generally follow a successful pilot study, which is tightly constrained in its size or duration. Increasing power in such small pilot experiments, without limiting the adaptive nature of the algorithm, can allow promising interventions to reach a larger experimental phase. In this work we introduce a novel hypothesis test, based uniquely on the allocation probabilities of the bandit algorithm, which neither constrains the algorithm's exploitative nature nor requires a minimum experimental size. We characterise our Allocation Probability Test when applied to Thompson Sampling, present its asymptotic theoretical properties, and illustrate its finite-sample performance against state-of-the-art approaches. We demonstrate the regret and inferential advantages of our approach, particularly in small samples, in both extensive simulations and a real-world experiment on mental health.
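To make the central quantity concrete, the sketch below estimates the Thompson Sampling allocation probability for a two-armed Bernoulli bandit with Beta(1, 1) priors: the posterior probability that arm 1 outperforms arm 0, which is exactly the probability with which Thompson Sampling would allocate the next participant to arm 1. This is only an illustration of the statistic the test is built on; the rejection threshold shown is a placeholder, not the calibrated cutoff derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def allocation_probability(s0, f0, s1, f1, n_draws=100_000):
    """Monte Carlo estimate of P(theta_1 > theta_0 | data) under
    independent Beta(1, 1) priors, given successes/failures per arm.
    This equals the Thompson Sampling probability of allocating arm 1."""
    theta0 = rng.beta(1 + s0, 1 + f0, size=n_draws)
    theta1 = rng.beta(1 + s1, 1 + f1, size=n_draws)
    return float((theta1 > theta0).mean())

# Small pilot where arm 1 looks better: 7/10 vs 3/10 successes.
p = allocation_probability(s0=3, f0=7, s1=7, f1=3)

# An allocation-probability test rejects the null of no arm difference
# when this probability exceeds a threshold; 0.95 here is illustrative,
# not the paper's calibrated critical value.
reject = p > 0.95
```

Because the test statistic is a by-product of running Thompson Sampling itself, no restriction on the algorithm's allocations is needed to compute it.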
