Computing the Performance of a New Adaptive Sampling Algorithm Based on the Gittins Index in Experiments with Exponential Rewards

01/03/2023
by James K. He, et al.

Designing experiments often requires balancing learning about the true treatment effects against earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP that can attain both optimality and computational efficiency, and it has recently been used in experiments with Bernoulli and Gaussian rewards. For the first time, we present a modification of the GI rule that can be used in experiments with exponentially distributed rewards. We report its performance in simulated 2-armed and 3-armed experiments. Compared to traditional non-adaptive designs, our novel GI-modified design shows operating characteristics comparable in learning (e.g. statistical power) but substantially better in earning (e.g. direct benefits). This illustrates the potential of GI-based allocation designs to improve participant benefits, increase efficiency, and reduce experimental costs in adaptive multi-armed experiments with exponential rewards.
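The abstract only summarizes the design, so as a rough illustration of how an index-based adaptive allocation loop with exponential rewards might look, here is a minimal Python sketch. It assumes a conjugate Gamma prior on each arm's exponential rate and uses a simple posterior-mean-plus-uncertainty-bonus surrogate in place of the paper's actual Gittins Index computation; the arm means, prior parameters, and horizon are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true mean rewards for a 3-armed experiment (not values from the paper).
true_means = np.array([1.0, 1.5, 2.0])
n_arms = len(true_means)
horizon = 300  # total number of participants to allocate

# Conjugate Gamma(shape=a, rate=b) prior on each arm's exponential rate parameter,
# so the implied prior mean reward is b / (a - 1).
a = np.full(n_arms, 3.0)
b = np.full(n_arms, 3.0)


def surrogate_index(a_k, b_k, remaining):
    """Stand-in allocation index: posterior mean reward plus an uncertainty bonus
    that shrinks as an arm accumulates observations. This is only a placeholder with
    the same 'optimism toward uncertain arms' flavour as the Gittins Index, not the
    paper's GI computation."""
    post_mean = b_k / (a_k - 1.0)             # E[1/lambda] under the Gamma posterior
    post_sd = post_mean / np.sqrt(a_k - 2.0)  # sd of 1/lambda (valid for a_k > 2)
    bonus = post_sd * np.sqrt(np.log(max(remaining, 2)))
    return post_mean + bonus


total_reward = 0.0
pulls = np.zeros(n_arms, dtype=int)

for t in range(horizon):
    remaining = horizon - t
    indices = [surrogate_index(a[k], b[k], remaining) for k in range(n_arms)]
    k = int(np.argmax(indices))                # allocate the next participant adaptively
    reward = rng.exponential(true_means[k])    # observe an exponentially distributed reward
    total_reward += reward
    pulls[k] += 1
    # Conjugate update for an exponential likelihood: shape += 1, rate += observed reward.
    a[k] += 1.0
    b[k] += reward

print("pulls per arm:", pulls)
print("average reward per participant:", total_reward / horizon)
```

The structural point mirrors the design described in the abstract: after every observation the arm posteriors are updated and the next participant is allocated to the arm with the highest index, so better-performing arms accumulate more samples as the experiment proceeds.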

