Adaptivity and Confounding in Multi-Armed Bandit Experiments

02/18/2022
by Chao Qin, et al.

We explore a new model of bandit experiments in which a potentially nonstationary sequence of contexts influences arms' performance. Context-unaware algorithms risk confounding, while those that perform correct inference face information delays. Our main insight is that an algorithm we call deconfounded Thompson sampling strikes a delicate balance between adaptivity and robustness. Its adaptivity leads to optimal efficiency properties in easy stationary instances, yet it displays surprising resilience in hard nonstationary ones that cause other adaptive algorithms to fail.
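For readers unfamiliar with the baseline algorithm the paper builds on, the following is a minimal sketch of standard Beta-Bernoulli Thompson sampling, the context-unaware adaptive procedure that the abstract contrasts with the paper's deconfounded variant. This is the textbook algorithm, not the authors' method; the arm means and horizon below are illustrative assumptions.

```python
import numpy as np

def thompson_sampling(true_means, horizon, seed=0):
    """Standard Beta-Bernoulli Thompson sampling (not the paper's
    deconfounded variant): draw a mean from each arm's posterior,
    pull the argmax, and update that arm's Beta posterior."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha = np.ones(k)            # Beta posterior: 1 + observed successes
    beta = np.ones(k)             # Beta posterior: 1 + observed failures
    pulls = np.zeros(k, dtype=int)
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)   # one posterior sample per arm
        arm = int(np.argmax(theta))     # adaptive, data-dependent choice
        reward = rng.binomial(1, true_means[arm])
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# In a stationary instance, play concentrates on the best arm over time.
pulls = thompson_sampling([0.3, 0.5, 0.7], horizon=2000)
```

Because arm choices depend on past rewards, any unmodeled context shifting the reward distributions over time can confound this procedure's inferences; this is exactly the failure mode the paper's deconfounded variant is designed to resist.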


