Adaptivity and Confounding in Multi-Armed Bandit Experiments

02/18/2022
by Chao Qin, et al.

We explore a new model of bandit experiments in which a potentially nonstationary sequence of contexts influences arms' performance. Context-unaware algorithms risk confounding, while those that perform correct inference face information delays. Our main insight is that an algorithm we call deconfounded Thompson sampling strikes a delicate balance between adaptivity and robustness. Its adaptivity leads to optimal efficiency properties in easy stationary instances, yet it displays surprising resilience in hard nonstationary instances that cause other adaptive algorithms to fail.
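To illustrate the confounding issue the abstract alludes to, the sketch below compares standard Bernoulli Thompson sampling against a context-stratified variant that maintains separate Beta posteriors per (arm, context) pair. This is a hypothetical toy example, not the paper's deconfounded Thompson sampling algorithm; the simulation setup, probabilities, and function names are illustrative assumptions.

# Illustrative sketch (not the paper's algorithm): Bernoulli Thompson sampling
# with and without stratifying on an observed binary context. The stratified
# variant keeps one Beta posterior per (arm, context) pair, so a nonstationary
# context sequence does not confound the comparison between arms.
import numpy as np

rng = np.random.default_rng(0)

def run(num_rounds=5000, stratify=True):
    n_arms, n_contexts = 2, 2
    # True success probabilities depend on both the arm and the context.
    p = np.array([[0.3, 0.7],   # arm 0: reward prob. in context 0 / context 1
                  [0.5, 0.5]])  # arm 1
    # Beta(1, 1) priors; one posterior per arm, or per (arm, context) pair.
    shape = (n_arms, n_contexts) if stratify else (n_arms, 1)
    alpha = np.ones(shape)
    beta = np.ones(shape)

    total_reward = 0.0
    for t in range(num_rounds):
        # Nonstationary context: context 1 becomes common in the second half.
        ctx = int(rng.random() < (0.1 if t < num_rounds // 2 else 0.9))
        col = ctx if stratify else 0
        # Thompson sampling: draw from each arm's posterior, play the argmax.
        samples = rng.beta(alpha[:, col], beta[:, col])
        arm = int(np.argmax(samples))
        reward = float(rng.random() < p[arm, ctx])
        alpha[arm, col] += reward
        beta[arm, col] += 1.0 - reward
        total_reward += reward
    return total_reward / num_rounds

print("context-unaware TS:   ", run(stratify=False))
print("context-stratified TS:", run(stratify=True))

In this toy setting the context-unaware learner pools rewards across the context shift and can lock onto the wrong arm, whereas conditioning on the context removes that source of bias at the cost of splitting the data.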
