Smooth Sequential Optimisation with Delayed Feedback

06/21/2021

Stochastic delays in feedback lead to unstable sequential learning using multi-armed bandits. Recently, empirical Bayesian shrinkage has been shown to improve reward estimation in bandit learning. Here, we propose a novel adaptation to shrinkage that computes smoothed reward estimates from windowed cumulative inputs, to deal with incomplete knowledge from delayed feedback and non-stationary rewards. Using numerical simulations, we show that this adaptation retains the benefits of shrinkage, and improves the stability of reward estimation by more than 50% and of treatment allocations to the best arm by up to 3.8x, and improves statistical accuracy, with up to 8 in false positive rates. Together, these advantages enable control of the trade-off between speed and stability of adaptation, and facilitate human-in-the-loop sequential optimisation.
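The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea: per-arm success rates are computed from counts accumulated over a sliding window of recent batches, then shrunk toward the pooled rate. The shrinkage weights here come from a simple method-of-moments variance estimate, which is an assumption and not necessarily the authors' formulation; the names `WindowedShrinkage`, `shrink_rates`, and the `window` parameter are hypothetical.

```python
from collections import deque


def shrink_rates(successes, trials):
    """Shrink per-arm success rates toward the pooled rate.

    Assumed method-of-moments shrinkage (not taken from the paper).
    Requires at least two arms and at least one trial per arm.
    """
    k = len(trials)
    rates = [s / n for s, n in zip(successes, trials)]
    pooled = sum(successes) / sum(trials)
    # Binomial sampling variance of each arm's observed rate.
    samp_var = [pooled * (1 - pooled) / n for n in trials]
    # Between-arm variance: observed spread minus average sampling noise.
    obs_var = sum((r - pooled) ** 2 for r in rates) / (k - 1)
    prior_var = max(obs_var - sum(samp_var) / k, 0.0)
    shrunk = []
    for r, v in zip(rates, samp_var):
        total = prior_var + v
        weight = prior_var / total if total > 0 else 0.0
        shrunk.append(pooled + weight * (r - pooled))
    return shrunk


class WindowedShrinkage:
    """Keeps the last `window` batches and estimates from their summed counts."""

    def __init__(self, n_arms, window):
        self.n_arms = n_arms
        # Batches older than the window are dropped automatically.
        self.batches = deque(maxlen=window)

    def update(self, successes, trials):
        """Record one batch of per-arm (possibly delayed) feedback counts."""
        self.batches.append((list(successes), list(trials)))

    def estimates(self):
        """Return shrunk reward estimates from the windowed cumulative counts."""
        s = [sum(b[0][a] for b in self.batches) for a in range(self.n_arms)]
        t = [sum(b[1][a] for b in self.batches) for a in range(self.n_arms)]
        return shrink_rates(s, t)


est = WindowedShrinkage(n_arms=3, window=2)
est.update([1, 5, 9], [10, 10, 10])
est.update([2, 5, 8], [10, 10, 10])
est.update([2, 5, 8], [10, 10, 10])  # the first batch falls out of the window
smoothed = est.estimates()
```

In this sketch a larger `window` pools more batches, which gives steadier estimates that react more slowly to changes in reward; a smaller `window` does the opposite. That is one way to picture the speed versus stability trade-off the abstract mentions.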
