Batched Multi-Armed Bandits with Optimal Regret

10/11/2019

by Hossein Esfandiari, et al.

We present a simple and efficient algorithm for the batched stochastic multi-armed bandit problem. We prove a bound for its expected regret that improves over the best-known regret bound, for any number of batches. In particular, our algorithm achieves the optimal expected regret by using only a logarithmic number of batches.
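
To make the batched setting concrete, the following is a minimal sketch of a generic batched successive-elimination strategy. It is not the algorithm from the paper, whose details are not given in this abstract. The function name batched_elimination, the equal-length batch schedule, the Bernoulli reward model, and the confidence radius sqrt(2 log T / n) are all assumptions chosen for illustration. The key constraint it shows is that every pull in a batch is decided before any reward from that batch is observed.

```python
import math
import random


def batched_elimination(means, horizon, num_batches, seed=0):
    """Hypothetical batched successive elimination on Bernoulli arms.

    `means` holds the true arm means; they are used only to simulate
    rewards and to measure pseudo-regret, never to make decisions.
    Returns (pseudo_regret, active_arms, pulls_used).
    """
    rng = random.Random(seed)
    best = max(means)
    active = list(range(len(means)))
    pulls = [0] * len(means)
    sums = [0.0] * len(means)
    regret = 0.0
    used = 0
    batch_size = horizon // num_batches  # assumption: equal-length batches

    for b in range(num_batches):
        # The last batch absorbs whatever budget is left.
        budget = batch_size if b < num_batches - 1 else horizon - used
        per_arm = budget // len(active)

        # The plan for the whole batch is fixed here, before any reward
        # from this batch is seen: pull every active arm `per_arm` times.
        for arm in active:
            for _ in range(per_arm):
                reward = 1.0 if rng.random() < means[arm] else 0.0
                sums[arm] += reward
                pulls[arm] += 1
                regret += best - means[arm]
        used += per_arm * len(active)

        # Active arms always share the same pull count; skip the update
        # if nothing has been pulled yet.
        n = pulls[active[0]]
        if n == 0:
            continue

        # Between batches, drop arms that look clearly worse than the leader.
        radius = math.sqrt(2.0 * math.log(horizon) / n)
        leader = max(sums[a] / pulls[a] for a in active)
        active = [a for a in active if leader - sums[a] / pulls[a] <= 2.0 * radius]

    return regret, active, used


if __name__ == "__main__":
    regret, active, used = batched_elimination([0.9, 0.1], 10_000, 5)
    print(f"pseudo-regret={regret:.1f}, surviving arms={active}, pulls={used}")
```

With more batches the sketch can eliminate poor arms earlier, which is the trade-off between the number of batches and regret that the abstract refers to. The schedule used here is deliberately naive and carries no optimality guarantee.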
