Batched Multi-Armed Bandits with Optimal Regret
We present a simple and efficient algorithm for the batched stochastic multi-armed bandit problem. We prove a bound for its expected regret that improves over the best-known regret bound, for any number of batches. In particular, our algorithm achieves the optimal expected regret by using only a logarithmic number of batches.
READ FULL TEXT