Batched Multi-Armed Bandits with Optimal Regret

10/11/2019
by   Hossein Esfandiari, et al.

We present a simple and efficient algorithm for the batched stochastic multi-armed bandit problem. We prove a bound for its expected regret that improves over the best-known regret bound, for any number of batches. In particular, our algorithm achieves the optimal expected regret by using only a logarithmic number of batches.
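
To make the batched setting concrete, here is a minimal Python sketch of a generic batched successive-elimination strategy. It is not the algorithm from the paper, whose details are not given in this abstract. The function name `batched_elimination`, the Bernoulli reward model, the geometric batch sizes (base `horizon ** (1 / num_batches)`), and the Hoeffding-style confidence width are all assumptions chosen for illustration.

```python
import math
import random


def batched_elimination(means, horizon, num_batches, seed=0):
    """Illustrative batched successive elimination on Bernoulli arms.

    Hypothetical helper, not the paper's algorithm. Returns the
    pseudo-regret (sum of gaps of the pulled arms) and per-arm pull counts.
    """
    rng = random.Random(seed)
    best_mean = max(means)
    active = list(range(len(means)))
    pulls = [0] * len(means)
    total_reward = [0.0] * len(means)
    used = 0
    regret = 0.0
    # Assumed schedule: per-arm pulls in batch b grow like q**b.
    q = horizon ** (1.0 / num_batches)

    def pull(arm, n):
        nonlocal used, regret
        for _ in range(n):
            total_reward[arm] += 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += n
        used += n
        regret += n * (best_mean - means[arm])

    # The final batch is reserved for committing to a single arm.
    for b in range(1, num_batches):
        if len(active) == 1:
            break
        per_arm = min(math.ceil(q ** b), (horizon - used) // len(active))
        if per_arm <= 0:
            break
        # Rewards are only inspected after the whole batch has been played.
        for arm in active:
            pull(arm, per_arm)
        mean_hat = {a: total_reward[a] / pulls[a] for a in active}
        width = {a: math.sqrt(math.log(horizon) / (2 * pulls[a])) for a in active}
        leader_lcb = max(mean_hat[a] - width[a] for a in active)
        # Keep arms whose upper confidence bound reaches the best lower bound.
        active = [a for a in active if mean_hat[a] + width[a] >= leader_lcb]

    # Spend the remaining budget on the empirically best surviving arm.
    champion = max(
        active, key=lambda a: total_reward[a] / pulls[a] if pulls[a] else 0.0
    )
    pull(champion, horizon - used)
    return regret, pulls


if __name__ == "__main__":
    regret, pulls = batched_elimination([0.9, 0.5, 0.4], horizon=10000, num_batches=4)
    print(regret, pulls)
```

The key constraint this sketch illustrates is that the policy may change only at batch boundaries, so fewer batches mean fewer chances to discard poor arms. The geometric schedule is one common way to trade off early exploration against the number of batches.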
