KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints

05/14/2018 ∙ by Aurélien Garivier, et al.

In the context of K-armed stochastic bandits whose reward distributions are only assumed to be supported on [0, 1], we introduce a new algorithm, KL-UCB-switch, and prove that it simultaneously enjoys a distribution-free regret bound of optimal order $\sqrt{KT}$, where T is the time horizon, and a distribution-dependent regret bound that is also of optimal order, that is, one matching the $\kappa \ln T$ lower bound of Lai and Robbins (1985) and Burnetas and Katehakis (1996), where $\kappa$ is a constant that depends on the arm distributions.
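To make the idea of "switching" between two indices concrete, here is a minimal sketch. It rests on several assumptions that this page does not state. Rewards are taken to be Bernoulli, so the Bernoulli KL divergence stands in for the more general quantity needed for arbitrary [0, 1]-supported distributions. Both indices use the exploration term ln+(T/(Kn)). The policy uses a KL-UCB-type index while an arm has been pulled at most ⌊(T/K)^{1/5}⌋ times and a MOSS-type index afterwards. All function names (`kl_ucb_index`, `moss_index`, `switch_index`, `run`, `pseudo_regret`) are hypothetical, and this is not the authors' reference implementation.

```python
import math
import random

def bernoulli_kl(p: float, q: float) -> float:
    """kl(p, q) between Bernoulli(p) and Bernoulli(q); arguments clipped to avoid log(0)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def log_plus(x: float) -> float:
    """ln+(x) = max(ln x, 0)."""
    return math.log(x) if x > 1.0 else 0.0

def kl_ucb_index(mean: float, n: int, horizon: int, n_arms: int, iters: int = 50) -> float:
    """Largest q in [mean, 1] with kl(mean, q) <= ln+(T / (K n)) / n, found by bisection.

    Bisection is valid because kl(mean, q) is increasing in q for q >= mean.
    """
    budget = log_plus(horizon / (n_arms * n)) / n
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid  # mid is still inside the confidence set
        else:
            hi = mid
    return lo

def moss_index(mean: float, n: int, horizon: int, n_arms: int) -> float:
    """MOSS-type index: mean + sqrt(ln+(T / (K n)) / (2 n))."""
    return mean + math.sqrt(log_plus(horizon / (n_arms * n)) / (2.0 * n))

def switch_index(mean: float, n: int, horizon: int, n_arms: int) -> float:
    """Hypothetical switch: KL-UCB-type index for few pulls, MOSS-type index afterwards."""
    threshold = math.floor((horizon / n_arms) ** 0.2)  # assumed threshold floor((T/K)^(1/5))
    if n <= threshold:
        return kl_ucb_index(mean, n, horizon, n_arms)
    return moss_index(mean, n, horizon, n_arms)

def run(means: list, horizon: int, seed: int = 0) -> list:
    """Play the switch index on Bernoulli arms; return the number of pulls of each arm."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(horizon):
        if t < k:
            arm = t  # initialisation: pull every arm once
        else:
            arm = max(range(k),
                      key=lambda a: switch_index(sums[a] / counts[a], counts[a], horizon, k))
        counts[arm] += 1
        sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
    return counts

def pseudo_regret(means: list, counts: list) -> float:
    """Pseudo-regret sum_a (mu* - mu_a) * N_a(T)."""
    best = max(means)
    return sum((best - m) * c for m, c in zip(means, counts))
```

For example, `counts = run([0.9, 0.8, 0.5], horizon=5000)` returns per-arm pull counts, and `pseudo_regret([0.9, 0.8, 0.5], counts)` gives the corresponding pseudo-regret. One property holds by construction: Pinsker's inequality gives kl(p, q) ≥ 2(q − p)², so with the same exploration term the KL-UCB-type index never exceeds the MOSS-type index.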
