KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints

05/14/2018
by Aurélien Garivier, et al.

In the context of K-armed stochastic bandits whose reward distributions are only assumed to be supported on [0, 1], we introduce a new algorithm, KL-UCB-switch, and prove that it simultaneously enjoys a distribution-free regret bound of optimal order √(KT) and a distribution-dependent regret bound of optimal order as well, that is, matching the κ ln T lower bound by Lai and Robbins (1985) and Burnetas and Katehakis (1996).
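
The abstract does not spell out how the algorithm works, so the following is only a rough sketch of the idea the name suggests: use a KL-UCB-style index while an arm has few pulls, then switch to a MOSS-style index. Everything specific here is an assumption made for illustration and is not taken from the text above: the switch threshold ⌊(T/K)^(1/5)⌋, the exploration term ln₊(T/(K·n)), the Bernoulli KL divergence standing in for the general divergence on [0, 1], and all function names.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def log_plus(x):
    """Positive part of the natural logarithm."""
    return max(math.log(x), 0.0)


def kl_ucb_index(mean, pulls, horizon, n_arms, tol=1e-9):
    """Largest q in [mean, 1] with KL(mean, q) <= exploration budget (bisection)."""
    budget = log_plus(horizon / (n_arms * pulls)) / pulls  # assumed exploration term
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def moss_index(mean, pulls, horizon, n_arms):
    """MOSS-style index: empirical mean plus a sub-Gaussian confidence width."""
    return mean + math.sqrt(log_plus(horizon / (n_arms * pulls)) / (2 * pulls))


def switch_index(mean, pulls, horizon, n_arms):
    """KL-UCB-style index for rarely pulled arms, MOSS-style index afterwards."""
    threshold = math.floor((horizon / n_arms) ** 0.2)  # assumed switch point
    if pulls <= threshold:
        return kl_ucb_index(mean, pulls, horizon, n_arms)
    return moss_index(mean, pulls, horizon, n_arms)
```

At each round a player would pull the arm with the largest `switch_index`. The intuition behind such a switch is that the KL-based index is the tighter one (by Pinsker's inequality it never exceeds the unclipped MOSS-style index), which is what distribution-dependent optimality requires, while the MOSS-style index is the kind typically used to obtain √(KT) distribution-free bounds.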
