Bandits with Side Observations: Bounded vs. Logarithmic Regret

07/10/2018
by Rémy Degenne, et al.

We consider the classical stochastic multi-armed bandit problem, but where, from time to time and roughly with frequency ε, an extra observation is gathered by the agent for free. We prove that, no matter how small ε is, the agent can ensure a regret uniformly bounded in time. More precisely, we construct an algorithm whose regret is at most ∑_i log(1/ε)/Δ_i, up to multiplicative constants and log log terms. We also prove a matching lower bound, stating that no reasonable algorithm can outperform this quantity.
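To make the setting concrete, the sketch below simulates a Bernoulli bandit in which, with probability ε each round, the agent also receives one free observation of a uniformly random arm that does not count toward regret. This is an illustrative toy only: the agent shown is plain UCB1 fed with both paid and free samples, not the algorithm constructed in the paper, and all names and parameters here are our own.

```python
import math
import random

def ucb_with_free_observations(means, horizon, eps, seed=0):
    """Toy model of a bandit with side observations (NOT the paper's algorithm):
    UCB1 on Bernoulli arms, where with probability eps each round an extra
    observation of a uniformly random arm arrives for free."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # observations per arm (paid pulls + free samples)
    sums = [0.0] * k    # sum of observed rewards per arm
    pulls = 0
    regret = 0.0
    best = max(means)

    def observe(i):
        counts[i] += 1
        sums[i] += 1.0 if rng.random() < means[i] else 0.0

    for t in range(1, horizon + 1):
        if 0 in counts:
            arm = counts.index(0)  # observe every arm once first
        else:
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        observe(arm)               # paid pull: contributes to regret
        pulls += 1
        regret += best - means[arm]
        if rng.random() < eps:     # free side observation: no regret cost
            observe(rng.randrange(k))
    return regret, pulls, counts
```

The free samples accelerate learning without costing regret, which is the mechanism behind the bounded-regret phenomenon the abstract describes: even a tiny ε eventually identifies the best arm at no cost.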
