Better Best of Both Worlds Bounds for Bandits with Switching Costs
We study best-of-both-worlds algorithms for bandits with switching costs, recently addressed by Rouyer, Seldin, and Cesa-Bianchi (2021). We introduce a surprisingly simple and effective algorithm that simultaneously achieves the minimax optimal regret bound of 𝒪(T^{2/3}) in the oblivious adversarial setting and a bound of 𝒪(min{log(T)/Δ^2, T^{2/3}}) in the stochastically constrained regime, both with (unit) switching costs, where Δ is the gap between the arms. In the stochastically constrained case, our bound improves over the previous result of Rouyer et al., who achieved regret of 𝒪(T^{1/3}/Δ). We accompany our results with a lower bound showing that, in general, Ω̃(min{1/Δ^2, T^{2/3}}) regret is unavoidable in the stochastically constrained case for algorithms with 𝒪(T^{2/3}) worst-case regret.
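For reference, the guarantees stated in the abstract can be summarized as follows (writing R_T for the expected regret over horizon T, including unit switching costs; this is only a restatement of the bounds above, not a description of the algorithm itself):

\[
\mathbb{E}[R_T] =
\begin{cases}
\mathcal{O}\!\left(T^{2/3}\right) & \text{oblivious adversarial setting,}\\[4pt]
\mathcal{O}\!\left(\min\left\{\dfrac{\log T}{\Delta^2},\, T^{2/3}\right\}\right) & \text{stochastically constrained setting,}
\end{cases}
\]
and, for any algorithm with $\mathcal{O}(T^{2/3})$ worst-case regret, the stochastically constrained regret is in general at least
\[
\widetilde{\Omega}\!\left(\min\left\{\frac{1}{\Delta^2},\, T^{2/3}\right\}\right).
\]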