Better Best of Both Worlds Bounds for Bandits with Switching Costs
We study best-of-both-worlds algorithms for bandits with switching costs, a problem recently addressed by Rouyer, Seldin and Cesa-Bianchi (2021). We introduce a surprisingly simple and effective algorithm that simultaneously achieves a minimax-optimal regret bound of 𝒪(T^{2/3}) in the oblivious adversarial setting and a bound of 𝒪(min{log(T)/Δ^2, T^{2/3}}) in the stochastically constrained regime, both with (unit) switching costs, where Δ is the gap between the arms. In the stochastically constrained case, our bound improves over the previous result of Rouyer et al., which achieved regret of 𝒪(T^{1/3}/Δ). We accompany our results with a lower bound showing that, in general, Ω̃(min{1/Δ^2, T^{2/3}}) regret is unavoidable in the stochastically constrained case for algorithms with 𝒪(T^{2/3}) worst-case regret.
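To get a feel for how the two stochastically constrained rates scale, the following minimal sketch evaluates the expressions inside the 𝒪(·) bounds for a few gaps. It is an illustration only: all hidden constants are assumed to be 1, the function names `new_bound` and `previous_bound` are hypothetical, and the printed numbers indicate growth rates rather than actual regret of either algorithm.

```python
import math


def new_bound(T, gap):
    """Expression min{log(T)/gap^2, T^(2/3)} with constants dropped (assumption)."""
    return min(math.log(T) / gap**2, T ** (2 / 3))


def previous_bound(T, gap):
    """Expression T^(1/3)/gap with constants dropped (assumption)."""
    return T ** (1 / 3) / gap


T = 10**6
for gap in (0.5, 0.2, 0.05):
    # Print the gap and both expressions side by side.
    print(gap, round(new_bound(T, gap), 1), round(previous_bound(T, gap), 1))
```

With constants ignored, log(T)/Δ^2 is smaller than T^{1/3}/Δ exactly when Δ > log(T)/T^{1/3}, which follows by multiplying both sides by Δ^2/T^{1/3}. For very small gaps the new expression is capped at T^{2/3}.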