Improved Analysis of Robustness of the Tsallis-INF Algorithm to Adversarial Corruptions in Stochastic Multiarmed Bandits
We derive improved regret bounds for the Tsallis-INF algorithm of Zimmert and Seldin (2021). In the adversarial regime with a self-bounding constraint and the stochastic regime with adversarial corruptions as its special case we improve the dependence on corruption magnitude C. In particular, for C = Θ(T/log T), where T is the time horizon, we achieve an improvement by a multiplicative factor of √(log T/loglog T) relative to the bound of Zimmert and Seldin (2021). We also improve the dependence of the regret bound on time horizon from log T to log(K-1)T/(∑_i≠ i^*1/Δ_i)^2, where K is the number of arms, Δ_i are suboptimality gaps for suboptimal arms i, and i^* is the optimal arm. Additionally, we provide a general analysis, which allows to achieve the same kind of improvement for generalizations of Tsallis-INF to other settings beyond multiarmed bandits.
READ FULL TEXT