An Algorithm for Stochastic and Adversarial Bandits with Switching Costs

02/19/2021
by Chloé Rouyer, et al.

We propose an algorithm for stochastic and adversarial multiarmed bandits with switching costs, where the algorithm pays a price λ every time it switches the arm being played. Our algorithm is based on an adaptation of the Tsallis-INF algorithm of Zimmert and Seldin (2021) and requires no prior knowledge of the regime or time horizon. In the oblivious adversarial setting it achieves the minimax optimal regret bound of O((λK)^(1/3) T^(2/3) + √(KT)), where T is the time horizon and K is the number of arms. In the stochastically constrained adversarial regime, which includes the stochastic regime as a special case, it achieves a regret bound of O(((λK)^(2/3) T^(1/3) + ln T) ∑_{i ≠ i^*} 1/Δ_i), where the Δ_i are the suboptimality gaps and i^* is the unique optimal arm. In the special case of λ = 0 (no switching costs), both bounds are minimax optimal within constants. We also explore variants of the problem in which the switching cost is allowed to change over time. We provide an experimental evaluation showing the competitiveness of our algorithm against relevant baselines in the stochastic, stochastically constrained adversarial, and adversarial regimes with a fixed switching cost.
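The algorithmic core referenced above is the Tsallis-INF update of Zimmert and Seldin (2021): at each round, play from the FTRL distribution induced by the 1/2-Tsallis entropy over importance-weighted cumulative loss estimates. The sketch below shows only that base update, not the switching-cost-aware modifications introduced in this paper; the Newton solver, the learning-rate choice η_t = 2/√t, and the helper names (tsallis_inf_weights, run_bandit, loss_fn) are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np


def tsallis_inf_weights(cum_loss_est, eta, newton_steps=50):
    """Playing distribution of 1/2-Tsallis-INF (sketch, not the paper's code).

    Solves for the normalization x < min_i L_i such that
        sum_i 4 / (eta * (L_i - x))**2 = 1,
    the closed-form FTRL solution with 1/2-Tsallis entropy, via Newton's method.
    """
    L = np.asarray(cum_loss_est, dtype=float)
    x = np.min(L) - 2.0 / eta          # start where f(x) >= 0, keeping L_i - x > 0
    for _ in range(newton_steps):
        diff = eta * (L - x)
        f = (4.0 / diff ** 2).sum() - 1.0
        fprime = (8.0 * eta / diff ** 3).sum()
        x -= f / fprime
    w = 4.0 / (eta * (L - x)) ** 2
    return w / w.sum()                 # renormalize away residual Newton error


def run_bandit(loss_fn, K, T, seed=None):
    """Play T rounds with importance-weighted loss estimates.

    `loss_fn(t, arm)` returns the observed loss in [0, 1]; it is a stand-in
    for the environment and is not part of the paper.
    """
    rng = np.random.default_rng(seed)
    cum_loss_est = np.zeros(K)
    total_loss = 0.0
    for t in range(1, T + 1):
        eta = 2.0 / np.sqrt(t)                  # illustrative switching-cost-free tuning
        w = tsallis_inf_weights(cum_loss_est, eta)
        arm = rng.choice(K, p=w)
        loss = loss_fn(t, arm)
        total_loss += loss
        cum_loss_est[arm] += loss / w[arm]      # importance-weighted estimate
    return total_loss
```

Charging an extra λ whenever `arm` differs from the previous round's arm, and slowing the learning-rate schedule to control the number of switches, is the direction the paper's adaptation takes; the tuning above makes no such adjustment.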


Related research

06/07/2022
Better Best of Both Worlds Bounds for Bandits with Switching Costs
We study best-of-both-worlds algorithms for bandits with switching cost,...

03/23/2021
Improved Analysis of Robustness of the Tsallis-INF Algorithm to Adversarial Corruptions in Stochastic Multiarmed Bandits
We derive improved regret bounds for the Tsallis-INF algorithm of Zimmer...

07/19/2018
An Optimal Algorithm for Stochastic and Adversarial Bandits
We provide an algorithm that achieves the optimal (up to constants) fini...

03/16/2023
Anomaly Search Over Many Sequences With Switching Costs
This paper considers the quickest search problem to identify anomalies a...

06/29/2022
A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback
We present a modified tuning of the algorithm of Zimmert and Seldin [202...

09/05/2018
Anytime Hedge achieves optimal regret in the stochastic regime
This paper is about a surprising fact: we prove that the anytime Hedge a...

03/25/2018
Stochastic bandits robust to adversarial corruptions
We introduce a new model of stochastic bandits with adversarial corrupti...
