Better Best of Both Worlds Bounds for Bandits with Switching Costs

06/07/2022
by Idan Amir, et al.

We study best-of-both-worlds algorithms for bandits with switching costs, a problem recently addressed by Rouyer, Seldin, and Cesa-Bianchi (2021). We introduce a surprisingly simple and effective algorithm that simultaneously achieves the minimax optimal regret bound of 𝒪(T^{2/3}) in the oblivious adversarial setting and a bound of 𝒪(min{log(T)/Δ^2, T^{2/3}}) in the stochastically constrained regime, both with (unit) switching costs, where Δ is the gap between the arms. In the stochastically constrained case, our bound improves over previous results due to Rouyer et al., which achieved regret of 𝒪(T^{1/3}/Δ). We accompany our results with a lower bound showing that, in general, Ω̃(min{1/Δ^2, T^{2/3}}) regret is unavoidable in the stochastically constrained case for algorithms with 𝒪(T^{2/3}) worst-case regret.
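
To make the quantities in the abstract concrete, here is a small Python sketch. It assumes the standard definition of regret with unit switching costs (total loss of the played arms, plus one for every change of arm, minus the total loss of the best fixed arm in hindsight) and compares the leading terms of the quoted bounds with all constants dropped. It does not implement the paper's algorithm; the helper names `switching_cost_regret` and `bound_orders` are hypothetical.

```python
import math


def switching_cost_regret(losses, actions):
    """Regret with unit switching costs (standard definition, assumed here).

    losses[t][i] is the loss of arm i at round t; actions[t] is the arm
    played at round t. Each change of arm between consecutive rounds costs 1.
    """
    # Loss actually incurred by the played sequence of arms.
    played = sum(losses[t][a] for t, a in enumerate(actions))
    # Number of rounds on which the learner switched to a different arm.
    switches = sum(1 for t in range(1, len(actions)) if actions[t] != actions[t - 1])
    # Total loss of the best single arm held fixed for all rounds (no switches).
    n_arms = len(losses[0])
    best_fixed = min(sum(row[i] for row in losses) for i in range(n_arms))
    return played + switches - best_fixed


def bound_orders(T, gap):
    """Leading terms of the upper bounds quoted in the abstract.

    Constants hidden by the O-notation are dropped, so the values only
    indicate how each bound scales with the horizon T and the gap.
    """
    return {
        "adversarial": T ** (2 / 3),
        "this_paper_stochastic": min(math.log(T) / gap ** 2, T ** (2 / 3)),
        "rouyer_et_al_stochastic": T ** (1 / 3) / gap,
    }
```

For example, with T = 10^6 and Δ = 0.5, `bound_orders` gives roughly 55 for log(T)/Δ^2 versus 200 for T^{1/3}/Δ; with Δ = 0.001 the new bound is capped at T^{2/3} ≈ 10^4 while T^{1/3}/Δ ≈ 10^5. Because the 𝒪 notation hides constants, these figures only indicate how the bounds scale with T and Δ.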


research ∙ 02/19/2021

An Algorithm for Stochastic and Adversarial Bandits with Switching Costs

We propose an algorithm for stochastic and adversarial multiarmed bandit...
research ∙ 12/06/2019

Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling

Stochastic Rank-One Bandits (Katarya et al, (2017a,b)) are a simple fram...
research ∙ 02/12/2022

Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits

We consider the problem of combining and learning over a set of adversar...
research ∙ 05/30/2022

Adversarial Bandits Robust to S-Switch Regret

We study the adversarial bandit problem under S number of switching best...
research ∙ 10/24/2019

Minimax Regret of Switching-Constrained Online Convex Optimization: No Phase Transition

We study the problem of switching-constrained online convex optimization...
research ∙ 02/08/2023

Near-Optimal Adversarial Reinforcement Learning with Switching Costs

Switching costs, which capture the costs for changing policies, are rega...
research ∙ 03/05/2018

Online learning over a finite action set with limited switching

This paper studies the value of switching actions in the Prediction From...