Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms

02/27/2023
by Tiancheng Jin et al.

We study the problem of designing adaptive multi-armed bandit algorithms that perform optimally in both the stochastic setting and the adversarial setting simultaneously (often known as a best-of-both-worlds guarantee). A line of recent works shows that, when configured and analyzed properly, the Follow-the-Regularized-Leader (FTRL) algorithm, originally designed for the adversarial setting, can in fact optimally adapt to the stochastic setting as well. Such results, however, critically rely on the assumption that there exists a unique optimal arm. Recently, Ito (2021) took the first step toward removing this undesirable uniqueness assumption for one particular FTRL algorithm with the 1/2-Tsallis entropy regularizer. In this work, we significantly improve and generalize that result, showing that uniqueness is unnecessary for FTRL with a broad family of regularizers and a new learning rate schedule. For some regularizers, our regret bounds also improve upon prior results even when uniqueness holds. We further provide an application of our results to the decoupled exploration and exploitation problem, demonstrating that our techniques are broadly applicable.
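To make the setup concrete, below is a minimal sketch of FTRL with the 1/2-Tsallis entropy regularizer (the Tsallis-INF style of algorithm the abstract refers to). This is an illustrative simplification, not the paper's exact method: the learning-rate schedule `eta = 1/sqrt(t)`, the binary-search tolerance, and the helper names are assumptions made for the example. The FTRL step with this regularizer reduces to probabilities of the form p_i = (eta * L_i + lam)^(-2), where lam is a normalization constant found numerically.

```python
import math
import random

def tsallis_probs(cum_losses, eta, iters=100):
    """One FTRL step with the 1/2-Tsallis entropy regularizer.

    Minimizing eta * <L, p> - 2 * sum(sqrt(p_i)) over the simplex gives
    p_i = (eta * L_i + lam)^(-2); lam is found by binary search so that
    the probabilities sum to 1 (the sum is decreasing in lam).
    """
    # lam must exceed -eta * min(L) so every p_i is well defined.
    lo = -eta * min(cum_losses) + 1e-12
    hi = lo + 1.0
    # Grow hi until the total mass drops below 1.
    while sum((eta * L + hi) ** -2 for L in cum_losses) > 1.0:
        hi = lo + 2.0 * (hi - lo)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if sum((eta * L + mid) ** -2 for L in cum_losses) > 1.0:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2.0
    p = [(eta * L + lam) ** -2 for L in cum_losses]
    s = sum(p)  # renormalize away residual binary-search error
    return [pi / s for pi in p]

def tsallis_inf(loss_fn, K, T, seed=0):
    """Run the bandit loop for T rounds; returns per-arm pull counts.

    loss_fn(arm, rng) should return the observed loss in [0, 1].
    """
    rng = random.Random(seed)
    cum = [0.0] * K   # importance-weighted cumulative loss estimates
    pulls = [0] * K
    for t in range(1, T + 1):
        eta = 1.0 / math.sqrt(t)          # simplified decreasing schedule
        p = tsallis_probs(cum, eta)
        arm = rng.choices(range(K), weights=p)[0]
        loss = loss_fn(arm, rng)
        cum[arm] += loss / p[arm]         # importance-weighted estimator
        pulls[arm] += 1
    return pulls
```

With zero cumulative losses the step returns the uniform distribution, and in a stochastic instance the loop concentrates its pulls on the lower-loss arm; the paper's contribution concerns how such FTRL schemes behave when several arms are simultaneously optimal.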


Related research

- 03/13/2023 — Best-of-three-worlds Analysis for Linear Bandits with Follow-the-regularized-leader Algorithm: The linear bandit problem has been studied for many years in both stocha...
- 06/14/2022 — Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds: This paper considers the multi-armed bandit (MAB) problem and provides a...
- 06/08/2021 — Scale Free Adversarial Multi Armed Bandits: We consider the Scale-Free Adversarial Multi Armed Bandit (MAB) problem, ...
- 01/10/2018 — More Adaptive Algorithms for Adversarial Bandits: We develop a novel and generic algorithm for the adversarial multi-armed...
- 05/26/2023 — Stability-penalty-adaptive Follow-the-regularized-leader: Sparsity, Game-dependency, and Best-of-both-worlds: Adaptivity to the difficulties of a problem is a key property in sequent...
- 03/04/2020 — Bandits with adversarial scaling: We study "adversarial scaling", a multi-armed bandit model where rewards...
- 05/27/2022 — Meta-Learning Adversarial Bandits: We study online learning with bandit feedback across multiple tasks, wit...
