Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

01/28/2022
by   Jiatai Huang, et al.

In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits (MAB), where losses have α-th (1 < α ≤ 2) moments bounded by σ^α while their variances may not exist. Specifically, we design an algorithm that, when the heavy-tail parameters α and σ are known to the agent, simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a priori. When α and σ are unknown, it achieves a log T-style instance-dependent regret in stochastic cases and an o(T) no-regret guarantee in adversarial cases. We further develop a second algorithm that achieves the 𝒪(σ K^{1-1/α} T^{1/α}) minimax optimal regret even in adversarial settings, without prior knowledge of α and σ. This bound matches the known regret lower bound (Bubeck et al., 2013), which was proved for stochastic environments with both α and σ known. To our knowledge, the proposed algorithms are the first to enjoy a best-of-both-worlds regret guarantee, and the first to adapt to both α and σ while achieving the optimal gap-independent regret bound, in the classical heavy-tailed stochastic MAB setting and in our novel adversarial formulation.
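To make the heavy-tailed setting concrete: when only the α-th moment is bounded (and the variance may be infinite), the plain empirical mean can be badly skewed by rare, enormous draws, which is why heavy-tailed bandit algorithms rely on robust mean estimators. The sketch below is purely illustrative and is not the paper's algorithm; it contrasts the naive mean with a truncated empirical mean in the style of Bubeck et al. (2013), where the i-th sample is clipped at a threshold growing like (σ^α i / log(1/δ))^{1/α}. The function name `truncated_mean` and all parameter defaults are our own choices for the example.

```python
import math
import random

def truncated_mean(samples, alpha=1.2, sigma=1.0, delta=0.05):
    """Truncated empirical mean for heavy-tailed data (illustrative sketch).

    The i-th sample is discarded (counted as 0) when it exceeds
    B_i = (sigma**alpha * i / log(1/delta))**(1/alpha), so that rare,
    huge heavy-tailed draws cannot dominate the estimate.
    """
    total = 0.0
    for i, x in enumerate(samples, start=1):
        b = (sigma ** alpha * i / math.log(1.0 / delta)) ** (1.0 / alpha)
        total += x if abs(x) <= b else 0.0
    return total / len(samples)

random.seed(0)
# Pareto(1.5) draws: the alpha-th moment is finite for alpha < 1.5,
# but the variance is infinite -- a typical heavy-tailed loss model.
draws = [random.paretovariate(1.5) for _ in range(10_000)]

naive = sum(draws) / len(draws)          # vulnerable to extreme draws
robust = truncated_mean(draws)           # clipped, stable estimate
```

Since all Pareto draws are positive and truncation only zeroes out the largest ones, the robust estimate is never larger than the naive mean; in a bandit loop, an estimator of this kind would replace the per-arm empirical mean.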


