Regret Minimization in Heavy-Tailed Bandits

02/07/2021
by Shubhada Agrawal, et al.

We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm distributions are allowed to be heavy-tailed. Regret minimization has been well studied in the simpler settings of bounded-support reward distributions or distributions belonging to a single-parameter exponential family. We work under the much weaker assumption that the moments of order (1+ϵ) are uniformly bounded by a known constant B, for some given ϵ > 0. We propose an optimal algorithm that matches the lower bound exactly in the first-order term, and we give a finite-time bound on its regret. We show that our index concentrates faster than the well-known truncated and trimmed empirical-mean estimators for the mean of heavy-tailed distributions. Computing our index can be computationally demanding, so we also develop a batch-based algorithm that is optimal up to a multiplicative constant depending on the batch size, providing a controlled trade-off between statistical optimality and computational cost.
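The paper's own index is not reproduced in this abstract, but the truncated empirical mean it compares against is a standard construction: under a bounded (1+ϵ)-moment assumption, the i-th sample is discarded when it exceeds a growing threshold, and the resulting estimator plugs into a UCB-style confidence bound. Below is a minimal sketch of that baseline (a truncated-mean robust UCB in the style of Bubeck et al.); the function names, the confidence constant 4, and the choice δ = t⁻² are illustrative assumptions, not the algorithm proposed in this paper.

```python
import numpy as np

def truncated_mean(rewards, B, eps, delta):
    # Truncated empirical mean for heavy-tailed data: keep the i-th sample
    # only if |X_i| <= (B * i / log(1/delta))^(1/(1+eps)), else replace by 0.
    # This is the baseline comparator mentioned in the abstract, not the
    # paper's own index.
    rewards = np.asarray(rewards, dtype=float)
    i = np.arange(1, len(rewards) + 1)
    u = (B * i / np.log(1.0 / delta)) ** (1.0 / (1.0 + eps))
    return float(np.where(np.abs(rewards) <= u, rewards, 0.0).mean())

def robust_ucb(pull, n_arms, horizon, B, eps):
    # Robust UCB with the truncated mean: optimistic index is
    # mu_hat + 4 * B^(1/(1+eps)) * (log(1/delta) / n)^(eps/(1+eps)),
    # with the illustrative choice delta = t^-2 at round t.
    history = [[] for _ in range(n_arms)]
    for t in range(1, horizon + 1):
        if t <= n_arms:
            a = t - 1                    # pull each arm once to initialize
        else:
            delta = t ** -2.0
            idx = []
            for rs in history:
                n = len(rs)
                mu = truncated_mean(rs, B, eps, delta)
                rad = (4.0 * B ** (1.0 / (1.0 + eps))
                       * (np.log(1.0 / delta) / n) ** (eps / (1.0 + eps)))
                idx.append(mu + rad)
            a = int(np.argmax(idx))
        history[a].append(pull(a))
    return [len(rs) for rs in history]    # pull counts per arm
```

For instance, `pull` could draw from a Pareto distribution with shape between 1 and 2, which has a finite (1+ϵ)-moment for small ϵ but infinite variance; the truncation keeps occasional huge rewards from dominating the empirical mean.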


Related research

Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits (01/28/2022)
In this paper, we generalize the concept of heavy-tailed multi-armed ban...

Bandits with heavy tail (09/08/2012)
The stochastic multi-armed bandit problem is well understood when the re...

Thompson Sampling on Symmetric α-Stable Bandits (07/08/2019)
Thompson Sampling provides an efficient technique to introduce prior kno...

Almost Optimal Algorithms for Linear Stochastic Bandits with Heavy-Tailed Payoffs (10/25/2018)
In linear stochastic bandits, it is commonly assumed that payoffs are wi...

Robust and Heavy-Tailed Mean Estimation Made Simple, via Regret Minimization (07/31/2020)
We study the problem of estimating the mean of a distribution in high di...

Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs (10/26/2021)
Despite a large amount of effort in dealing with heavy-tailed error in m...

Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithm (03/07/2022)
In this paper, we study the stochastic bandits problem with k unknown he...
