Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs

by Han Zhong, et al.

Despite a large amount of effort devoted to heavy-tailed error in machine learning, little is known about the regime in which moments of the error may not exist, i.e., the random noise η satisfies only Pr[|η| > |y|] ≤ 1/|y|^α for some α > 0. We make the first attempt to actively handle such super heavy-tailed noise in bandit learning problems. We propose a novel robust statistical estimator, mean of medians, which estimates a random variable by computing the empirical mean of a sequence of empirical medians. We then present a generic reductionist algorithmic framework for solving bandit learning problems (including multi-armed and linear bandit problems): the mean of medians estimator can be applied to nearly any bandit learning algorithm as a black-box filter for its reward signals, yielding a regret bound similar to the one obtained when the rewards are sub-Gaussian. We show that the regret bound is near-optimal even with very heavy-tailed noise. We also empirically demonstrate the effectiveness of the proposed algorithm, which further corroborates our theoretical results.
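To make the estimator described above concrete, here is a minimal sketch of one way to compute a mean of medians. The abstract does not specify how the samples are grouped, so the disjoint consecutive groups, the `group_size` parameter, and the names `mean_of_medians` and `cauchy_noise` are assumptions made for illustration, not the authors' exact construction. Standard Cauchy noise is used only as a convenient example: it satisfies the tail condition with α = 1 and has no finite mean.

```python
import math
import random
from statistics import fmean, median


def mean_of_medians(samples, group_size):
    """Average the medians of disjoint consecutive groups of samples.

    The grouping scheme and group_size are illustrative assumptions;
    leftover samples that do not fill a whole group are ignored.
    """
    samples = list(samples)
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    n_groups = len(samples) // group_size
    if n_groups == 0:
        raise ValueError("need at least group_size samples")
    medians = [
        median(samples[i * group_size:(i + 1) * group_size])
        for i in range(n_groups)
    ]
    return fmean(medians)


def cauchy_noise(rng):
    """Draw standard Cauchy noise by inverse-CDF sampling (hypothetical helper)."""
    return math.tan(math.pi * (rng.random() - 0.5))


# Rewards centered at 1.0 and corrupted by noise with no finite mean.
rng = random.Random(0)
rewards = [1.0 + cauchy_noise(rng) for _ in range(10_000)]

estimate = mean_of_medians(rewards, group_size=25)  # stays close to the center
naive = fmean(rewards)  # the plain sample mean need not concentrate at all
print(f"mean of medians: {estimate:.3f}, sample mean: {naive:.3f}")
```

In the black-box reduction the abstract describes, a value such as `estimate` would stand in for the raw reward that a standard bandit algorithm consumes. How many pulls feed each filtered reward is a design choice this sketch leaves open.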



Minimax Policy for Heavy-tailed Multi-armed Bandits

We study the stochastic Multi-Armed Bandit (MAB) problem under worst cas...

Bandits with heavy tail

The stochastic multi-armed bandit problem is well understood when the re...

Regret Minimization in Heavy-Tailed Bandits

We revisit the classic regret-minimization problem in the stochastic mul...

Non-Gaussian Bayesian Filtering by Density Parametrization Using Power Moments

Non-Gaussian Bayesian filtering is a core problem in stochastic filterin...

Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs

In this paper, we study the problem of stochastic linear bandits with fi...

Robust Inference via Multiplier Bootstrap

This paper investigates the theoretical underpinnings of two fundamental...

Robust and Heavy-Tailed Mean Estimation Made Simple, via Regret Minimization

We study the problem of estimating the mean of a distribution in high di...