Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards

by   Kyungjae Lee, et al.

In this paper, we consider stochastic multi-armed bandits (MABs) with heavy-tailed rewards, whose p-th moment is bounded by a constant ν_p for 1&lt;p≤2. First, we propose a novel robust estimator that does not require ν_p as prior information, unlike existing robust estimators, which demand prior knowledge of ν_p. We show that the error probability of the proposed estimator decays exponentially fast. Using this estimator, we propose a perturbation-based exploration strategy and develop a generalized regret analysis scheme that provides upper and lower regret bounds by revealing the relationship between the regret and the cumulative distribution function of the perturbation. From the proposed analysis scheme, we obtain gap-dependent and gap-independent upper and lower regret bounds for various perturbations. We also find the optimal hyperparameters for each perturbation, which achieve the minimax optimal regret bound with respect to the total number of rounds. In simulations, the proposed estimator shows favorable performance compared to existing robust estimators for various p values, and, for MAB problems, the proposed perturbation strategy outperforms existing exploration methods.
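To make the setting concrete, the sketch below pairs a *truncated empirical mean* (a standard robust estimator for heavy-tailed samples, used here only as a stand-in; it is not the paper's prior-free estimator, and it does assume a moment bound ν_p) with Gaussian-perturbed index exploration on a small two-armed bandit. The truncation threshold, reward distribution, and perturbation scale are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_mean(rewards, t, p=1.5, nu_p=1.0):
    """Truncated empirical mean for heavy-tailed samples.

    Samples exceeding a growing threshold are zeroed out, a common
    construction when the p-th moment is bounded by nu_p. This is a
    generic stand-in, NOT the paper's prior-free estimator.
    """
    rewards = np.asarray(rewards, dtype=float)
    idx = np.arange(1, len(rewards) + 1)
    # Illustrative threshold b_i ~ (nu_p * i / log t)^(1/p).
    b = (nu_p * idx / max(np.log(t + 1), 1.0)) ** (1.0 / p)
    return float(np.mean(np.where(np.abs(rewards) <= b, rewards, 0.0)))

def perturbed_index(estimate, count, scale=1.0):
    """Estimate plus a random perturbation that shrinks with pull count."""
    return estimate + scale * rng.standard_normal() / np.sqrt(max(count, 1))

# Tiny 2-armed bandit with heavy-tailed (Pareto) reward noise.
K, T = 2, 500
means = [0.5, 1.0]
history = [[] for _ in range(K)]
for t in range(1, T + 1):
    if t <= K:
        a = t - 1  # pull each arm once to initialize
    else:
        indices = [perturbed_index(truncated_mean(history[a], t),
                                   len(history[a])) for a in range(K)]
        a = int(np.argmax(indices))
    # Pareto(2.5) noise re-centered to mean zero (its mean is 1/1.5).
    reward = means[a] + (rng.pareto(2.5) - 1.0 / 1.5)
    history[a].append(reward)

pulls = [len(h) for h in history]
```

With a reasonable perturbation scale, the arm with the higher mean should accumulate the majority of the pulls as T grows, mirroring the sub-linear-regret behavior the analysis scheme formalizes.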



