Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards

10/24/2020
by Kyungjae Lee, et al.

In this paper, we consider stochastic multi-armed bandits (MABs) with heavy-tailed rewards, whose p-th moment is bounded by a constant ν_p for 1<p≤2. First, we propose a novel robust estimator that, unlike existing robust estimators, does not require ν_p as prior information. We show that the error probability of the proposed estimator decays exponentially fast. Using this estimator, we propose a perturbation-based exploration strategy and develop a generalized regret analysis scheme that yields upper and lower regret bounds by revealing the relationship between the regret and the cumulative distribution function of the perturbation. From this analysis scheme, we obtain gap-dependent and gap-independent upper and lower regret bounds for various perturbations. We also find the optimal hyperparameters for each perturbation, which achieve the minimax optimal regret bound with respect to the total number of rounds. In simulations, the proposed estimator shows favorable performance compared to existing robust estimators for various values of p, and, for MAB problems, the proposed perturbation strategy outperforms existing exploration methods.
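The perturbation-based exploration idea from the abstract can be sketched as follows. This is an illustrative sketch, not the paper's algorithm: `truncated_mean` is a stand-in robust estimator (the paper's estimator, unlike simple truncation, needs no knowledge of ν_p), and the Gaussian perturbation with scale 1/√n is just one of the many perturbation distributions the paper's analysis scheme covers.

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_mean(rewards, p=1.5):
    """Illustrative robust mean: clip large observations before averaging.
    The truncation level grows with the sample size so the bias vanishes."""
    r = np.asarray(rewards, dtype=float)
    threshold = len(r) ** (1.0 / p)
    return float(np.mean(np.clip(r, -threshold, threshold)))

def perturbed_bandit(arms, T):
    """Follow-the-perturbed-leader style exploration for a stochastic MAB:
    pick the arm maximizing (robust estimate + random perturbation)."""
    K = len(arms)
    rewards = [[] for _ in range(K)]
    for a in range(K):          # pull each arm once to initialize
        rewards[a].append(arms[a]())
    total = sum(r[0] for r in rewards)
    for t in range(K, T):
        # Perturbation scale shrinks as an arm's pull count grows,
        # so exploration concentrates on near-optimal arms over time.
        scores = [
            truncated_mean(rewards[a])
            + rng.standard_normal() / np.sqrt(len(rewards[a]))
            for a in range(K)
        ]
        a = int(np.argmax(scores))
        r = arms[a]()
        rewards[a].append(r)
        total += r
    return total, [len(r) for r in rewards]

# Two heavy-tailed (Pareto) arms with means ~0.125 and ~0.25.
arms = [lambda: rng.pareto(3.0) * 0.25, lambda: rng.pareto(3.0) * 0.5]
total, counts = perturbed_bandit(arms, 2000)
```

A usage note: `counts` records how often each arm was pulled; over 2000 rounds the higher-mean second arm should attract the bulk of the pulls despite the heavy tails.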


