Almost Optimal Algorithms for Linear Stochastic Bandits with Heavy-Tailed Payoffs

10/25/2018
by Han Shao, et al.

In linear stochastic bandits, it is commonly assumed that payoff noises are sub-Gaussian. In this paper, under a weaker assumption on the noises, we study the problem of linear stochastic bandits with heavy-tailed payoffs (LinBET), where the payoff distributions have finite moments of order 1+ϵ for some ϵ ∈ (0,1]. We rigorously establish a regret lower bound of Ω(T^{1/(1+ϵ)}) for LinBET, implying that finite moments of order 2 (i.e., finite variances) yield a lower bound of Ω(√T), with T being the total number of rounds to play bandits. This lower bound also indicates that the state-of-the-art algorithms for LinBET are far from optimal. By adopting median of means with a well-designed allocation of decisions, and truncation based on historical information, we develop two novel bandit algorithms whose regret upper bounds match the lower bound up to polylogarithmic factors. To the best of our knowledge, we are the first to solve LinBET optimally in the sense of the polynomial order on T. The proposed algorithms are evaluated on synthetic datasets and outperform the state-of-the-art results.
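The two robust mean estimators named in the abstract, median of means and truncation, can be illustrated with a short sketch. This is not the paper's bandit algorithm (which combines these estimators with decision allocation over a linear payoff model); it is only a minimal, self-contained illustration of the two underlying estimators, with the group count k and the truncation threshold b chosen as hypothetical parameters.

```python
import numpy as np

def median_of_means(samples, k):
    """Median-of-means estimator: split the samples into k groups,
    average each group, and return the median of the group averages.
    The median damps the influence of heavy-tailed outliers that
    would dominate a plain empirical mean."""
    samples = np.asarray(samples, dtype=float)
    groups = np.array_split(samples, k)
    return float(np.median([g.mean() for g in groups]))

def truncated_mean(samples, b):
    """Truncated empirical mean: zero out samples whose magnitude
    exceeds the threshold b, then average. Truncation trades a small
    bias for bounded variance under heavy tails."""
    samples = np.asarray(samples, dtype=float)
    clipped = np.where(np.abs(samples) <= b, samples, 0.0)
    return float(clipped.mean())
```

For example, with samples [1, 2, 3, 4, 5, 6] and k = 3 groups, the group means are 1.5, 3.5, and 5.5, so the median-of-means estimate is 3.5; with samples [1, 2, 100] and threshold b = 10, the outlier 100 is zeroed out and the truncated mean is 1.0.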


Related research

- Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs (04/28/2020): In this paper, we study the problem of stochastic linear bandits with fi...
- Bandits with heavy tail (09/08/2012): The stochastic multi-armed bandit problem is well understood when the re...
- Regret Minimization in Heavy-Tailed Bandits (02/07/2021): We revisit the classic regret-minimization problem in the stochastic mul...
- Regret Analysis of the Anytime Optimally Confident UCB Algorithm (03/29/2016): I introduce and analyse an anytime version of the Optimally Confident UC...
- Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithm (03/07/2022): In this paper, we study the stochastic bandits problem with k unknown he...
- Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model (07/12/2022): Online learning to rank (OLTR) interactively learns to choose lists of i...
- Stochastic Bandits for Multi-platform Budget Optimization in Online Advertising (03/16/2021): We study the problem of an online advertising system that wants to optim...
