Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs

04/28/2020
by Bo Xue, et al.

In this paper, we study the problem of stochastic linear bandits with finite action sets. Most existing work assumes that the payoffs are bounded or sub-Gaussian, an assumption that may be violated in some scenarios, such as financial markets. To address this issue, we analyze linear bandits with heavy-tailed payoffs, where the payoffs have finite moments of order 1+ϵ for some ϵ∈(0,1]. Based on median of means and dynamic truncation, we propose two novel algorithms that enjoy a sublinear regret bound of O(d^{1/2} T^{1/(1+ϵ)}), where d is the dimension of the contextual information and T is the time horizon. Meanwhile, we provide an Ω(d^{ϵ/(1+ϵ)} T^{1/(1+ϵ)}) lower bound, which implies that our upper bound matches the lower bound up to polylogarithmic factors in the order of d and T when ϵ=1 (in that case both bounds scale as d^{1/2} T^{1/2}). Finally, we conduct numerical experiments to demonstrate the effectiveness of our algorithms, and the empirical results strongly support our theoretical guarantees.
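The two robust estimation ideas named in the abstract, median of means and truncation, can be illustrated in the simplest scalar setting. The sketch below is a hypothetical illustration, not the authors' algorithms: the paper applies these ideas inside linear regression over d-dimensional actions, whereas here `median_of_means` and `truncated_mean` (names chosen for this example) estimate a single mean from i.i.d. samples. It assumes the (1+ϵ)-th raw moment is bounded by a known constant `u`, and it uses the standard scalar truncation rule that zeroes out the t-th sample when its magnitude exceeds (u·t / log(1/δ))^{1/(1+ϵ)}, so the threshold grows with t.

```python
import math
import statistics
from typing import Sequence


def median_of_means(samples: Sequence[float], k: int) -> float:
    """Median-of-means estimate of the mean.

    Splits the samples into k consecutive groups of equal size, averages
    each group, and returns the median of the k group means. Leftover
    samples that do not fill a complete group are dropped.
    """
    if k < 1 or len(samples) < k:
        raise ValueError("need 1 <= k <= len(samples)")
    m = len(samples) // k  # group size
    means = [statistics.fmean(samples[i * m:(i + 1) * m]) for i in range(k)]
    return statistics.median(means)


def truncated_mean(samples: Sequence[float], u: float, eps: float, delta: float) -> float:
    """Truncated empirical mean with a threshold that grows with the sample index.

    Assumption (hypothetical setup for this sketch): E|X|^(1+eps) <= u.
    The t-th sample (1-indexed) contributes only if
    |X_t| <= (u * t / log(1/delta)) ** (1 / (1 + eps)); otherwise it counts as 0.
    """
    n = len(samples)
    log_term = math.log(1.0 / delta)
    total = 0.0
    for t, x in enumerate(samples, start=1):
        threshold = (u * t / log_term) ** (1.0 / (1.0 + eps))
        if abs(x) <= threshold:
            total += x
    return total / n


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    alpha = 1.5  # Pareto tail index with x_m = 1: mean 3, infinite variance
    data = [rng.paretovariate(alpha) for _ in range(10_000)]
    eps = 0.4  # moments of order 1.4 < alpha are finite: E[X^1.4] = alpha / (alpha - 1.4) = 15
    print("plain mean     ", statistics.fmean(data))
    print("median of means", median_of_means(data, k=20))
    print("truncated mean ", truncated_mean(data, u=15.0, eps=eps, delta=0.01))
```

In the demo, the Pareto samples have a finite mean but infinite variance, which is the kind of heavy-tailed payoff the abstract targets. Median of means discards outlier groups through the median, while truncation discards individual outliers; both trade a small bias for much better concentration than the plain sample mean.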

research
10/25/2018

Almost Optimal Algorithms for Linear Stochastic Bandits with Heavy-Tailed Payoffs

In linear stochastic bandits, it is commonly assumed that payoffs are wi...
research
03/30/2019

Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits

We study the linear contextual bandit problem with finite action sets. W...
research
06/11/2022

Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits

We propose a novel algorithm for linear contextual bandits with O(√(dT l...
research
03/15/2023

Borda Regret Minimization for Generalized Linear Dueling Bandits

Dueling bandits are widely used to model preferential feedback that is p...
research
03/29/2016

Regret Analysis of the Anytime Optimally Confident UCB Algorithm

I introduce and analyse an anytime version of the Optimally Confident UC...
research
06/12/2023

Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds

While numerous works have focused on devising efficient algorithms for r...
research
05/30/2019

Multi-Objective Generalized Linear Bandits

In this paper, we study the multi-objective bandits (MOB) problem, where...