Minimax Policy for Heavy-tailed Multi-armed Bandits

07/20/2020
by Lai Wei, et al.

We study the stochastic Multi-Armed Bandit (MAB) problem under worst-case regret with heavy-tailed reward distributions. We modify MOSS, the minimax policy designed for sub-Gaussian reward distributions, by replacing the empirical mean with a saturated empirical mean, yielding a new algorithm called Robust MOSS. We show that if the reward distribution has a finite moment of order 1+ϵ, then the refined strategy achieves worst-case regret matching the lower bound while maintaining distribution-dependent logarithmic regret.
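To make the idea concrete, here is a minimal sketch of a MOSS-style index policy that uses a saturated (clipped) empirical mean to tame heavy-tailed rewards. Everything here is an illustrative assumption, not the paper's exact specification: the names saturated_mean and robust_moss, the saturation threshold, and the precise index formula are guesses at the general shape of such an algorithm under a bounded (1+ϵ)-th moment.

```python
import numpy as np

def saturated_mean(rewards, threshold):
    """Empirical mean with rewards clipped (saturated) at +/- threshold.

    Clipping bounds the influence of heavy-tailed outliers; this exact
    threshold rule is a hypothetical choice, not the paper's.
    """
    return np.mean(np.clip(rewards, -threshold, threshold))

def robust_moss(pull_arm, K, n, eps=0.5, u=1.0):
    """Sketch of a MOSS-style policy built on saturated means.

    pull_arm(k) returns one stochastic reward from arm k. We assume each
    arm's (1+eps)-th absolute moment is bounded by u. The exploration
    bonus mimics MOSS's sqrt(log(n / (K * T_k)) / T_k) term; the growing
    saturation level is an illustrative guess so the clipping bias
    shrinks as an arm is pulled more often.
    """
    counts = np.zeros(K, dtype=int)
    samples = [[] for _ in range(K)]
    total = 0.0
    for t in range(n):
        if t < K:
            k = t  # pull each arm once to initialize
        else:
            idx = np.empty(K)
            for a in range(K):
                T = counts[a]
                log_term = np.log(max(n / (K * T), np.e))
                thr = (u * T / log_term) ** (1.0 / (1.0 + eps))
                bonus = np.sqrt(np.log(max(n / (K * T), 1.0)) / T)
                idx[a] = saturated_mean(samples[a], thr) + bonus
            k = int(np.argmax(idx))
        r = pull_arm(k)
        samples[k].append(r)
        counts[k] += 1
        total += r
    return total

# Hypothetical usage: two arms with heavy-tailed (Pareto/Lomax) noise,
# which has finite moments only up to order < 2 (so eps < 1 applies).
rng = np.random.default_rng(0)
means = [0.0, 0.5]
pull = lambda k: means[k] + rng.pareto(2.0) - 1.0  # mean-zero heavy tail
print(robust_moss(pull, K=2, n=10_000, eps=0.5))
```

The design point this sketch tries to convey is that a plain empirical mean concentrates too slowly under heavy tails, so the index substitutes a clipped mean whose saturation level balances bias against outlier sensitivity.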


Related Research

06/07/2022
A Simple and Optimal Policy Design with Safety against Heavy-tailed Risk for Multi-armed Bandits
We design new policies that ensure both worst-case optimality for expect...

01/22/2021
Nonstationary Stochastic Multiarmed Bandits: UCB Policies and Minimax Regret
We study the nonstationary stochastic Multi-Armed Bandit (MAB) problem i...

09/08/2012
Bandits with heavy tail
The stochastic multi-armed bandit problem is well understood when the re...

06/12/2023
Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds
While numerous works have focused on devising efficient algorithms for r...

04/10/2023
Regret Distribution in Stochastic Bandits: Optimal Trade-off between Expectation and Tail Risk
We study the trade-off between expectation and tail risk for regret dist...

10/26/2021
Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs
Despite a large amount of effort in dealing with heavy-tailed error in m...

07/14/2020
Optimal Learning for Structured Bandits
We study structured multi-armed bandits, which is the problem of online ...
