Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithm

03/07/2022 · by Debabrota Basu, et al.

In this paper, we study the stochastic bandit problem with k unknown reward distributions, or arms, that are heavy-tailed and corrupted, where the corruption distributions are time-invariant. At each iteration, the player chooses an arm; the environment then returns an uncorrupted reward with probability 1-ε and an arbitrarily corrupted reward with probability ε. In our setting, the uncorrupted reward might be heavy-tailed and the corrupted reward might be unbounded. We prove a lower bound on the regret showing that corrupted and heavy-tailed bandits are strictly harder than uncorrupted or light-tailed bandits. We observe that the environments can be categorised into hardness regimes depending on the suboptimality gap Δ, the variance σ, and the corruption proportion ε. Following this, we design a UCB-type algorithm, HuberUCB, that leverages Huber's estimator for robust mean estimation. HuberUCB yields tight upper bounds on regret in the proposed corrupted and heavy-tailed setting. To derive the upper bound, we prove a novel concentration inequality for Huber's estimator, which might be of independent interest.
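To make the setting concrete, here is a minimal Python sketch of the two ingredients the abstract describes: a Huber M-estimator of the mean (robust to heavy tails and an ε-fraction of corrupted samples) plugged into a UCB-style arm-selection loop, and an ε-contaminated heavy-tailed arm. This is not the paper's pseudocode: the function names (huber_mean, huber_ucb, make_arm), the threshold c, the generic sqrt(log/n) exploration bonus, and the Pareto reward model are all illustrative assumptions; the paper derives its bonus from a new concentration inequality for Huber's estimator.

```python
import numpy as np

def huber_mean(x, c=1.345, tol=1e-6, max_iter=100):
    """Huber's M-estimator of location via iteratively reweighted
    averaging. Samples within c of the current estimate keep weight 1;
    samples further out are down-weighted, limiting the influence of
    heavy tails and corrupted rewards."""
    mu = np.median(x)  # robust initial guess
    for _ in range(max_iter):
        r = np.abs(x - mu) / c
        w = np.minimum(1.0, 1.0 / np.maximum(r, 1e-12))  # Huber weights
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

def huber_ucb(arms, horizon, delta=0.01, rng=None):
    """UCB-style loop built on the Huber mean. The exploration bonus
    below is a generic sqrt(log/n) placeholder, not the paper's
    concentration-based bonus for Huber's estimator."""
    if rng is None:
        rng = np.random.default_rng(0)
    k = len(arms)
    rewards = [[] for _ in range(k)]
    for t in range(horizon):
        if t < k:
            a = t  # pull every arm once to initialise
        else:
            ucb = [
                huber_mean(np.array(obs))
                + np.sqrt(2.0 * np.log(1.0 / delta) / len(obs))
                for obs in rewards
            ]
            a = int(np.argmax(ucb))
        rewards[a].append(arms[a](rng))
    return rewards

# An ε-contaminated, heavy-tailed arm: with probability 1-ε the reward
# is mean_shift plus a Pareto draw (heavy-tailed); with probability ε
# it is an arbitrary corrupted value (here a large negative constant).
def make_arm(mean_shift, eps=0.05):
    def pull(rng):
        if rng.random() < eps:
            return -100.0  # corrupted reward, unbounded in general
        return mean_shift + rng.pareto(2.5)
    return pull

history = huber_ucb([make_arm(0.0), make_arm(0.5)], horizon=2000)
print([len(obs) for obs in history])  # pull counts per arm
```

In this sketch the threshold c trades statistical efficiency against robustness: a smaller c clips more aggressively, which helps under heavier tails or more corruption but adds bias on clean data.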
