Beyond the Hazard Rate: More Perturbation Algorithms for Adversarial Multi-armed Bandits

02/17/2017
by Zifan Li, et al.

Recent work on follow the perturbed leader (FTPL) algorithms for the adversarial multi-armed bandit problem has highlighted the role of the hazard rate of the distribution generating the perturbations. Assuming that the hazard rate is bounded, it is possible to provide regret analyses for a variety of FTPL algorithms for the multi-armed bandit problem. This paper pushes the inquiry into regret bounds for FTPL algorithms beyond the bounded hazard rate condition. There are good reasons to do so: natural distributions such as the uniform and Gaussian violate the condition. We give regret bounds for both bounded support and unbounded support distributions without assuming the hazard rate condition. We also disprove a conjecture that the Gaussian distribution cannot lead to a low-regret algorithm. In fact, it turns out that it leads to near optimal regret, up to logarithmic factors. A key ingredient in our approach is the introduction of a new notion called the generalized hazard rate.
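
For reference, the hazard rate at the center of this line of work has the standard definition sketched below. The bound M and the two examples are not taken from the abstract; they are textbook facts (the uniform hazard rate blows up at the right endpoint of its support, and the Gaussian hazard rate grows linearly via the Mills ratio) that illustrate why those distributions violate the bounded-hazard condition.

```latex
% Hazard rate of a perturbation distribution with density f and CDF F:
\[
  h(x) = \frac{f(x)}{1 - F(x)},
  \qquad \text{bounded-hazard condition: } \sup_x h(x) \le M.
\]
% Uniform on [0, 1]:  h(x) = 1/(1 - x) -> infinity as x -> 1.
% Gaussian:           h(x) ~ x as x -> infinity, also unbounded.
```

To make the FTPL template concrete, here is a minimal Python sketch of Gaussian-perturbation FTPL for the adversarial bandit. It is not the paper's algorithm or analysis: the learning rate `eta`, the cap `max_resample`, and the use of geometric resampling (Neu and Bartók, 2013) to build importance-weighted loss estimates are implementation choices made for this sketch.

```python
import numpy as np

def ftpl_bandit(loss_matrix, eta=0.1, max_resample=1000, rng=None):
    """Follow the Perturbed Leader with Gaussian perturbations (sketch).

    loss_matrix[t, i] is the loss of arm i at round t; the learner only
    observes the loss of the arm it plays.
    """
    rng = np.random.default_rng() if rng is None else rng
    T, K = loss_matrix.shape
    L_hat = np.zeros(K)        # importance-weighted cumulative loss estimates
    total_loss = 0.0
    for t in range(T):
        # Play the arm whose perturbed cumulative loss is smallest.
        arm = int(np.argmin(L_hat - rng.standard_normal(K) / eta))
        loss = loss_matrix[t, arm]
        total_loss += loss
        # Geometric resampling: redraw perturbations until the same arm
        # wins again; the number of draws m is a (truncated) unbiased
        # estimate of 1 / P(arm is played).
        m = 1
        while m < max_resample and \
                int(np.argmin(L_hat - rng.standard_normal(K) / eta)) != arm:
            m += 1
        L_hat[arm] += m * loss  # unbiased up to the truncation at max_resample
    return total_loss
```

A quick way to exercise the sketch is an oblivious adversary, i.e. a loss table fixed up front:

```python
rng = np.random.default_rng(0)
losses = rng.uniform(size=(5000, 10))  # 5000 rounds, 10 arms
print(ftpl_bandit(losses, eta=0.05, rng=rng))
```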


Related research

Bounded regret in stochastic multi-armed bandits (02/06/2013)
We study the stochastic multi-armed bandit problem when one knows the va...

Fighting Bandits with a New Kind of Smoothness (12/14/2015)
We define a novel family of algorithms for the adversarial multi-armed b...

A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms (03/10/2023)
In this paper we propose a general methodology to derive regret bounds f...

On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems (02/02/2019)
We investigate the optimality of perturbation based algorithms in the st...

Regret bounds for Narendra-Shapiro bandit algorithms (02/17/2015)
Narendra-Shapiro (NS) algorithms are bandit-type algorithms that have be...

Fighting Contextual Bandits with Stochastic Smoothing (10/11/2018)
We introduce a new stochastic smoothing perspective to study adversarial...

Bandits with Side Observations: Bounded vs. Logarithmic Regret (07/10/2018)
We consider the classical stochastic multi-armed bandit but where, from ...
