Fighting Bandits with a New Kind of Smoothness

12/14/2015
by Jacob Abernethy, et al.

We define a novel family of algorithms for the adversarial multi-armed bandit problem, and provide a simple analysis technique based on convex smoothing. We prove two main results. First, we show that regularization via the Tsallis entropy, which includes EXP3 as a special case, achieves the Θ(√(TN)) minimax regret. Second, we show that a wide class of perturbation methods achieve a near-optimal regret as low as O(√(TN log N)) if the perturbation distribution has a bounded hazard rate. For example, the Gumbel, Weibull, Fréchet, Pareto, and Gamma distributions all satisfy this key property.
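
As a concrete illustration, here is a minimal sketch of the perturbation approach, specializing to Gumbel noise, a distribution with bounded hazard rate that recovers EXP3 via the Gumbel-max trick. The function name, loss encoding, and learning rate below are illustrative assumptions for this sketch, not details taken from the paper:

```python
import numpy as np

def ftpl_gumbel_bandit(loss_matrix, eta=None, rng=None):
    """Follow-the-perturbed-leader with Gumbel noise on bandit losses.

    loss_matrix: (T, N) array of losses in [0, 1]; only the chosen
    arm's entry is "observed" each round (bandit feedback).
    """
    rng = np.random.default_rng() if rng is None else rng
    T, N = loss_matrix.shape
    # EXP3-style learning rate, yielding O(sqrt(T N log N)) expected regret.
    eta = np.sqrt(np.log(N) / (T * N)) if eta is None else eta

    L_hat = np.zeros(N)  # cumulative importance-weighted loss estimates
    total_loss = 0.0
    for t in range(T):
        # Perturb-and-lead: play the argmin of eta * L_hat minus Gumbel noise.
        arm = int(np.argmin(eta * L_hat - rng.gumbel(size=N)))

        # With Gumbel noise the selection distribution is the softmax of
        # -eta * L_hat (the Gumbel-max trick), so p[arm] has a closed form.
        w = np.exp(-eta * (L_hat - L_hat.min()))
        p = w / w.sum()

        loss = loss_matrix[t, arm]  # bandit feedback: a single entry
        total_loss += loss
        # Unbiased estimator: divide the observed loss by the play probability.
        L_hat[arm] += loss / p[arm]
    return total_loss

# Example run on synthetic losses:
rng = np.random.default_rng(0)
print(ftpl_gumbel_bandit(rng.uniform(size=(10_000, 10)), rng=rng))
```

The closed-form play probability here is special to the Gumbel case; for other hazard-rate-bounded perturbations (Weibull, Fréchet, Pareto, Gamma) the selection probability generally has no closed form and would have to be estimated, e.g. by resampling the perturbation.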

research · 02/17/2017
Beyond the Hazard Rate: More Perturbation Algorithms for Adversarial Multi-armed Bandits
Recent work on follow the perturbed leader (FTPL) algorithms for the adv...

research · 10/11/2018
Fighting Contextual Bandits with Stochastic Smoothing
We introduce a new stochastic smoothing perspective to study adversarial...

research · 02/02/2019
On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems
We investigate the optimality of perturbation based algorithms in the st...

research · 01/25/2019
Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
We develop the first general semi-bandit algorithm that simultaneously a...

research · 09/26/2020
Near-Optimal MNL Bandits Under Risk Criteria
We study MNL bandits, which is a variant of the traditional multi-armed ...

research · 05/18/2022
The Multisecretary problem with many types
We study the multisecretary problem with capacity to hire up to B out of...

research · 02/10/2022
Adaptively Exploiting d-Separators with Causal Bandits
Multi-armed bandit problems provide a framework to identify the optimal ...
