Fighting Bandits with a New Kind of Smoothness

12/14/2015
by Jacob Abernethy, et al.

We define a novel family of algorithms for the adversarial multi-armed bandit problem, and provide a simple analysis technique based on convex smoothing. We prove two main results. First, we show that regularization via the Tsallis entropy, which includes EXP3 as a special case, achieves the Θ(√(TN)) minimax regret. Second, we show that a wide class of perturbation methods achieve a near-optimal regret as low as O(√(TN log N)) if the perturbation distribution has a bounded hazard rate. For example, the Gumbel, Weibull, Fréchet, Pareto, and Gamma distributions all satisfy this key property.
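To make the perturbation idea concrete, here is a minimal Python sketch, not the authors' implementation; the function name ftpl_gumbel_bandit and the parameters eta and mc_samples are illustrative choices. Each round the learner adds i.i.d. Gumbel noise to the scaled negative cumulative loss estimates, plays the argmax, and forms an importance-weighted loss estimate. Because the arm-selection probabilities generally have no closed form under a perturbation, the sketch approximates them by Monte Carlo sampling, whereas the paper's analysis works with the exact gradient of the smoothed potential.

    import numpy as np

    def ftpl_gumbel_bandit(loss_matrix, eta=0.1, mc_samples=1000, seed=0):
        """Follow-the-perturbed-leader sketch for adversarial bandits.

        loss_matrix: (T, N) array of losses in [0, 1] chosen by the adversary.
        Returns the total loss incurred by the learner.
        """
        rng = np.random.default_rng(seed)
        T, N = loss_matrix.shape
        cum_loss_est = np.zeros(N)      # estimated cumulative losses \hat{L}_{t-1}
        total_loss = 0.0

        for t in range(T):
            # Monte Carlo estimate of p_{t,i} = Pr[arm i maximizes -eta*L_hat + Gumbel noise]
            noise = rng.gumbel(size=(mc_samples, N))
            winners = np.argmax(-eta * cum_loss_est + noise, axis=1)
            p = np.bincount(winners, minlength=N) / mc_samples
            p = np.maximum(p, 1e-12)    # guard against division by zero below

            # Play one fresh draw of the same perturbed-argmax rule
            arm = int(np.argmax(-eta * cum_loss_est + rng.gumbel(size=N)))
            loss = loss_matrix[t, arm]
            total_loss += loss

            # Importance-weighted loss estimate (unbiased up to the MC error in p)
            cum_loss_est[arm] += loss / p[arm]

        return total_loss

With Gumbel noise the perturbed-argmax rule reproduces exponential weights (the Gumbel-max trick), so this sketch essentially recovers EXP3; swapping in another bounded-hazard-rate distribution such as Gamma or Weibull only changes the noise-generation call.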


Related Research

02/17/2017

Beyond the Hazard Rate: More Perturbation Algorithms for Adversarial Multi-armed Bandits

Recent work on follow the perturbed leader (FTPL) algorithms for the adv...
10/11/2018

Fighting Contextual Bandits with Stochastic Smoothing

We introduce a new stochastic smoothing perspective to study adversarial...
02/02/2019

On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems

We investigate the optimality of perturbation based algorithms in the st...
09/15/2012

Further Optimal Regret Bounds for Thompson Sampling

Thompson Sampling is one of the oldest heuristics for multi-armed bandit...
09/26/2020

Near-Optimal MNL Bandits Under Risk Criteria

We study MNL bandits, which is a variant of the traditional multi-armed ...
08/25/2021

A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits

This paper unifies the design and simplifies the analysis of risk-averse...
01/25/2019

Gaussian One-Armed Bandit and Optimization of Batch Data Processing

We consider the minimax setup for Gaussian one-armed bandit problem, i.e...