More Adaptive Algorithms for Adversarial Bandits

01/10/2018
by Chen-Yu Wei, et al.

We develop a novel and generic algorithm for the adversarial multi-armed bandit problem (or, more generally, the combinatorial semi-bandit problem). When instantiated differently, our algorithm achieves various new data-dependent regret bounds improving upon previous work. Examples include:

1) a regret bound depending on the variance of only the best arm;
2) a regret bound depending on the first-order path-length of only the best arm;
3) a regret bound depending on the sum of the first-order path-lengths of all arms, as well as an important negative term, which together lead to faster convergence rates for some normal-form games with partial feedback;
4) a regret bound that simultaneously implies small regret when the best arm has small loss and logarithmic regret when there exists an arm whose expected loss is always smaller than those of the others by a fixed gap (e.g. the classic i.i.d. setting).

In some cases, such as the last two results, our algorithm is completely parameter-free. The main idea of our algorithm is to apply optimism and adaptivity techniques to the well-known Online Mirror Descent framework with a special log-barrier regularizer. The challenges are to come up with appropriate optimistic predictions and correction terms in this framework. Some of our results also crucially rely on a sophisticated increasing learning rate schedule.
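To make the core update concrete, here is a minimal sketch of Online Mirror Descent with the log-barrier regularizer for the multi-armed bandit setting, using importance-weighted loss estimates. This is a simplified illustration, not the paper's full algorithm: it omits the optimistic predictions, the correction terms, and the increasing learning rate schedule, and uses a single uniform learning rate `eta`. All function and variable names are illustrative.

```python
import numpy as np

def log_barrier_omd_step(w, loss_est, eta, iters=60):
    """One OMD step with the log-barrier regularizer and uniform rate eta.

    The update has the implicit closed form
        w_new[i] = 1 / (1/w[i] + eta * (loss_est[i] - lam)),
    where the normalizer lam is chosen so that w_new sums to 1.
    The sum is increasing in lam, so lam is found by binary search.
    """
    k = len(w)
    inv = 1.0 / w + eta * loss_est
    lo = (inv.min() - k) / eta   # at lo, every term <= 1/k, so the sum <= 1
    hi = inv.min() / eta         # the sum diverges as lam approaches hi
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        total = np.sum(1.0 / (inv - eta * lam))
        if total < 1.0:
            lo = lam
        else:
            hi = lam
    w_new = 1.0 / (inv - eta * 0.5 * (lo + hi))
    return w_new / w_new.sum()

# Toy bandit run (illustrative): three arms with fixed losses, arm 0 best.
rng = np.random.default_rng(0)
means = np.array([0.1, 0.9, 0.9])
w = np.full(3, 1.0 / 3.0)
eta = 0.05
for t in range(3000):
    arm = rng.choice(3, p=w)
    est = np.zeros(3)
    est[arm] = means[arm] / w[arm]   # importance-weighted loss estimate
    w = log_barrier_omd_step(w, est, eta)
```

Note how the log-barrier keeps every weight strictly positive regardless of how large an importance-weighted estimate gets, which is one reason this regularizer supports the refined data-dependent bounds discussed in the abstract.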


Related research

- 06/14/2022: Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds. This paper considers the multi-armed bandit (MAB) problem and provides a...
- 06/08/2021: Scale Free Adversarial Multi Armed Bandits. We consider the Scale-Free Adversarial Multi Armed Bandit (MAB) problem, ...
- 01/29/2019: Improved Path-length Regret Bounds for Bandits. We study adaptive regret bounds in terms of the variation of the losses ...
- 05/30/2022: Adversarial Bandits Robust to S-Switch Regret. We study the adversarial bandit problem under S number of switching best...
- 05/26/2023: Stability-penalty-adaptive Follow-the-regularized-leader: Sparsity, Game-dependency, and Best-of-both-worlds. Adaptivity to the difficulties of a problem is a key property in sequent...
- 02/27/2023: Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms. We study the problem of designing adaptive multi-armed bandit algorithms...
- 12/19/2016: Corralling a Band of Bandit Algorithms. We study the problem of combining multiple bandit algorithms (that is, o...
