Scale-Free Adversarial Multi-Armed Bandits

06/08/2021
by   Sudeep Raja Putta, et al.

We consider the Scale-Free Adversarial Multi-Armed Bandit (MAB) problem, in which the player knows only the number of arms n and not the scale or magnitude of the losses. The player receives bandit feedback about the loss vectors l_1, …, l_T ∈ ℝ^n, and the goal is to bound its regret as a function of n and l_1, …, l_T. We design a Follow The Regularized Leader (FTRL) algorithm that comes with the first scale-free regret guarantee for MAB. It uses the log-barrier regularizer, the importance-weighted estimator, an adaptive learning rate, and an adaptive exploration parameter. In the analysis, we introduce a simple, unifying technique for obtaining regret inequalities for FTRL and Online Mirror Descent (OMD) on the probability simplex using potential functions and mixed Bregmans. We also develop a new technique for obtaining local-norm lower bounds for Bregman divergences, which are crucial in bandit regret bounds. These tools could be of independent interest.
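To make the ingredients concrete, here is a minimal sketch of FTRL with the log-barrier regularizer on the probability simplex, combined with the importance-weighted loss estimator. The helper names (`log_barrier_ftrl_probs`, `scale_free_mab`) and the simple square-root learning-rate schedule are illustrative assumptions, not the paper's exact adaptive learning rate or exploration parameter.

```python
import numpy as np

def log_barrier_ftrl_probs(cum_losses, eta, tol=1e-12):
    """Solve p = argmin_p <p, L> + (1/eta) * sum_i(-log p_i) over the simplex.
    Stationarity gives p_i = 1 / (eta * (L_i + lam)); the multiplier lam is
    found by bisection so that the probabilities sum to one."""
    L = np.asarray(cum_losses, dtype=float)
    n = len(L)
    lo = -L.min() + 1e-12           # as lam -> -min L, sum of p_i -> +inf
    hi = lo + n / eta + 1.0         # large enough that sum of p_i < 1
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if np.sum(1.0 / (eta * (L + mid))) > 1.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 1.0 / (eta * (L + 0.5 * (lo + hi)))

def scale_free_mab(loss_fn, n, T, rng=None):
    """Illustrative bandit loop: sample an arm from the FTRL distribution,
    observe only that arm's loss, and feed the importance-weighted estimate
    back into the cumulative loss vector. The learning rate adapts to the
    observed squared loss magnitudes (a stand-in for the paper's schedule)."""
    rng = np.random.default_rng(rng)
    Lhat = np.zeros(n)              # importance-weighted cumulative losses
    sq_sum = 1.0                    # running sum of observed squared losses
    for t in range(T):
        eta = np.sqrt(n / sq_sum)   # stand-in scale-adaptive learning rate
        p = log_barrier_ftrl_probs(Lhat, eta)
        i = rng.choice(n, p=p / p.sum())
        loss = loss_fn(t, i)        # bandit feedback: one coordinate of l_t
        Lhat[i] += loss / p[i]      # unbiased importance-weighted estimator
        sq_sum += loss ** 2
    return Lhat
```

Note how the log barrier forces every p_i strictly positive, so the importance weights 1/p_i stay finite; this is the property that makes the local-norm analysis of the Bregman divergences go through.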


