Meta-Learning Adversarial Bandits

05/27/2022
by Maria-Florina Balcan, et al.

We study online learning with bandit feedback across multiple tasks, with the goal of improving average performance across tasks if they are similar according to some natural task-similarity measure. As the first to target the adversarial setting, we design a unified meta-algorithm that yields setting-specific guarantees for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO). For MAB, the meta-algorithm tunes the initialization, step-size, and entropy parameter of the Tsallis-entropy generalization of the well-known Exp3 method, with the task-averaged regret provably improving if the entropy of the distribution over estimated optima-in-hindsight is small. For BLO, we learn the initialization, step-size, and boundary-offset of online mirror descent (OMD) with self-concordant barrier regularizers, showing that task-averaged regret varies directly with a measure induced by these functions on the interior of the action space. Our adaptive guarantees rely on proving that unregularized follow-the-leader combined with multiplicative weights is enough to online learn a non-smooth and non-convex sequence of affine functions of Bregman divergences that upper-bound the regret of OMD.
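To make the MAB ingredient concrete, here is a minimal sketch of the Exp3 special case (the Tsallis-entropy family as the entropy parameter tends to 1), exposing the two hyperparameters the abstract says are meta-learned in this case: the initialization (a prior distribution over arms) and the step size. This is an illustrative sketch, not the paper's algorithm; all names (`exp3`, `init`, `eta`) are ours.

```python
# Sketch of Exp3 for adversarial multi-armed bandits with meta-learnable
# initialization and step size. Assumes losses in [0, 1].
import numpy as np

def exp3(losses, eta=0.1, init=None, rng=None):
    """Run Exp3 on a (T, K) array of adversarial losses.

    eta  : step size (meta-learnable).
    init : prior distribution over arms (meta-learnable initialization).
    Returns the final sampling distribution over the K arms.
    """
    rng = rng or np.random.default_rng(0)
    T, K = losses.shape
    prior = np.full(K, 1.0 / K) if init is None else np.asarray(init, float)
    L_hat = np.zeros(K)  # cumulative importance-weighted loss estimates
    for t in range(T):
        # Exponential weights against estimated losses (shifted for stability).
        w = prior * np.exp(-eta * (L_hat - L_hat.min()))
        p = w / w.sum()
        arm = rng.choice(K, p=p)
        # Unbiased loss estimate under bandit feedback: only the pulled
        # arm's loss is observed, reweighted by its sampling probability.
        L_hat[arm] += losses[t, arm] / p[arm]
    w = prior * np.exp(-eta * (L_hat - L_hat.min()))
    return w / w.sum()
```

On a sequence where one arm always incurs zero loss, the returned distribution places maximal mass on that arm; a meta-learner would warm-start `init` and tune `eta` across tasks.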
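For the BLO ingredient, the following one-dimensional sketch shows online mirror descent with a self-concordant log-barrier regularizer on the open interval (0, 1), the kind of barrier-regularized OMD whose initialization, step size, and boundary offset the abstract says are meta-learned. The closed-form mirror update is specific to this toy barrier, and all names (`barrier_omd`, `x0`, `eta`) are illustrative assumptions, not the paper's.

```python
# Sketch: OMD on (0, 1) with the log-barrier R(x) = -log(x) - log(1 - x),
# a self-concordant barrier for the interval. Iterates stay strictly interior.
import math

def barrier_omd(grads, x0=0.5, eta=0.1):
    """Run OMD against a gradient sequence; returns all iterates."""
    def grad_R(x):
        # Mirror map: R'(x) = -1/x + 1/(1 - x), strictly increasing on (0, 1).
        return -1.0 / x + 1.0 / (1.0 - x)
    xs = [x0]
    x = x0
    for g in grads:
        v = grad_R(x) - eta * g  # gradient step in the dual space
        # Invert the mirror map: solve R'(x') = v, i.e.
        # v*x'^2 + (2 - v)*x' - 1 = 0, taking the root in (0, 1).
        if abs(v) < 1e-12:
            x = 0.5
        else:
            x = (v - 2.0 + math.sqrt(v * v + 4.0)) / (2.0 * v)
        xs.append(x)
    return xs
```

Because the barrier blows up at the boundary, no projection step is needed; a meta-learner across tasks would tune the starting point `x0`, the step size `eta`, and how far from the boundary the comparator is offset.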


research
07/05/2023

Meta-Learning Adversarial Bandit Algorithms

We study online meta-learning with bandit feedback, with the goal of imp...
research
08/19/2021

Learning-to-learn non-convex piecewise-Lipschitz functions

We analyze the meta-learning of the initialization and step-size of lear...
research
02/04/2021

Meta-strategy for Learning Tuning Parameters with Guarantees

Online gradient methods, like the online gradient algorithm (OGA), often...
research
05/31/2022

Online Meta-Learning in Adversarial Multi-Armed Bandits

We study meta-learning for adversarial multi-armed bandits. We consider ...
research
02/26/2022

Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework

Online learning in large-scale structured bandits is known to be challen...
research
05/18/2020

Meta-learning with Stochastic Linear Bandits

We investigate meta-learning procedures in the setting of stochastic lin...
research
02/27/2023

Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms

We study the problem of designing adaptive multi-armed bandit algorithms...
