Meta-Learning Adversarial Bandit Algorithms

07/05/2023
by Mikhail Khodak, et al.

We study online meta-learning with bandit feedback, with the goal of improving performance across multiple tasks if they are similar according to some natural similarity measure. As the first to target the adversarial online-within-online partial-information setting, we design meta-algorithms that combine outer learners to simultaneously tune the initialization and other hyperparameters of an inner learner for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO). For MAB, the meta-learners initialize and set hyperparameters of the Tsallis-entropy generalization of Exp3, with the task-averaged regret improving if the entropy of the optima-in-hindsight is small. For BLO, we learn to initialize and tune online mirror descent (OMD) with self-concordant barrier regularizers, showing that task-averaged regret varies directly with an action space-dependent measure they induce. Our guarantees rely on proving that unregularized follow-the-leader combined with two levels of low-dimensional hyperparameter tuning is enough to learn a sequence of affine functions of non-Lipschitz and sometimes non-convex Bregman divergences bounding the regret of OMD.
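To make the two tuned quantities concrete, the sketch below runs plain Exp3 on a single task with an explicit initial distribution and learning rate. It is an illustration under stated assumptions, not the paper's method: it uses the standard exponential-weights form of Exp3 rather than the Tsallis-entropy generalization, it does not implement the outer meta-learners, and the names `exp3`, `init`, and `eta` are hypothetical.

```python
import math
import random


def _normalize(log_w):
    """Turn log-weights into a probability vector, stably."""
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    z = sum(w)
    return [v / z for v in w]


def exp3(losses, init, eta, rng=None):
    """Run standard Exp3 on one task.

    losses: list of rounds, each a list of per-arm losses in [0, 1]
    init:   strictly positive initial distribution over arms
            (the initialization a meta-learner would tune across tasks)
    eta:    learning rate (the other hyperparameter a meta-learner would tune)
    Returns (total incurred loss, final distribution over arms).
    """
    rng = rng or random.Random(0)
    k = len(init)
    log_w = [math.log(p) for p in init]
    total = 0.0
    for loss in losses:
        probs = _normalize(log_w)
        arm = rng.choices(range(k), weights=probs)[0]
        total += loss[arm]
        # Bandit feedback: only the pulled arm's loss is observed, so use
        # the importance-weighted estimate loss[arm] / probs[arm].
        log_w[arm] -= eta * loss[arm] / probs[arm]
    return total, _normalize(log_w)
```

Under this sketch, an initialization that already places most of its mass near the best arm leaves the learner less to discover within a task, which is the intuition behind the abstract's claim that task-averaged regret improves when the optima-in-hindsight have low entropy.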


