Online Meta-Learning in Adversarial Multi-Armed Bandits

05/31/2022
by   Ilya Osadchiy, et al.

We study meta-learning for adversarial multi-armed bandits. We consider the online-within-online setup, in which a player (learner) encounters a sequence of multi-armed bandit episodes. The player's performance is measured as regret against the best arm in each episode, according to the losses generated by an adversary. The difficulty of the problem depends on the empirical distribution of the per-episode best arms chosen by the adversary. We present an algorithm that leverages non-uniformity in this empirical distribution and derive problem-dependent regret bounds. The solution comprises an inner learner that plays each episode separately, and an outer learner that updates the hyper-parameters of the inner algorithm between episodes. When the best-arm distribution is far from uniform, our algorithm improves upon the best bound achievable by any online algorithm run on each episode individually, without meta-learning.
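To make the inner/outer structure concrete, here is a minimal illustrative sketch in Python, not the paper's actual algorithm. It pairs an Exp3-style inner learner with an outer learner that re-weights the inner learner's initialization using the empirical distribution of past per-episode best arms. The function names (exp3_episode, meta_play), the Laplace-smoothed counting rule, and the fixed learning rate are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def exp3_episode(losses, init_weights, eta):
    """Inner learner: run Exp3 on one episode from a non-uniform initialization.

    losses: (T, K) loss matrix for this episode, entries assumed in [0, 1].
    init_weights: prior over arms supplied by the outer learner.
    eta: learning rate.
    Returns the total loss incurred and this episode's best arm in hindsight.
    """
    T, K = losses.shape
    log_w = np.log(init_weights)  # start from the outer learner's prior
    total_loss = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())  # stable softmax over log-weights
        p /= p.sum()
        arm = rng.choice(K, p=p)
        total_loss += losses[t, arm]
        # Importance-weighted loss estimate: only the pulled arm is updated.
        log_w[arm] -= eta * losses[t, arm] / p[arm]
    best_arm = int(losses.sum(axis=0).argmin())
    return total_loss, best_arm

def meta_play(episodes, eta=0.1, smoothing=1.0):
    """Outer learner: adapt the inner initialization between episodes.

    episodes: list of (T, K) loss matrices, one per episode.
    The prior tracks a Laplace-smoothed empirical distribution of the
    per-episode best arms, so arms that are frequently best start each
    new episode with more weight.
    """
    K = episodes[0].shape[1]
    best_arm_counts = np.full(K, smoothing)  # smoothed counts of best arms
    total = 0.0
    for losses in episodes:
        prior = best_arm_counts / best_arm_counts.sum()
        ep_loss, best = exp3_episode(losses, prior, eta)
        total += ep_loss
        best_arm_counts[best] += 1.0  # outer update between episodes
    return total

# Example: 50 episodes of 100 rounds with 5 arms, where arm 0 is usually best.
episodes = [rng.random((100, 5)) for _ in range(50)]
for ep in episodes:
    ep[:, 0] *= 0.5  # bias losses so arm 0 tends to be the per-episode best
print(meta_play(episodes))
```

The intended intuition matches the abstract: when the adversary's per-episode best arms concentrate on a few arms, the outer learner's prior concentrates as well, so each new episode starts closer to the likely best arm. When the best-arm distribution is near uniform, the prior stays near uniform and the sketch degrades gracefully to running the inner learner independently per episode.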


Related research

Meta-Learning Adversarial Bandit Algorithms (07/05/2023)
We study online meta-learning with bandit feedback, with the goal of imp...

Online Multi-Armed Bandit (07/17/2017)
We introduce a novel variant of the multi-armed bandit problem, in which...

Meta-Thompson Sampling (02/11/2021)
Efficient exploration in multi-armed bandits is a fundamental online lea...

Risk-Averse Multi-Armed Bandits with Unobserved Confounders: A Case Study in Emotion Regulation in Mobile Health (09/09/2022)
In this paper, we consider a risk-averse multi-armed bandit (MAB) proble...

Generalized Translation and Scale Invariant Online Algorithm for Adversarial Multi-Armed Bandits (09/19/2021)
We study the adversarial multi-armed bandit problem and create a complet...

Meta-Learning Adversarial Bandits (05/27/2022)
We study online learning with bandit feedback across multiple tasks, wit...

Online Learning for Active Cache Synchronization (02/27/2020)
Existing multi-armed bandit (MAB) models make two implicit assumptions: ...
