Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms

02/25/2022
by MohammadJavad Azizi, et al.

We study a sequential decision problem where the learner faces a sequence of K-armed stochastic bandit tasks. The tasks may be designed by an adversary, but the adversary is constrained to choose the optimal arm of each task from a smaller (but unknown) subset of M arms. The task boundaries might be known (the bandit meta-learning setting) or unknown (the non-stationary bandit setting), and the number of tasks N as well as the total number of rounds T are known (N could be unknown in the meta-learning setting). We design an algorithm based on a reduction to bandit submodular maximization, and show that its regret in both settings is smaller than the simple baseline of Õ(√(KNT)) that can be obtained by using standard algorithms designed for non-stationary bandit problems. For the bandit meta-learning problem with fixed task length τ, we show that the regret of the algorithm is bounded as Õ(N√(Mτ) + N^(2/3)). Under additional assumptions on the identifiability of the optimal arms in each task, we show a bandit meta-learning algorithm with an improved Õ(N√(Mτ) + N^(1/2)) regret.
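To get a feel for how the stated bounds compare, the following is a minimal numeric sketch, not the paper's algorithm. It evaluates the three expressions exactly as written in the abstract, dropping the logarithmic factors and constants that Õ hides. It assumes T = Nτ (N tasks of fixed length τ); the function names and the values of K, M, N and τ are hypothetical and chosen only for illustration.

```python
import math

# Illustrative only: the bounds from the abstract with constants and
# log factors dropped. Assumes T = N * tau (N tasks of fixed length tau).

def baseline_bound(K, N, tau):
    """Baseline sqrt(K * N * T) from standard non-stationary bandit algorithms."""
    T = N * tau
    return math.sqrt(K * N * T)

def meta_bound(M, N, tau):
    """Meta-learning bound N * sqrt(M * tau) + N^(2/3), as stated in the abstract."""
    return N * math.sqrt(M * tau) + N ** (2 / 3)

def meta_bound_identifiable(M, N, tau):
    """Improved bound N * sqrt(M * tau) + N^(1/2) under identifiability assumptions."""
    return N * math.sqrt(M * tau) + math.sqrt(N)

if __name__ == "__main__":
    # Hypothetical problem sizes: many arms, few of which are ever optimal.
    K, M, N, tau = 100, 4, 1000, 500
    print("baseline:            ", round(baseline_bound(K, N, tau)))
    print("meta-learning:       ", round(meta_bound(M, N, tau)))
    print("with identifiability:", round(meta_bound_identifiable(M, N, tau)))
```

Under the T = Nτ assumption the baseline equals N√(Kτ), so the leading terms differ by a factor of about √(K/M); the gain is largest when the set of optimal arms is much smaller than the full arm set.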

research
02/25/2022

Meta-Learning for Simple Regret Minimization

We develop a meta-learning framework for simple regret minimization in b...
research
03/04/2023

MNL-Bandit in non-stationary environments

In this paper, we study the MNL-Bandit problem in a non-stationary envir...
research
02/01/2021

Generalized non-stationary bandits

In this paper, we study a non-stationary stochastic bandit problem, whic...
research
09/30/2021

Adapting Bandit Algorithms for Settings with Sequentially Available Arms

Although the classical version of the Multi-Armed Bandits (MAB) framewor...
research
05/29/2022

Non-Stationary Bandits under Recharging Payoffs: Improved Planning with Sublinear Regret

The stochastic multi-armed bandit setting has been recently studied in t...
research
02/02/2023

Algorithm Design for Online Meta-Learning with Task Boundary Detection

Online meta-learning has recently emerged as a marriage between batch me...
research
02/06/2023

Memory-Based Meta-Learning on Non-Stationary Distributions

Memory-based meta-learning is a technique for approximating Bayes-optima...
