Stochastic Rising Bandits

12/07/2022 · by Alberto Maria Metelli, et al.
This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., sequential selection techniques that learn online using only the feedback given by the chosen option (a.k.a. arm). We study a particular case of the rested and restless bandits in which the arms' expected payoffs are monotonically non-decreasing. This characteristic makes it possible to design specifically crafted algorithms that exploit the regularity of the payoffs to provide tight regret bounds. We design an algorithm for the rested case (R-ed-UCB) and one for the restless case (R-less-UCB), providing regret bounds that depend on the properties of the instance and, under certain circumstances, are of order 𝒪(T^{2/3}). Finally, using both synthetic and real-world data, we empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset.
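To make the *rested rising* setting concrete, the sketch below simulates a bandit whose arms' expected payoffs grow monotonically (and concavely) with the number of times each arm has been pulled, and runs a generic sliding-window UCB baseline on it. The exponential saturation form of the mean and all parameter values are illustrative assumptions; this is NOT the paper's R-ed-UCB algorithm, only a minimal environment plus a non-stationary baseline of the kind the paper compares against.

```python
import math
import random
from collections import deque

def rising_mean(pulls, mu_max, rate):
    # Monotonically non-decreasing, concave expected payoff as a function
    # of the arm's own pull count (rested dynamics). Illustrative form,
    # not the paper's model.
    return mu_max * (1.0 - math.exp(-rate * pulls))

def sliding_window_ucb(horizon, arms, window=25, sigma=0.05, seed=0):
    """Sliding-window UCB on a rested rising bandit.

    `arms` is a list of (mu_max, rate) pairs. Each arm's mean depends on
    how often THAT arm has been pulled, so estimates from old pulls go
    stale; the window keeps only the most recent observations per arm.
    """
    rng = random.Random(seed)
    k = len(arms)
    recent = [deque(maxlen=window) for _ in range(k)]  # recent rewards per arm
    pulls = [0] * k
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # pull each arm once to initialise its estimate
        else:
            def index(i):
                m = sum(recent[i]) / len(recent[i])
                # optimism bonus shrinks with the number of windowed samples
                return m + math.sqrt(2.0 * math.log(t) / len(recent[i]))
            a = max(range(k), key=index)
        reward = rising_mean(pulls[a], *arms[a]) + rng.gauss(0.0, sigma)
        recent[a].append(reward)
        pulls[a] += 1
        total += reward
    return pulls, total
```

For example, with one slowly rising arm that saturates high and one fast riser that saturates low, `sliding_window_ucb(2000, [(0.9, 0.05), (0.5, 0.5)])` lets the window-based estimates track each arm's growth as it is pulled; the paper's point is that exploiting the known monotone structure directly yields tighter guarantees than such generic non-stationary baselines.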

Related research

02/02/2022 · Non-Stationary Dueling Bandits
We study the non-stationary dueling bandits problem with K arms, where t...

01/25/2021 · Online and Scalable Model Selection with Multi-Armed Bandits
Many online applications running on live traffic are powered by machine ...

07/28/2017 · A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity
The key challenge in multiagent learning is learning a best response to ...

12/24/2020 · A Regret bound for Non-stationary Multi-Armed Bandits with Fairness Constraints
The multi-armed bandits' framework is the most common platform to study ...

02/08/2023 · Non-Stationary Bandits with Knapsack Problems with Advice
We consider a non-stationary Bandits with Knapsack problem. The outcome ...

07/08/2022 · Information-Gathering in Latent Bandits
In the latent bandit problem, the learner has access to reward distribut...

05/30/2022 · Optimistic Whittle Index Policy: Online Learning for Restless Bandits
Restless multi-armed bandits (RMABs) extend multi-armed bandits to allow...
