Adapting Bandit Algorithms for Settings with Sequentially Available Arms

09/30/2021
by   Marco Gabrielli, et al.

Although the classical Multi-Armed Bandit (MAB) framework has been applied successfully to several practical problems, in many real-world applications, such as Internet campaign management and environmental monitoring, the possible actions are not presented to the learner simultaneously. Instead, a set of options is presented sequentially within a time span, and this process is repeated throughout a time horizon. Each time an option is proposed, the learner must decide whether to select it or not. We define this scenario as the Sequential Pull/No-pull Bandit setting, and we propose a meta-algorithm, namely Sequential Pull/No-pull for MAB (Seq), to adapt any classical MAB policy to better suit this setting for both the regret minimization and best-arm identification problems. By allowing the selection of multiple arms within a round, the proposed meta-algorithm gathers more information, especially in the first rounds, which are characterized by high uncertainty in the arms' estimated values. At the same time, the adapted algorithms provide the same theoretical guarantees as the classical policy employed. The Seq meta-algorithm was extensively tested and compared with classical MAB policies on synthetic and real-world datasets from advertising and environmental monitoring applications, highlighting its good empirical performance.
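To make the pull/no-pull interaction concrete, the following is a minimal sketch of the setting only, not the paper's Seq meta-algorithm. It assumes Bernoulli arms and a hypothetical decision rule chosen purely for illustration: an offered arm is pulled if its UCB1-style index is at least the best empirical mean observed so far. The function name `simulate_sequential_pull` and all of its parameters are assumptions introduced here.

```python
import math
import random


def simulate_sequential_pull(means, n_rounds, seed=0):
    """Simulate a pull/no-pull interaction with Bernoulli arms.

    In every round the arms are offered one at a time, and the learner
    decides for each offered arm whether to pull it or skip it.

    Decision rule (an illustrative assumption, NOT the paper's Seq rule):
    pull the offered arm if it has never been pulled, or if its UCB1-style
    index is at least the highest empirical mean among the arms seen so far.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k      # number of times each arm was pulled
    sums = [0.0] * k     # cumulative reward collected from each arm
    total_reward = 0.0
    t = 0                # total number of pulls so far

    for _ in range(n_rounds):
        for arm in range(k):  # arms become available sequentially
            if pulls[arm] == 0:
                pull = True   # always try an arm that was never observed
            else:
                best_mean = max(
                    sums[a] / pulls[a] for a in range(k) if pulls[a] > 0
                )
                bonus = math.sqrt(2.0 * math.log(t + 1) / pulls[arm])
                pull = sums[arm] / pulls[arm] + bonus >= best_mean
            if pull:
                reward = 1.0 if rng.random() < means[arm] else 0.0
                pulls[arm] += 1
                sums[arm] += reward
                total_reward += reward
                t += 1
    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = simulate_sequential_pull([0.2, 0.5, 0.8], n_rounds=200)
    print("pulls per arm:", pulls)
    print("total reward:", total)
```

Under this rule several arms can be pulled within the same round while the estimates are still uncertain, which mirrors the abstract's point about gathering more information in the first rounds. How rewards from multiple pulls in a round enter the regret is defined in the full paper and is not modeled here.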

research
11/16/2022

Dueling Bandits: From Two-dueling to Multi-dueling

We study a general multi-dueling bandit problem, where an agent compares...
research
02/25/2022

Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms

We study a sequential decision problem where the learner faces a sequenc...
research
02/11/2020

Online Preselection with Context Information under the Plackett-Luce Model

We consider an extension of the contextual multi-armed bandit problem, i...
research
10/01/2020

Unknown Delay for Adversarial Bandit Setting with Multiple Play

This paper addresses the problem of unknown delays in adversarial multi-...
research
10/23/2020

A Practical Guide of Off-Policy Evaluation for Bandit Problems

Off-policy evaluation (OPE) is the problem of estimating the value of a ...
research
11/02/2021

Nonstochastic Bandits and Experts with Arm-Dependent Delays

We study nonstochastic bandits and experts in a delayed setting where de...
research
02/08/2016

Decoy Bandits Dueling on a Poset

We address the problem of dueling bandits defined on partially ordered se...
