Knapsack based Optimal Policies for Budget-Limited Multi-Armed Bandits

04/09/2012
by Long Tran-Thanh, et al.

In budget-limited multi-armed bandit (MAB) problems, the learner's actions are costly and constrained by a fixed budget. Consequently, the optimal exploitation policy may not be to pull the optimal arm repeatedly, as in other variants of MAB, but rather to pull the sequence of different arms that maximises the agent's total reward within the budget. This difference from existing MABs means that new approaches to maximising the total reward are required. Given this, we develop two pulling policies, namely: (i) KUBE; and (ii) fractional KUBE. Whereas the former provides better performance, up to 40% in our experimental settings, the latter is computationally less expensive. We also prove logarithmic upper bounds for the regret of both policies, and show that these bounds are asymptotically optimal (i.e. they differ from the best possible regret only by a constant factor).
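To make the setting concrete, here is a minimal, illustrative sketch of a greedy density-based pulling policy in the spirit of fractional KUBE. It is an assumption-laden simplification, not the paper's exact algorithm: at each step it pulls the affordable arm with the highest UCB-style reward estimate per unit cost, stopping when the remaining budget cannot cover any arm. Arm rewards are assumed Bernoulli and the function name and signature are hypothetical.

```python
import math
import random

def pull_budgeted_bandit(true_means, costs, budget, seed=0):
    """Greedy density-based policy sketch (fractional-KUBE-style).

    true_means: Bernoulli success probability of each arm.
    costs:      fixed pulling cost of each arm.
    budget:     total budget available for pulling.
    Returns (total_reward, pull_counts)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k      # number of times each arm was pulled
    sums = [0.0] * k      # cumulative reward of each arm
    total_reward, t = 0.0, 0

    # Initialisation: pull each affordable arm once.
    for i in range(k):
        if costs[i] <= budget:
            r = 1.0 if rng.random() < true_means[i] else 0.0
            counts[i] += 1
            sums[i] += r
            budget -= costs[i]
            total_reward += r
            t += 1

    while True:
        feasible = [i for i in range(k)
                    if costs[i] <= budget and counts[i] > 0]
        if not feasible:
            break  # budget exhausted for every arm

        # UCB-style index divided by cost: reward "density" per unit budget.
        def density(i):
            ucb = sums[i] / counts[i] + math.sqrt(2 * math.log(t + 1) / counts[i])
            return ucb / costs[i]

        i = max(feasible, key=density)
        r = 1.0 if rng.random() < true_means[i] else 0.0
        counts[i] += 1
        sums[i] += r
        budget -= costs[i]
        total_reward += r
        t += 1

    return total_reward, counts
```

With unit costs this degenerates to standard UCB1; the interesting case is heterogeneous costs, where a cheap mediocre arm can beat an expensive good one in reward per unit budget. The full KUBE policy in the paper instead solves an unbounded knapsack problem over the remaining budget at each step.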


