Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits

06/30/2016
by Alexander Luedtke, et al.

We study a generalization of the multi-armed bandit problem with multiple plays in which pulling each arm incurs a cost and the agent has a per-round budget that bounds how much she can expect to spend. We derive an asymptotic regret lower bound for any uniformly efficient algorithm in this setting. We then study a variant of Thompson sampling for Bernoulli rewards and a variant of KL-UCB for both single-parameter exponential families and bounded, finitely supported rewards. We show that these algorithms are asymptotically optimal, both in rate and in the leading problem-dependent constants, including in the thick-margin setting where multiple arms fall on the decision boundary.
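
To make the setting concrete, here is a minimal sketch of a Thompson-sampling-style policy for a budgeted multiple-play Bernoulli bandit. It assumes known per-pull costs, a fixed per-round budget that constrains expected spending, and Beta posteriors; the allocation step ranks arms by sampled mean per unit cost and fills the budget greedily (pulling the marginal arm with a probability that spends the leftover budget in expectation). This greedy fractional rule is an illustrative choice and is not claimed to match the paper's exact algorithm.

```python
import numpy as np

def budgeted_thompson_sampling(true_means, costs, budget, horizon, rng=None):
    """Sketch of a Thompson-sampling-style policy for a budgeted
    multiple-play Bernoulli bandit (illustrative, not the paper's exact rule).

    Each round: sample a mean for each arm from its Beta posterior, rank arms
    by sampled mean / cost, and pull greedily until the expected spend
    reaches `budget`.
    """
    rng = rng or np.random.default_rng()
    costs = np.asarray(costs, dtype=float)
    K = len(true_means)
    successes = np.ones(K)  # Beta(1, 1) priors
    failures = np.ones(K)
    total_reward = 0.0

    for _ in range(horizon):
        # Posterior sample of each arm's mean reward.
        theta = rng.beta(successes, failures)
        # Rank arms by sampled reward per unit cost (fractional-knapsack order).
        order = np.argsort(-theta / costs)
        remaining = budget
        for k in order:
            if remaining <= 0:
                break
            # Pull arm k outright if affordable; otherwise pull it with
            # probability remaining / cost so the expected spend matches
            # the leftover budget.
            p_pull = min(1.0, remaining / costs[k])
            remaining -= p_pull * costs[k]
            if rng.random() < p_pull:
                reward = rng.binomial(1, true_means[k])
                successes[k] += reward
                failures[k] += 1 - reward
                total_reward += reward
    return total_reward

# Example usage (hypothetical instance): 4 arms, mixed costs, budget 2 per round.
# budgeted_thompson_sampling(true_means=[0.7, 0.5, 0.4, 0.2],
#                            costs=[1.0, 1.0, 2.0, 0.5],
#                            budget=2.0, horizon=10_000)
```

The same skeleton accommodates the KL-UCB variants mentioned above by replacing the posterior sample with a KL-based upper confidence index before the ranking step.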


Related research

06/02/2015
Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
We discuss a multiple-play multi-armed bandit (MAB) problem in which sev...

11/16/2017
Budget-Constrained Multi-Armed Bandits with Multiple Plays
We study the multi-armed bandit problem with multiple plays and a budget...

02/23/2017
A minimax and asymptotically optimal algorithm for stochastic bandits
We propose the kl-UCB++ algorithm for regret minimization in stochastic...

06/21/2021
On Limited-Memory Subsampling Strategies for Bandits
There has been a recent surge of interest in nonparametric bandit algori...

09/14/2020
Hellinger KL-UCB based Bandit Algorithms for Markovian and i.i.d. Settings
In the regret-based formulation of multi-armed bandit (MAB) problems, ex...

01/30/2020
Finite-time Analysis of Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards
We study an extension of the classic stochastic multi-armed bandit probl...

11/13/2022
Generalizing distribution of partial rewards for multi-armed bandits with temporally-partitioned rewards
We investigate the Multi-Armed Bandit problem with Temporally-Partitione...
