Budgeted Multi-Armed Bandits with Asymmetric Confidence Intervals

06/12/2023
by   Marco Heyden, et al.

We study the stochastic Budgeted Multi-Armed Bandit (MAB) problem, in which a player chooses from K arms with unknown expected rewards and costs. The goal is to maximize the total reward under a budget constraint; the player thus seeks to play the arm with the highest reward-cost ratio as often as possible. Current state-of-the-art policies for this problem have several issues, which we illustrate. To overcome them, we propose a new upper confidence bound (UCB) sampling policy, ω-UCB, that uses asymmetric confidence intervals. These intervals scale with the distance between the sample mean and the bounds of a random variable, yielding a tighter and more accurate estimate of the reward-cost ratio than competing approaches. We show that our policy has logarithmic regret and consistently outperforms existing policies in synthetic and real settings.
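The abstract only sketches the idea at a high level. Below is a minimal, hypothetical illustration of a budgeted UCB-style policy that ranks arms by an optimistic reward-cost ratio: an upper confidence bound on the reward divided by a lower confidence bound on the cost, with interval widths scaled by the distance of the sample mean to the support bounds [0, 1]. The scaling, the Bernoulli reward/cost model, the arm parameters, and the budget are all illustrative assumptions; this is not the paper's ω-UCB formula.

# Illustrative sketch of a budgeted UCB-style policy with asymmetric
# confidence intervals. NOT the authors' omega-UCB: the interval widths
# below (scaled by the distance of the sample mean to the support bounds
# [0, 1]) are a simplified stand-in for the idea described in the abstract.

import numpy as np

rng = np.random.default_rng(0)

K = 3                                      # number of arms
true_reward = np.array([0.8, 0.5, 0.3])    # unknown expected rewards in [0, 1] (made up)
true_cost = np.array([0.9, 0.4, 0.5])      # unknown expected costs in (0, 1] (made up)
budget = 200.0

reward_sum = np.zeros(K)
cost_sum = np.zeros(K)
pulls = np.zeros(K)
total_reward = 0.0
t = 0

def asymmetric_bounds(mean, n, t):
    # Hypothetical asymmetric interval: the usual sqrt(2 log t / n) radius
    # is scaled by the distance of the sample mean to the respective bound,
    # so an estimate close to 1 gets little upward slack and vice versa.
    radius = np.sqrt(2.0 * np.log(max(t, 2)) / n)
    upper = np.minimum(1.0, mean + (1.0 - mean) * radius)
    lower = np.maximum(0.0, mean - mean * radius)
    return lower, upper

while budget > 0.0:
    t += 1
    if t <= K:                             # pull each arm once to initialize
        arm = t - 1
    else:
        mean_r = reward_sum / pulls
        mean_c = cost_sum / pulls
        _, r_upper = asymmetric_bounds(mean_r, pulls, t)
        c_lower, _ = asymmetric_bounds(mean_c, pulls, t)
        # Optimistic estimate of each arm's reward-cost ratio.
        arm = int(np.argmax(r_upper / np.maximum(c_lower, 1e-6)))

    reward = float(rng.random() < true_reward[arm])   # Bernoulli reward draw
    cost = float(rng.random() < true_cost[arm])       # Bernoulli cost draw
    if cost > budget:
        break
    budget -= cost
    reward_sum[arm] += reward
    cost_sum[arm] += cost
    pulls[arm] += 1
    total_reward += reward

print("pulls per arm:", pulls, "total reward:", total_reward)

Running the sketch, the policy concentrates its pulls on the arm with the best reward-cost ratio once the confidence intervals separate, which is the qualitative behavior the paper's regret analysis formalizes.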

