A Farewell to Arms: Sequential Reward Maximization on a Budget with a Giving Up Option

03/06/2020
by P. Sharoff, et al.

We consider a sequential decision-making problem where an agent can take one action at a time and each action has a stochastic temporal extent, i.e., a new action cannot be taken until the previous one is finished. Upon completion, the chosen action yields a stochastic reward. The agent seeks to maximize its cumulative reward over a finite time budget, with the option of "giving up" on the current action – hence forfeiting any reward – in order to choose another action. We cast this problem as a variant of the stochastic multi-armed bandit problem with stochastic consumption of a resource. For this problem, we first establish that the optimal arm is the one that maximizes the ratio of the expected reward of the arm to the expected waiting time before the agent sees the reward due to pulling that arm. Using a novel upper confidence bound on this ratio, we then introduce an upper-confidence-based algorithm, WAIT-UCB, for which we establish a logarithmic, problem-dependent regret bound with an improved dependence on problem parameters compared to previous works. Simulations on various problem configurations comparing WAIT-UCB against state-of-the-art algorithms are also presented.
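As a rough illustration of the idea in the abstract (not the paper's exact algorithm), the sketch below selects, at each decision point, the arm maximizing an optimistic estimate of the reward-to-waiting-time ratio. The confidence-bonus form and the exploration parameter `alpha` are generic assumptions of ours, the giving-up option is omitted for simplicity, and rewards and waiting times are drawn from hypothetical exponential distributions.

```python
import math
import random

def wait_ucb_index(total_reward, total_wait, pulls, t, alpha=2.0):
    # Optimistic estimate of E[reward] / E[waiting time] for one arm.
    # The bonus form here is a generic UCB-style assumption, not the
    # paper's actual confidence bound.
    if pulls == 0:
        return float("inf")  # force each arm to be tried once
    return total_reward / total_wait + math.sqrt(alpha * math.log(t) / pulls)

def run(arms, budget, alpha=2.0, rng=random):
    # arms: list of (mean_reward, mean_wait) pairs; rewards and waiting
    # times are sampled from exponential distributions for illustration.
    stats = [[0.0, 0.0, 0] for _ in arms]  # [reward sum, wait sum, pulls]
    t, elapsed, cumulative = 1, 0.0, 0.0
    while elapsed < budget:
        i = max(range(len(arms)),
                key=lambda k: wait_ucb_index(*stats[k], t, alpha))
        mean_reward, mean_wait = arms[i]
        wait = rng.expovariate(1.0 / mean_wait)
        if elapsed + wait > budget:  # budget exhausted mid-action
            break
        reward = rng.expovariate(1.0 / mean_reward)
        elapsed += wait
        cumulative += reward
        stats[i][0] += reward
        stats[i][1] += wait
        stats[i][2] += 1
        t += 1
    return cumulative
```

Note that the index targets the ratio of expectations rather than the per-pull reward, matching the abstract's characterization of the optimal arm as the one maximizing expected reward per unit of expected waiting time.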


Related research

06/12/2023 – Budgeted Multi-Armed Bandits with Asymmetric Confidence Intervals
We study the stochastic Budgeted Multi-Armed Bandit (MAB) problem, where...

07/09/2017 – Nonlinear Sequential Accepts and Rejects for Identification of Top Arms in Stochastic Bandits
We address the M-best-arm identification problem in multi-armed bandits....

02/23/2017 – Rotting Bandits
The Multi-Armed Bandits (MAB) framework highlights the tension between a...

10/23/2020 – Finite Continuum-Armed Bandits
We consider a situation where an agent has T resources to be allocated ...

11/16/2020 – Distributed Bandits: Probabilistic Communication on d-regular Graphs
We study the decentralized multi-agent multi-armed bandit problem for ag...

10/05/2021 – Contextual Combinatorial Volatile Bandits via Gaussian Processes
We consider a contextual bandit problem with a combinatorial action set ...

08/28/2023 – Simple Modification of the Upper Confidence Bound Algorithm by Generalized Weighted Averages
The multi-armed bandit (MAB) problem is a classical problem that models ...
