Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback

09/04/2019
by Arun Verma, et al.

In this paper, we study Censored Semi-Bandits, a novel variant of the semi-bandits problem. The learner has a fixed amount of resources, which it allocates among the arms at each time step. The loss observed from an arm is random and depends on the amount of resource allocated to it. More specifically, the loss equals zero if the allocation for the arm exceeds a constant but unknown threshold, which may differ from arm to arm. Our goal is to learn a feasible allocation that minimizes the expected loss. The problem is challenging because both the loss distribution and the threshold value of each arm are unknown. We study this setting by establishing its "equivalence" to Multiple-Play Multi-Armed Bandits (MP-MAB) and Combinatorial Semi-Bandits. Exploiting these equivalences, we derive optimal algorithms for our setting from existing algorithms for MP-MAB and Combinatorial Semi-Bandits. Experiments on synthetically generated data validate the performance guarantees of the proposed algorithms.
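The feedback model described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it only simulates one round of censored losses and computes the expected loss of a fixed allocation. The Bernoulli loss distribution, the function names `censored_losses` and `expected_loss`, the strict comparison at the threshold, and all numeric values are assumptions made for illustration.

```python
import random


def censored_losses(allocation, thresholds, mean_losses, rng):
    """Simulate one round of per-arm losses under censored feedback.

    Assumption: an uncensored arm incurs a Bernoulli loss with the given mean.
    """
    losses = []
    for a, theta, mu in zip(allocation, thresholds, mean_losses):
        if a > theta:
            # Allocation exceeds the arm's threshold: the loss is zero.
            losses.append(0)
        else:
            # Otherwise a random loss is observed (Bernoulli is an assumption).
            losses.append(1 if rng.random() < mu else 0)
    return losses


def expected_loss(allocation, thresholds, mean_losses):
    """Expected total loss: sum of mean losses over arms left uncensored."""
    return sum(
        mu
        for a, theta, mu in zip(allocation, thresholds, mean_losses)
        if a <= theta
    )


# Hypothetical instance: three arms and a total budget of 1.0.
thresholds = [0.3, 0.5, 0.4]
mean_losses = [0.9, 0.2, 0.6]

# Two feasible allocations (each sums to at most 1.0).
cover_arms_0_and_2 = [0.35, 0.0, 0.45]
cover_arms_1_and_2 = [0.0, 0.55, 0.45]

print(expected_loss(cover_arms_0_and_2, thresholds, mean_losses))
print(expected_loss(cover_arms_1_and_2, thresholds, mean_losses))
print(censored_losses(cover_arms_0_and_2, thresholds, mean_losses, random.Random(0)))
```

In this hypothetical instance, covering the two arms with the largest mean losses leaves only the low-loss arm exposed, which is why the first allocation has the smaller expected loss. The learner in the paper's setting has to discover such an allocation without knowing `thresholds` or `mean_losses`.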


