Allocating Divisible Resources on Arms with Unknown and Random Rewards

06/28/2023
by Ningyuan Chen, et al.

We consider a decision maker who allocates one unit of a renewable and divisible resource across a number of arms in each period. The arms have unknown and random rewards whose means are proportional to the allocated resource and whose variances are proportional to an order b of the allocated resource. In particular, if the decision maker allocates resource A_i to arm i in a period, then the reward is Y_i(A_i) = A_i μ_i + A_i^b ξ_i, where μ_i is the unknown mean and the noise ξ_i is independent and sub-Gaussian. As the order b ranges from 0 to 1, the framework smoothly bridges the standard stochastic multi-armed bandit and online learning with full feedback. We design two algorithms that attain the optimal gap-dependent and gap-independent regret bounds for b ∈ [0,1], and demonstrate a phase transition at b = 1/2. The theoretical results hinge on a novel concentration inequality we develop that bounds a linear combination of sub-Gaussian random variables whose weights are fractional, adapted to the filtration, and monotonic.
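To make the reward model concrete, the following is a minimal simulation sketch, not the paper's algorithm: Gaussian noise stands in for the sub-Gaussian ξ_i, and all names and parameter values are illustrative assumptions. It draws one period's rewards under Y_i(A_i) = A_i μ_i + A_i^b ξ_i and prints the per-unit estimates Y_i / A_i, whose noise scale A_i^(b-1) illustrates the bridge described above: constant at b = 1 (full-feedback-like), but exploding for small allocations at b = 0 (bandit-like).

```python
import numpy as np

# Sketch of the reward model Y_i(A_i) = A_i * mu_i + A_i**b * xi_i.
# Gaussian noise is used as a stand-in for the paper's sub-Gaussian xi_i;
# mu and alloc below are illustrative, not from the paper.

rng = np.random.default_rng(0)

def draw_rewards(alloc, mu, b, sigma=1.0):
    """Draw one period's rewards for each arm given allocations `alloc`."""
    xi = rng.normal(0.0, sigma, size=len(mu))  # independent noise per arm
    return alloc * mu + alloc**b * xi

mu = np.array([0.9, 0.5, 0.2])     # unknown means (illustrative)
alloc = np.array([0.5, 0.3, 0.2])  # one unit of divisible resource, split

for b in (0.0, 0.5, 1.0):
    y = draw_rewards(alloc, mu, b)
    # Per-unit estimate Y_i / A_i = mu_i + A_i**(b-1) * xi_i: at b = 1 the
    # noise scale is constant in A_i, while at b = 0 it grows like 1/A_i.
    print(f"b={b}: rewards={y}, per-unit estimates={y / alloc}")
```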


research · 11/04/2022
Online Learning and Bandits with Queried Hints
We consider the classic online learning and stochastic multi-armed bandi...

research · 12/13/2020
Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback
We study the multi-armed bandit (MAB) problem with composite and anonymo...

research · 05/16/2020
Learning and Optimization with Seasonal Patterns
Seasonality is a common form of non-stationary patterns in the business ...

research · 05/22/2021
From Finite to Countable-Armed Bandits
We consider a stochastic bandit problem with countably many arms that be...

research · 09/12/2023
Generalized Regret Analysis of Thompson Sampling using Fractional Posteriors
Thompson sampling (TS) is one of the most popular and earliest algorithm...

research · 12/29/2021
Socially-Optimal Mechanism Design for Incentivized Online Learning
Multi-arm bandit (MAB) is a classic online learning framework that studi...

research · 03/21/2019
Perturbed-History Exploration in Stochastic Linear Bandits
We propose a new online algorithm for minimizing the cumulative regret i...
