Allocating Divisible Resources on Arms with Unknown and Random Rewards

06/28/2023
by Ningyuan Chen, et al.

We consider a decision maker who allocates one unit of a renewable and divisible resource across a number of arms in each period. The arms have unknown and random rewards whose means are proportional to the allocated resource and whose variances are proportional to an order b of the allocated resource. In particular, if the decision maker allocates resource A_i to arm i in a period, then the reward is Y_i(A_i) = A_i μ_i + A_i^b ξ_i, where μ_i is the unknown mean and the noise ξ_i is independent and sub-Gaussian. As the order b ranges from 0 to 1, the framework smoothly bridges the standard stochastic multi-armed bandit and online learning with full feedback. We design two algorithms that attain the optimal gap-dependent and gap-independent regret bounds for b ∈ [0,1], and demonstrate a phase transition at b = 1/2. The theoretical results hinge on a novel concentration inequality we develop that bounds a linear combination of sub-Gaussian random variables whose weights are fractional, adapted to the filtration, and monotonic.
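To make the reward model concrete, the following is a minimal simulation sketch, not the paper's algorithm: Gaussian noise stands in for the sub-Gaussian ξ_i, and all names and parameter values are illustrative assumptions. It draws one period's rewards under Y_i(A_i) = A_i μ_i + A_i^b ξ_i and prints the per-unit estimates Y_i / A_i, whose noise scale A_i^(b-1) illustrates the bridge described above: constant at b = 1 (full-feedback-like), but exploding for small allocations at b = 0 (bandit-like).

```python
import numpy as np

# Sketch of the reward model Y_i(A_i) = A_i * mu_i + A_i**b * xi_i.
# Gaussian noise is used as a stand-in for the paper's sub-Gaussian xi_i;
# mu and alloc below are illustrative, not from the paper.

rng = np.random.default_rng(0)

def draw_rewards(alloc, mu, b, sigma=1.0):
    """Draw one period's rewards for each arm given allocations `alloc`."""
    xi = rng.normal(0.0, sigma, size=len(mu))  # independent noise per arm
    return alloc * mu + alloc**b * xi

mu = np.array([0.9, 0.5, 0.2])     # unknown means (illustrative)
alloc = np.array([0.5, 0.3, 0.2])  # one unit of divisible resource, split

for b in (0.0, 0.5, 1.0):
    y = draw_rewards(alloc, mu, b)
    # Per-unit estimate Y_i / A_i = mu_i + A_i**(b-1) * xi_i: at b = 1 the
    # noise scale is constant in A_i, while at b = 0 it grows like 1/A_i.
    print(f"b={b}: rewards={y}, per-unit estimates={y / alloc}")
```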


research · 11/04/2022
Online Learning and Bandits with Queried Hints
We consider the classic online learning and stochastic multi-armed bandi...

research · 12/13/2020
Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback
We study the multi-armed bandit (MAB) problem with composite and anonymo...

research · 05/16/2020
Learning and Optimization with Seasonal Patterns
Seasonality is a common form of non-stationary patterns in the business ...

research · 05/22/2021
From Finite to Countable-Armed Bandits
We consider a stochastic bandit problem with countably many arms that be...

research · 09/12/2023
Generalized Regret Analysis of Thompson Sampling using Fractional Posteriors
Thompson sampling (TS) is one of the most popular and earliest algorithm...

research · 12/29/2021
Socially-Optimal Mechanism Design for Incentivized Online Learning
Multi-arm bandit (MAB) is a classic online learning framework that studi...

research · 03/21/2019
Perturbed-History Exploration in Stochastic Linear Bandits
We propose a new online algorithm for minimizing the cumulative regret i...
