Improved Algorithms for Multi-period Multi-class Packing Problems with Bandit Feedback

01/31/2023
by Wonyoung Kim, et al.

We consider the linear contextual multi-class multi-period packing problem (LMMP), where the goal is to pack items such that the total consumption vector stays below a given budget vector and the total value is as large as possible. We consider the setting where the reward and the consumption vector associated with each action are class-dependent linear functions of the context, and the decision-maker receives bandit feedback. LMMP includes linear contextual bandits with knapsacks and online revenue management as special cases. We establish a new, more efficient estimator that guarantees a faster convergence rate and, consequently, lower regret in such problems. We propose a bandit policy that is a closed-form function of the estimated parameters. When the contexts are non-degenerate, the regret of the proposed policy is sublinear in the context dimension, the number of classes, and the time horizon T when the budget grows at least as √T. We also resolve an open problem posed in Agrawal and Devanur (2016) and extend the result to a multi-class setting. Our numerical experiments demonstrate that our policy outperforms other benchmarks in the literature.
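To make the problem setting concrete, the following is a minimal simulation sketch of the interaction loop described in the abstract. It is not the estimator or policy proposed in the paper: the policy here is a uniformly random placeholder, feedback is noise-free, and every name (`simulate_packing`, `theta`, `W`) as well as all dimensions, distributions, and the stopping rule are assumptions chosen purely for illustration.

```python
import numpy as np

def simulate_packing(T=200, d=3, n_classes=2, n_actions=4,
                     n_resources=2, budget=30.0, seed=0):
    """Toy multi-class packing loop with bandit feedback (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Hypothetical class-dependent parameters, unknown to the decision-maker.
    theta = rng.uniform(0, 1, size=(n_classes, d))           # reward parameters
    W = rng.uniform(0, 1, size=(n_classes, d, n_resources))  # consumption parameters

    remaining = np.full(n_resources, budget)
    total_reward = 0.0
    for _ in range(T):
        j = rng.integers(n_classes)                     # class of the arriving item
        X = rng.uniform(0, 1, size=(n_actions, d)) / d  # one context per action
        a = rng.integers(n_actions)                     # placeholder policy: random action

        # Bandit feedback: only the chosen action's outcome is observed.
        reward = X[a] @ theta[j]       # linear in the context, class-dependent
        consumption = X[a] @ W[j]      # vector of per-resource consumption

        if np.any(consumption > remaining):
            break                      # assumed stopping rule: budget would be exceeded
        remaining -= consumption
        total_reward += reward
    return total_reward, budget - remaining

total_reward, used = simulate_packing()
print(total_reward, used)
```

A learning policy would replace the random action choice with one based on estimates of `theta` and `W` built from the observed rewards and consumptions; the sketch only fixes the shape of the data and the budget constraint.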

research
10/21/2022

Optimal Contextual Bandits with Knapsacks under Realizibility via Regression Oracles

We study the stochastic contextual bandit with knapsacks (CBwK) problem,...
research
07/26/2023

Online learning in bandits with predicted context

We consider the contextual bandit problem where at each time, the agent ...
research
11/25/2022

On the Re-Solving Heuristic for (Binary) Contextual Bandits with Knapsacks

In the problem of (binary) contextual bandits with knapsacks (CBwK), the...
research
07/07/2021

Neural Contextual Bandits without Regret

Contextual bandits are a rich model for sequential decision making given...
research
06/11/2022

Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits

We propose a novel algorithm for linear contextual bandits with O(√(dT l...
research
06/10/2015

An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives

We consider a contextual version of multi-armed bandit problem with glob...
research
03/02/2017

Active Learning for Accurate Estimation of Linear Models

We explore the sequential decision making problem where the goal is to e...