On the Re-Solving Heuristic for (Binary) Contextual Bandits with Knapsacks

11/25/2022
by Rui Ai, et al.

In the problem of (binary) contextual bandits with knapsacks (CBwK), the agent receives an i.i.d. context in each of the T rounds and chooses an action, which yields a random reward and a random consumption of resources, both of which depend on an i.i.d. external factor. The agent's goal is to maximize the accumulated reward subject to the initial resource constraints. In this work, we combine the re-solving heuristic, which has proved successful in revenue management, with distribution estimation techniques to solve this problem. We consider two information feedback models, full and partial information, which differ in how difficult it is to obtain a sample of the external factor. Under both feedback settings, we obtain two-fold results: (1) For general problems, we show that our algorithm achieves an O(T^{α_u} + T^{α_v} + T^{1/2}) regret against the fluid benchmark. Here, α_u and α_v reflect the complexity of the context and external factor distributions, respectively. This result is comparable to existing results. (2) When the fluid problem is a linear program with a unique and non-degenerate optimal solution, our algorithm achieves an O(1) regret. To the best of our knowledge, this is the first O(1) regret result for the CBwK problem under either information feedback model. We further use numerical experiments to verify our results.
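As a rough illustration of the re-solving idea described above, the sketch below re-solves a fluid problem in every round using the remaining budget per remaining round, then accepts the current context with the resulting probability. It is not the paper's algorithm: it assumes a single resource, a finite set of context types with known probabilities, deterministic per-type rewards and strictly positive costs, and it skips the distribution estimation step entirely. Under those assumptions the fluid linear program reduces to a fractional knapsack, which is solved greedily. The names `solve_fluid` and `run_resolving` are hypothetical.

```python
import random


def solve_fluid(probs, rewards, costs, rho):
    """Solve max sum_i p_i r_i x_i  s.t.  sum_i p_i c_i x_i <= rho, 0 <= x_i <= 1.

    Assumes costs[i] > 0 and probs[i] > 0 (illustrative simplification).
    With one resource this is a fractional knapsack, so a greedy fill by
    reward-to-cost ratio is optimal.
    """
    x = [0.0] * len(probs)
    remaining_rate = rho
    order = sorted(range(len(probs)),
                   key=lambda i: rewards[i] / costs[i], reverse=True)
    for i in order:
        if remaining_rate <= 0:
            break
        need = probs[i] * costs[i]          # budget rate used if x_i = 1
        x[i] = min(1.0, remaining_rate / need)
        remaining_rate -= x[i] * need
    return x


def run_resolving(T, budget, probs, rewards, costs, seed=0):
    """Simulate T rounds, re-solving the fluid problem each round."""
    rng = random.Random(seed)
    total_reward = 0.0
    remaining = budget
    for t in range(T):
        # Re-solve with the average budget available per remaining round.
        rho_t = remaining / (T - t)
        x = solve_fluid(probs, rewards, costs, rho_t)
        # Draw the context type for this round.
        i = rng.choices(range(len(probs)), weights=probs)[0]
        # Binary action: accept with probability x[i], if affordable.
        if rng.random() < x[i] and costs[i] <= remaining:
            total_reward += rewards[i]
            remaining -= costs[i]
    return total_reward, remaining


if __name__ == "__main__":
    probs, rewards, costs = [0.5, 0.5], [1.0, 1.0], [1.0, 2.0]
    print(solve_fluid(probs, rewards, costs, 0.75))
    print(run_resolving(100, 75.0, probs, rewards, costs))
```

With two equally likely context types, a budget rate of 0.75 fully accepts the cheaper type and accepts the more expensive one a quarter of the time, so `solve_fluid` returns `[1.0, 0.25]`. Re-solving adjusts these probabilities as the realized consumption drifts away from the fluid plan.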
