An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives

06/10/2015
by   Shipra Agrawal, et al.

We consider a contextual version of the multi-armed bandit problem with global knapsack constraints. In each round, the outcome of pulling an arm is a scalar reward and a resource consumption vector, both dependent on the context, and the global knapsack constraints require the total consumption of each resource to stay below a pre-fixed budget. The learning agent competes with an arbitrary set of context-dependent policies. This problem was introduced by Badanidiyuru et al. (2014), who gave a computationally inefficient algorithm with near-optimal regret bounds for it. We give a computationally efficient algorithm for this problem with slightly better regret bounds, by generalizing the approach of Agarwal et al. (2014) for the non-constrained version of the problem. The computational time of our algorithm scales logarithmically in the size of the policy space. This answers the main open question of Badanidiyuru et al. (2014). We also extend our results to a variant where there are no knapsack constraints but the objective is an arbitrary Lipschitz concave function of the sum of outcome vectors.
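The interaction protocol described in the abstract can be sketched as follows. This is a minimal simulation, not the paper's algorithm: the contexts, rewards, consumption vectors, budgets, and the uniformly random placeholder policy are all hypothetical stand-ins, chosen only to illustrate the round-by-round structure and the stopping rule imposed by the knapsack constraints.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3                   # number of arms
d = 2                   # number of resources
B = np.full(d, 50.0)    # per-resource budgets (hypothetical values)
T = 1000                # time horizon

total_reward = 0.0
consumed = np.zeros(d)  # running total consumption per resource

for t in range(T):
    context = rng.random(5)       # context revealed at the start of the round
    # Placeholder policy: uniformly random arm. The paper's algorithm would
    # instead select an arm via a distribution over context-dependent policies.
    arm = int(rng.integers(K))
    # Outcome of the pull: a scalar reward and a consumption vector, both
    # depending on the context and the arm (hypothetical generative model).
    reward = float(context.mean()) * (arm + 1) / K
    consumption = rng.random(d) * 0.5
    # Global knapsack constraints: stop as soon as any budget would be exceeded.
    if np.any(consumed + consumption > B):
        break
    total_reward += reward
    consumed += consumption
```

The quantity a learner is judged on is `total_reward` at the stopping time, compared against the best fixed policy in the given policy class; the sketch above only shows the feedback loop, not that comparison.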


