Contextual Combinatorial Volatile Bandits with Satisfying via Gaussian Processes

11/29/2021
by Sepehr Elahi, et al.

In many real-world applications of combinatorial bandits, such as content caching, rewards must be maximized while satisfying minimum service requirements. In addition, base arm availabilities vary over time, and actions need to be adapted to the situation to maximize rewards. We propose a new bandit model, Contextual Combinatorial Volatile Bandits with Group Thresholds, to address these challenges. Our model subsumes combinatorial bandits by taking super arms to be subsets of groups of base arms. We seek to maximize super arm rewards while satisfying the thresholds of all base arm groups that constitute a super arm. To this end, we define a new notion of regret that merges super arm reward maximization with group reward satisfaction. To facilitate learning, we assume that the mean outcomes of base arms are samples from a Gaussian process indexed by the context set X, and that the expected reward is Lipschitz continuous in the expected base arm outcomes. We propose an algorithm, Thresholded Combinatorial Gaussian Process Upper Confidence Bounds (TCGP-UCB), that balances maximizing cumulative reward with satisfying group reward thresholds, and we prove that it incurs Õ(K√(Tγ_T)) regret with high probability, where γ_T is the maximum information gain associated with the set of base arm contexts that appear in the first T rounds and K is the maximum super arm cardinality over all feasible actions. Experiments show that our algorithm accumulates rewards comparable to those of the state-of-the-art combinatorial bandit algorithm while selecting actions whose groups satisfy their thresholds.
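To make the selection rule concrete, here is a minimal sketch of the kind of thresholded GP-UCB step the abstract describes: maintain a Gaussian process posterior over base arm outcomes, form upper confidence indices, discard groups whose optimistic group reward falls below their threshold, and pick up to K base arms from the remaining groups. This is not the authors' TCGP-UCB implementation; the RBF kernel, unit prior variance, additive group reward, greedy arm selection, and all class/function names are illustrative assumptions.

```python
import numpy as np

def rbf(a, b, ls=0.5):
    # Squared-exponential kernel between context arrays a (n, d) and b (m, d).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

class GPUCBThresholded:
    """Hypothetical sketch of a thresholded GP-UCB rule: GP posterior over
    base-arm outcomes, UCB indices, and a feasibility filter on groups."""

    def __init__(self, noise=0.1, beta=2.0):
        self.X, self.y = [], []          # observed contexts and outcomes
        self.noise, self.beta = noise, beta

    def posterior(self, contexts):
        # Exact GP posterior mean/variance at the given contexts.
        if not self.X:
            return np.zeros(len(contexts)), np.ones(len(contexts))
        Xo = np.array(self.X)
        K = rbf(Xo, Xo) + self.noise ** 2 * np.eye(len(Xo))
        Ks = rbf(np.asarray(contexts), Xo)
        mu = Ks @ np.linalg.solve(K, np.array(self.y))
        var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
        return mu, np.maximum(var, 1e-12)

    def select(self, contexts, groups, thresholds, K_max):
        # UCB index per available base arm.
        mu, var = self.posterior(contexts)
        ucb = mu + np.sqrt(self.beta * var)
        # Keep only groups whose optimistic (sum) reward clears the threshold.
        feasible = [g for g, th in zip(groups, thresholds) if ucb[g].sum() >= th]
        # Greedily take up to K_max highest-UCB arms from feasible groups.
        pool = sorted({i for g in feasible for i in g}, key=lambda i: -ucb[i])
        return pool[:K_max], feasible

    def update(self, contexts, chosen, outcomes):
        # Record the observed (context, outcome) pairs for the played arms.
        for i, o in zip(chosen, outcomes):
            self.X.append(contexts[i])
            self.y.append(o)
```

In round t, one would call `select` on the contexts of the currently available (volatile) base arms, play the returned super arm, observe semi-bandit feedback, and call `update`; the confidence width `beta` plays the role of the exploration parameter that drives the Õ(K√(Tγ_T)) regret bound.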


Related research:

- Contextual Combinatorial Volatile Bandits via Gaussian Processes (10/05/2021)
- Risk-Aware Algorithms for Combinatorial Semi-Bandits (12/02/2021)
- Gaussian Process Classification Bandits (12/26/2022)
- Neural Bandit with Arm Group Graph (06/08/2022)
- Combinatorial Sleeping Bandits with Fairness Constraints (01/15/2019)
- Gaussian Process Bandits with Aggregated Feedback (12/24/2021)
- Linear Combinatorial Semi-Bandit with Causally Related Rewards (12/25/2022)
