Contextual Combinatorial Volatile Bandits via Gaussian Processes

10/05/2021
by Andi Nika, et al.

We consider a contextual bandit problem with a combinatorial action set and time-varying base arm availability. At the beginning of each round, the agent observes the set of available base arms and their contexts, and then selects an action that is a feasible subset of the set of available base arms to maximize its cumulative reward in the long run. We assume that the mean outcomes of base arms are samples from a Gaussian process indexed by the context set X, and that the expected reward is Lipschitz continuous in the expected base arm outcomes. For this setup, we propose an algorithm called Optimistic Combinatorial Learning and Optimization with Kernel Upper Confidence Bounds (O'CLOK-UCB) and prove that it incurs Õ(K√(Tγ_T)) regret with high probability, where γ_T is the maximum information gain associated with the set of base arm contexts that appeared in the first T rounds and K is the maximum cardinality of any feasible action over all rounds. To dramatically speed up the algorithm, we also propose a variant of O'CLOK-UCB that uses sparse GPs. Finally, we experimentally show that both algorithms exploit inter-base-arm outcome correlation and vastly outperform the previous state-of-the-art UCB-based algorithms in realistic setups.
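The optimistic selection rule described in the abstract can be sketched as follows: maintain a GP posterior over base arm outcomes from the history of observed contexts and outcomes, score each currently available base arm by a kernelized UCB index, and hand the indices to a combinatorial oracle. This is a minimal illustration, not the paper's implementation: the RBF kernel, the exploration parameter beta, the noise level, and the top-K feasibility structure used by `select_action` are all assumptions made for the sketch.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between two arrays of context vectors."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * lengthscale**2))

def gp_ucb_indices(X_hist, y_hist, X_avail, beta=2.0, noise=0.1):
    """GP posterior mean/std at the available contexts and the optimistic
    index mu + sqrt(beta) * sigma for each available base arm."""
    if len(X_hist) == 0:
        # Prior: zero mean, unit variance (k(x, x) = 1 for the RBF kernel).
        mu = np.zeros(len(X_avail))
        sigma = np.ones(len(X_avail))
    else:
        K = rbf_kernel(X_hist, X_hist) + noise**2 * np.eye(len(X_hist))
        k_star = rbf_kernel(X_hist, X_avail)
        mu = k_star.T @ np.linalg.solve(K, y_hist)
        var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
        sigma = np.sqrt(np.maximum(var, 1e-12))
    return mu + np.sqrt(beta) * sigma

def select_action(indices, K_max):
    """Combinatorial oracle for a simple top-K feasibility structure:
    return the K_max available base arms with the largest UCB indices."""
    return np.argsort(-indices)[:K_max]
```

In each round the agent would recompute the indices for the currently available base arms (volatility means this set changes over time), play the oracle's subset, observe the chosen arms' outcomes, and append them to the history. The exact-GP update above costs cubically in the history size, which is the bottleneck the paper's sparse-GP variant is designed to avoid.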

Related research

Contextual Combinatorial Volatile Bandits with Satisfying via Gaussian Processes (11/29/2021)
In many real-world applications of combinatorial bandits such as content...

Thompson Sampling for Combinatorial Network Optimization in Unknown Environments (07/07/2019)
Influence maximization, item recommendation, adaptive routing and dynami...

Combinatorial Neural Bandits (05/31/2023)
We consider a contextual combinatorial bandit problem where in each roun...

Budgeted Combinatorial Multi-Armed Bandits (02/08/2022)
We consider a budgeted combinatorial multi-armed bandit setting where, i...

Combinatorial Blocking Bandits with Stochastic Delays (05/22/2021)
Recent work has considered natural variations of the multi-armed bandit ...

Gaussian Process Bandits for Tree Search: Theory and Application to Planning in Discounted MDPs (09/03/2010)
We motivate and analyse a new Tree Search algorithm, GPTS, based on rece...

A Farewell to Arms: Sequential Reward Maximization on a Budget with a Giving Up Option (03/06/2020)
We consider a sequential decision-making problem where an agent can take...
