Combinatorial Neural Bandits

05/31/2023
by TaeHyun Hwang, et al.

We consider a contextual combinatorial bandit problem where in each round a learning agent selects a subset of arms and receives feedback on the selected arms according to their scores. The score of an arm is an unknown function of the arm's feature. Approximating this unknown score function with deep neural networks, we propose two algorithms: Combinatorial Neural UCB (CN-UCB) and Combinatorial Neural Thompson Sampling (CN-TS). We prove that CN-UCB achieves 𝒪̃(d̃√T) or 𝒪̃(√(d̃TK)) regret, where d̃ is the effective dimension of a neural tangent kernel matrix, K is the size of a subset of arms, and T is the time horizon. For CN-TS, we adapt an optimistic sampling technique to ensure the optimism of the sampled combinatorial action, achieving a worst-case (frequentist) regret of 𝒪̃(d̃√(TK)). To the best of our knowledge, these are the first combinatorial neural bandit algorithms with regret performance guarantees. In particular, CN-TS is the first Thompson sampling algorithm with worst-case regret guarantees for the general contextual combinatorial bandit problem. Numerical experiments demonstrate the superior performance of our proposed algorithms.
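To make the setting in the abstract concrete, the following is a minimal sketch of a UCB-style combinatorial neural bandit loop. It is not the authors' implementation. Everything in it is an assumption made for illustration: the two-layer ReLU network, the single SGD step per observation (rather than full retraining), the top-K oracle for a sum-of-scores reward, the synthetic score function `cos(x · theta)`, and all names and hyperparameters (`run`, `gamma`, `lam`, `lr`). The exploration bonus uses the network gradient as a feature vector, in the spirit of neural-tangent-kernel-based UCB methods.

```python
import numpy as np

def init_net(d, m, rng):
    # Hypothetical two-layer ReLU network: f(x) = w2 . relu(W1 x) / sqrt(m)
    return {"W1": rng.normal(size=(m, d)), "w2": rng.normal(size=m)}

def forward_and_grad(net, x):
    """Return f(x) and the flattened gradient of f w.r.t. all parameters."""
    m = net["w2"].shape[0]
    pre = net["W1"] @ x
    act = np.maximum(pre, 0.0)
    f = float(net["w2"] @ act) / np.sqrt(m)
    g_W1 = np.outer(net["w2"] * (pre > 0), x) / np.sqrt(m)
    g_w2 = act / np.sqrt(m)
    return f, np.concatenate([g_W1.ravel(), g_w2])

def ucb_scores(net, Z, X, gamma):
    """Per-arm optimistic score: prediction plus a gradient-based bonus."""
    scores, grads = [], []
    for x in X:
        f, g = forward_and_grad(net, x)
        width = max(float(g @ np.linalg.solve(Z, g)), 0.0)
        scores.append(f + gamma * np.sqrt(width))
        grads.append(g)
    return np.array(scores), np.array(grads)

def select_top_k(scores, K):
    # Assumed oracle: for a sum-of-scores reward, the best subset is the top K.
    return np.argsort(-scores)[:K]

def sgd_step(net, x, r, lr):
    # One squared-loss gradient step (a simplification of full retraining).
    f, g = forward_and_grad(net, x)
    step = lr * (f - r) * g
    n = net["W1"].size
    net["W1"] -= step[:n].reshape(net["W1"].shape)
    net["w2"] -= step[n:]

def run(T=30, N=10, K=3, d=4, m=8, gamma=1.0, lam=1.0, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    net = init_net(d, m, rng)
    Z = lam * np.eye(m * d + m)      # regularized gradient covariance
    theta = rng.normal(size=d)       # defines the synthetic score function
    history = []
    for _ in range(T):
        X = rng.normal(size=(N, d))
        X /= np.linalg.norm(X, axis=1, keepdims=True)
        scores, grads = ucb_scores(net, Z, X, gamma)
        chosen = select_top_k(scores, K)
        for i in chosen:             # semi-bandit feedback on selected arms only
            r = np.cos(X[i] @ theta) + 0.1 * rng.normal()
            Z += np.outer(grads[i], grads[i])
            sgd_step(net, X[i], r, lr)
        history.append(chosen)
    return net, Z, history
```

Calling `run()` returns the trained network, the accumulated matrix `Z`, and the list of subsets chosen in each round. A Thompson-sampling variant would replace the deterministic bonus in `ucb_scores` with scores sampled around the prediction; that variant is not shown here.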

research
07/07/2019

Thompson Sampling for Combinatorial Network Optimization in Unknown Environments

Influence maximization, item recommendation, adaptive routing and dynami...
research
06/12/2021

Simple Combinatorial Algorithms for Combinatorial Bandits: Corruptions and Approximations

We consider the stochastic combinatorial semi-bandit problem with advers...
research
06/07/2020

Thompson Sampling for Multinomial Logit Contextual Bandits

We consider a dynamic assortment selection problem where the goal is to ...
research
10/05/2021

Contextual Combinatorial Volatile Bandits via Gaussian Processes

We consider a contextual bandit problem with a combinatorial action set ...
research
02/09/2021

Robust Bandit Learning with Imperfect Context

A standard assumption in contextual multi-arm bandit is that the true co...
research
09/05/2019

An Arm-wise Randomization Approach to Combinatorial Linear Semi-bandits

Combinatorial linear semi-bandits (CLS) are widely applicable frameworks...
research
02/20/2019

A Note on Bounding Regret of the C^2UCB Contextual Combinatorial Bandit

We revisit the proof by Qin et al. (2014) of bounded regret of the C^2UC...
