Conservative Contextual Combinatorial Cascading Bandit

04/17/2021
by Kun Wang, et al.

A conservative mechanism is a desirable property in decision-making problems, as it balances the tradeoff between exploration and exploitation. We propose the novel conservative contextual combinatorial cascading bandit (C^4-bandit), a cascading online learning game that incorporates a conservative mechanism. At each time step, the learning agent is given contexts and must recommend a list of items that performs no worse than a base strategy; it then observes the reward according to some stopping rule. We design the C^4-UCB algorithm to solve this problem and prove its n-step upper regret bound in two situations: known baseline reward and unknown baseline reward. The regret in both situations decomposes into two terms: (a) the upper bound for the general contextual combinatorial cascading bandit, and (b) a constant term for the regret incurred by the conservative mechanism. As a by-product, we also improve the regret bound of the conservative contextual combinatorial bandit. Experiments on synthetic data demonstrate the algorithm's advantages and validate our theoretical analysis.
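To make the setting concrete, the following is a minimal Python sketch of a conservative, UCB-style cascading recommender under a linear attraction model. It is an illustration in the spirit of standard cascading-bandit algorithms, not the authors' C^4-UCB: the confidence widths, the disjunctive cascade reward (probability that at least one recommended item is attractive), the (1 - alpha) conservative budget, and all names (C4UCBSketch, select, update) are assumptions, not taken from the paper.

```python
import numpy as np

class C4UCBSketch:
    """Illustrative sketch (not the paper's algorithm): a conservative,
    UCB-style cascading recommender with a linear attraction model."""

    def __init__(self, d, alpha=0.1, beta=1.0, lam=1.0):
        self.alpha = alpha              # allowed fraction of baseline loss
        self.beta = beta                # confidence-width multiplier
        self.V = lam * np.eye(d)        # ridge-regularized Gram matrix
        self.b = np.zeros(d)            # sum of context * observed feedback
        self.baseline_cum = 0.0         # cumulative baseline reward
        self.reward_lb_cum = 0.0        # pessimistic cumulative agent reward

    def _bounds(self, x):
        # Ridge-regression estimate with an ellipsoidal confidence width.
        theta_hat = np.linalg.solve(self.V, self.b)
        width = self.beta * np.sqrt(x @ np.linalg.solve(self.V, x))
        mean = theta_hat @ x
        return np.clip(mean - width, 0, 1), np.clip(mean + width, 0, 1)

    def select(self, contexts, K, baseline_list, baseline_reward):
        """contexts: dict item_id -> d-dim feature vector for this round."""
        bounds = {i: self._bounds(x) for i, x in contexts.items()}
        # Candidate cascade: top-K items ranked by optimistic attraction.
        candidate = sorted(bounds, key=lambda i: bounds[i][1], reverse=True)[:K]
        # Pessimistic list value: P(at least one attractive item in the list).
        lcb_val = 1.0 - np.prod([1.0 - bounds[i][0] for i in candidate])
        self.baseline_cum += baseline_reward
        # Conservative check (known-baseline case): even in the worst case the
        # cumulative reward must stay above (1 - alpha) of the baseline's.
        if self.reward_lb_cum + lcb_val >= (1.0 - self.alpha) * self.baseline_cum:
            self.reward_lb_cum += lcb_val
            return candidate
        self.reward_lb_cum += baseline_reward   # baseline reward is known
        return list(baseline_list)              # fall back to the safe baseline

    def update(self, contexts, played, feedback):
        """Cascading feedback: observations up to and including the stop item."""
        for i, r in zip(played, feedback):
            x = contexts[i]
            self.V += np.outer(x, x)
            self.b += r * x
```

In each round, one would call select with the current item contexts to obtain the list to recommend and, after observing the cascade feedback (zeros for examined-but-unattractive items and a one at the stopping item, if any), call update with the observed prefix. The fallback branch is what yields the constant term in the abstract's regret decomposition: the baseline list is played only while the pessimistic estimate is too weak to certify the conservative constraint.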


Related research

11/19/2016 · Conservative Contextual Linear Bandits
Safety is a desirable property that can immensely increase the applicabi...

08/06/2021 · Joint AP Probing and Scheduling: A Contextual Bandit Approach
We consider a set of APs with unknown data rates that cooperatively serv...

03/29/2022 · Stochastic Conservative Contextual Linear Bandits
Many physical systems have underlying safety considerations that require...

02/20/2019 · A Note on Bounding Regret of the C^2UCB Contextual Combinatorial Bandit
We revisit the proof by Qin et al. (2014) of bounded regret of the C^2UC...

12/14/2020 · A One-Size-Fits-All Solution to Conservative Bandit Problems
In this paper, we study a family of conservative bandit problems (CBPs) ...

11/26/2019 · Contextual Combinatorial Conservative Bandits
The problem of multi-armed bandits (MAB) asks to make sequential decisio...

05/28/2022 · Federated Neural Bandit
Recent works on neural contextual bandit have achieved compelling perfor...
