Improving Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms and Its Applications

03/05/2017
by Qinshi Wang, et al.

We study the combinatorial multi-armed bandit with probabilistically triggered arms (CMAB-T) and semi-bandit feedback. We resolve a serious issue in prior CMAB-T studies, where the regret bounds contain a possibly exponentially large factor of 1/p*, where p* is the minimum positive probability that an arm is triggered by any action. We address this issue by introducing a triggering probability modulated (TPM) bounded smoothness condition into the general CMAB-T framework, and we show that many applications, such as influence maximization bandits and combinatorial cascading bandits, satisfy this TPM condition. As a result, we completely remove the factor of 1/p* from the regret bounds, achieving significantly better regret bounds for influence maximization and cascading bandits than before. Finally, we provide lower bound results showing that the factor 1/p* is unavoidable for general CMAB-T problems, suggesting that the TPM condition is crucial in removing this factor.
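To make the "possibly exponentially large" factor concrete, here is a minimal illustrative sketch that is not taken from the paper. It assumes a hypothetical influence-maximization instance: a directed path whose edges each succeed independently with probability p, with the first node as the only seed. Under that assumption, the k-th edge (0-indexed) is triggered only if the cascade reaches its tail node, which happens with probability p**k, so p* shrinks geometrically with the path length. The function names are made up for this example.

```python
from typing import List


def triggering_probabilities(num_edges: int, p: float) -> List[float]:
    """Triggering probability of each edge on a directed path (assumed setup).

    Edge k (0-indexed) is triggered when its tail node is active. With node 0
    as the seed and every edge succeeding independently with probability p,
    the tail of edge k is active with probability p**k.
    """
    return [p ** k for k in range(num_edges)]


def min_positive_triggering_probability(num_edges: int, p: float) -> float:
    """p* for this toy instance: the smallest positive triggering probability."""
    positive = [q for q in triggering_probabilities(num_edges, p) if q > 0]
    return min(positive)


if __name__ == "__main__":
    # 1/p* doubles with every extra edge when p = 0.5.
    for num_edges in (5, 10, 20):
        p_star = min_positive_triggering_probability(num_edges, 0.5)
        print(num_edges, p_star, 1 / p_star)
```

In this toy instance, a regret bound that scales with 1/p* grows exponentially in the number of edges, which is the kind of blow-up the abstract describes.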


research
08/31/2022

Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms

In this paper, we study the combinatorial semi-bandits (CMAB) and focus ...
research
05/21/2016

Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback

We study the stochastic online problem of learning to influence in a soc...
research
02/27/2015

Influence Maximization with Bandits

We consider the problem of influence maximization, the problem of maximi...
research
05/08/2019

Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem

We consider the combinatorial multi-armed bandit (CMAB) problem, where t...
research
02/19/2020

Warm Starting Bandits with Side Information from Confounded Data

We study a variant of the multi-armed bandit problem where side informat...
research
06/11/2020

Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits

We investigate stochastic combinatorial multi-armed bandit with semi-ban...
research
02/02/2019

First-Order Regret Analysis of Thompson Sampling

We address online combinatorial optimization when the player has a prior...
