Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms: A Case with Bounded Regret

07/24/2017
by   A. Ömer Sarıtaç, et al.

In this paper, we study the combinatorial multi-armed bandit problem (CMAB) with probabilistically triggered arms (PTAs). Under the assumption that the arm triggering probabilities (ATPs) are positive for all arms, we prove that two policies achieve bounded regret: a class of upper confidence bound (UCB) policies, named Combinatorial UCB with exploration rate κ (CUCB-κ), and Combinatorial Thompson Sampling (CTS), which estimates the expected states of the arms via Thompson sampling. In addition, we prove that CUCB-0 and CTS incur O(√T) gap-independent regret. These results improve on previous works, which show O(log T) gap-dependent and O(√(T log T)) gap-independent regrets, respectively, under no assumptions on the ATPs. Then, we numerically evaluate the performance of CUCB-κ and CTS in a real-world movie recommendation problem, where the actions correspond to recommending a set of movies, the arms correspond to the edges between the movies and the users, and the goal is to maximize the total number of users that are attracted by at least one movie. Our numerical results complement our theoretical findings on bounded regret. Apart from this problem, our results also directly apply to the online influence maximization (OIM) problem studied in numerous prior works.
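
As a rough illustration of the movie recommendation objective described in the abstract, the Python sketch below computes the expected number of users attracted by at least one recommended movie, together with a UCB-style index that has an exploration rate `kappa`. The abstract does not give the exact index used by CUCB-κ, so the form `mean + sqrt(kappa * log(t) / pulls)` is an assumption, as are the function names and the `p[movie][user]` layout for the edge probabilities. With `kappa = 0` the index reduces to the empirical mean, which matches the purely greedy reading of CUCB-0.

```python
import math


def expected_attracted_users(chosen_movies, p):
    """Expected number of users attracted by at least one chosen movie.

    p[m][u] is a hypothetical table holding the probability that the
    edge (movie m, user u) is active, i.e. that movie m attracts user u.
    Edges are assumed independent.
    """
    n_users = len(next(iter(p.values())))
    total = 0.0
    for u in range(n_users):
        miss = 1.0  # probability that no chosen movie attracts user u
        for m in chosen_movies:
            miss *= 1.0 - p[m][u]
        total += 1.0 - miss
    return total


def ucb_index(mean, pulls, t, kappa):
    """Assumed UCB index for one edge with exploration rate kappa.

    mean  : empirical mean of the edge's observed states
    pulls : number of times the edge has been observed so far
    t     : current round (t >= 1)
    The result is clipped to 1 because edge states are probabilities.
    """
    if pulls == 0:
        return 1.0  # optimistic value for an edge never observed
    return min(1.0, mean + math.sqrt(kappa * math.log(t) / pulls))
```

A policy in this spirit would compute `ucb_index` for every edge in each round and pass the resulting table to whatever routine selects the set of movies maximizing `expected_attracted_users`; that selection step is not shown here.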

research
09/07/2018

Thompson Sampling for Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms

We analyze the regret of combinatorial Thompson sampling (CTS) for the c...
research
02/22/2017

Approximations of the Restless Bandit Problem

The multi-armed restless bandit problem is studied in the case where the...
research
05/14/2020

Thompson Sampling for Combinatorial Semi-bandits with Sleeping Arms and Long-Term Fairness Constraints

We study the combinatorial sleeping multi-armed semi-bandit problem with...
research
09/17/2021

Online Learning of Network Bottlenecks via Minimax Paths

In this paper, we study bottleneck identification in networks via extrac...
research
02/22/2023

When Combinatorial Thompson Sampling meets Approximation Regret

We study the Combinatorial Thompson Sampling policy (CTS) for combinator...
research
07/27/2023

Adversarial Sleeping Bandit Problems with Multiple Plays: Algorithm and Ranking Application

This paper presents an efficient algorithm to solve the sleeping bandit ...
research
10/20/2016

Combinatorial Multi-Armed Bandit with General Reward Functions

In this paper, we study the stochastic combinatorial multi-armed bandit ...