Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits

06/11/2020
by   Pierre Perrault, et al.

We investigate the stochastic combinatorial multi-armed bandit problem with semi-bandit feedback (CMAB). In CMAB, the existence of a computationally efficient policy with asymptotically optimal regret (up to a factor poly-logarithmic in the action size) remains open for many families of distributions, including mutually independent outcomes and, more generally, the multivariate sub-Gaussian family. We answer this question for these two families by analyzing variants of the Combinatorial Thompson Sampling policy (CTS). For mutually independent outcomes in [0,1], we give a tight analysis of CTS with Beta priors. We then turn to the more general setting of multivariate sub-Gaussian outcomes and give a tight analysis of CTS with Gaussian priors. This last result provides an alternative to the Efficient Sampling for Combinatorial Bandit policy (ESCB), which, although asymptotically optimal, is not computationally efficient.
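For concreteness, here is a minimal sketch of the CTS loop the abstract refers to, with Beta priors for mutually independent outcomes in [0,1]. Everything below is an illustrative assumption, not the paper's code: the function names, the top-K oracle, and the Bernoulli-rounding update are one common way to instantiate CTS, stated here only at the level the abstract describes (per-arm posterior sampling followed by a single oracle call per round).

```python
import numpy as np

def cts_beta(oracle, draw_outcomes, m, horizon, rng=None):
    """Combinatorial Thompson Sampling with Beta priors (illustrative sketch).

    oracle(theta)   -> 0/1 vector of length m selecting a feasible super-arm
                       that maximizes <theta, a> over the action set.
    draw_outcomes() -> outcome vector in [0,1]^m; with semi-bandit feedback,
                       only the entries of the played arms are observed.
    """
    rng = rng or np.random.default_rng()
    alpha = np.ones(m)   # Beta posterior parameters (successes + 1)
    beta = np.ones(m)    # Beta posterior parameters (failures + 1)
    total_reward = 0.0
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)        # one posterior sample per base arm
        action = oracle(theta).astype(bool)  # single combinatorial oracle call
        outcomes = draw_outcomes()
        total_reward += outcomes[action].sum()
        # Outcomes in [0,1] are handled by Bernoulli rounding: update the Beta
        # posterior with a coin flip whose bias equals the observed outcome.
        coin = rng.random(m) < outcomes
        alpha[action] += coin[action]
        beta[action] += ~coin[action]
    return total_reward

# Usage: top-K action set, i.e. the oracle picks the K largest sampled means.
if __name__ == "__main__":
    m, K, horizon = 10, 3, 5000
    mu = np.linspace(0.1, 0.9, m)  # true (unknown) Bernoulli means
    rng = np.random.default_rng(0)

    def topk_oracle(theta):
        a = np.zeros(m)
        a[np.argsort(theta)[-K:]] = 1.0
        return a

    reward = cts_beta(topk_oracle, lambda: (rng.random(m) < mu).astype(float),
                      m, horizon, rng)
    print(f"cumulative reward over {horizon} rounds: {reward:.0f}")
```

The Gaussian-prior variant analyzed for multivariate sub-Gaussian outcomes would replace the Beta draw with a per-arm Gaussian posterior sample; the oracle call is unchanged, which is what keeps CTS computationally efficient relative to ESCB.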


Related research

03/13/2018 · Thompson Sampling for Combinatorial Semi-Bandits
We study the application of the Thompson Sampling (TS) methodology to th...

10/27/2020 · Sub-sampling for Efficient Non-Parametric Bandit Exploration
In this paper we propose the first multi-armed bandit algorithm based on...

03/10/2023 · A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms
In this paper we propose a general methodology to derive regret bounds f...

03/05/2017 · Improving Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms and Its Applications
We study combinatorial multi-armed bandit with probabilistically trigger...

01/17/2023 · A Combinatorial Semi-Bandit Approach to Charging Station Selection for Electric Vehicles
In this work, we address the problem of long-distance navigation for bat...

07/12/2013 · Thompson Sampling for 1-Dimensional Exponential Family Bandits
Thompson Sampling has been demonstrated in many complex bandit models, h...

02/02/2019 · First-Order Regret Analysis of Thompson Sampling
We address online combinatorial optimization when the player has a prior...
