Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits

02/17/2020
by Thibaut Cuvelier, et al.

We consider combinatorial semi-bandits over a set of arms X ⊂ {0,1}^d where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound R(T) = O(d (ln m)^2 (ln T) / Δ_min), where m is the maximal number of items in an arm. However, its computational complexity is O(|X|), which is typically exponential in d, so it cannot be used in large dimensions. We propose the first algorithm that is both computationally and statistically efficient for this problem, with regret R(T) = O(d (ln m)^2 (ln T) / Δ_min) and computational complexity O(T poly(d)). Our approach involves carefully designing an approximate version of ESCB with the same regret guarantees, showing that this approximate algorithm can be implemented in time O(T poly(d)) by repeatedly maximizing a linear function over X subject to a linear budget constraint, and showing how to solve these maximization problems efficiently.
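The reduction described in the abstract can be illustrated with a small Python sketch. It assumes the square-root form of the ESCB index, θ̂ᵀx + sqrt(c · Σ_i x_i / n_i), with n_i the number of observations of item i and c = f(t)/2, and it simplifies f(t) to ln t; the paper's exact constants may differ. The functions `escb_index`, `budgeted_linear_oracle` and `approx_escb`, the grid ratio `rho`, and the choice to impose the budget as Σ_i x_i / n_i ≥ s are hypothetical illustrations, not the authors' construction. The budgeted oracle is brute force over a small, explicit X; it only stands in for the polynomial-time solvers the paper develops, so the sketch shows the reduction rather than the complexity gain.

```python
import math
from itertools import product


def escb_index(x, theta_hat, n, c):
    """Assumed ESCB-style index: empirical mean of arm x plus sqrt(c * sum_i x_i / n_i)."""
    mean = sum(xi * th for xi, th in zip(x, theta_hat))
    budget = sum(xi / ni for xi, ni in zip(x, n))
    return mean + math.sqrt(c * budget)


def exact_escb(arms, theta_hat, n, c):
    """Exact ESCB: evaluate the index of every arm, i.e. O(|X|) evaluations."""
    return max(arms, key=lambda x: escb_index(x, theta_hat, n, c))


def budgeted_linear_oracle(arms, theta_hat, n, s):
    """Hypothetical stand-in: argmax of theta_hat^T x over arms with sum_i x_i / n_i >= s.

    Brute force here; the paper's contribution is solving this kind of step in poly(d).
    """
    feasible = [x for x in arms if sum(xi / ni for xi, ni in zip(x, n)) >= s]
    if not feasible:
        return None
    return max(feasible, key=lambda x: sum(xi * th for xi, th in zip(x, theta_hat)))


def approx_escb(arms, theta_hat, n, c, rho=1.5):
    """Approximate ESCB (illustrative): one budgeted linear maximization per grid level.

    Budget levels form a geometric grid with ratio rho (an assumed parameter); the true
    index of each candidate is evaluated and the best candidate is returned.
    """
    if rho <= 1:
        raise ValueError("rho must be greater than 1")
    s = 1.0 / max(n)                   # smallest possible budget of a nonempty arm
    s_max = sum(1.0 / ni for ni in n)  # budget of the all-ones vector
    best, best_val = None, -math.inf
    while s <= s_max * (1 + 1e-12):
        x = budgeted_linear_oracle(arms, theta_hat, n, s)
        if x is not None:
            val = escb_index(x, theta_hat, n, c)
            if val > best_val:
                best, best_val = x, val
        s *= rho
    return best


if __name__ == "__main__":
    d, m = 6, 3
    # Hypothetical X: all arms with exactly m = 3 of the d = 6 items.
    arms = [x for x in product((0, 1), repeat=d) if sum(x) == m]
    theta_hat = [0.9, 0.8, 0.7, 0.5, 0.4, 0.3]  # illustrative empirical means
    n = [50, 40, 30, 5, 3, 2]                   # illustrative observation counts
    c = math.log(1000) / 2                      # assumed f(t) = ln t at t = 1000
    x_star = exact_escb(arms, theta_hat, n, c)
    x_apx = approx_escb(arms, theta_hat, n, c)
    print("exact :", x_star, escb_index(x_star, theta_hat, n, c))
    print("approx:", x_apx, escb_index(x_apx, theta_hat, n, c))
```

Because the exact ESCB arm x* is feasible at the largest grid level s ≤ Σ_i x*_i / n_i, that level's candidate has a mean at least as large as x*'s and a bonus at least 1/sqrt(rho) times x*'s bonus. The ratio `rho` therefore trades the number of oracle calls (roughly log_rho(s_max / s_min)) against how closely the returned index tracks the exact one.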

06/28/2014

Efficient Learning in Large-Scale Combinatorial Semi-Bandits

A stochastic combinatorial semi-bandit is an online learning problem whe...
02/14/2021

Asymptotically Optimal Strategies For Combinatorial Semi-Bandits in Polynomial Time

We consider combinatorial semi-bandits with uncorrelated Gaussian reward...
03/04/2015

Hierarchies of Relaxations for Online Prediction Problems with Evolving Constraints

We study online prediction where regret of the algorithm is measured aga...
02/11/2019

Exploiting Structure of Uncertainty for Efficient Combinatorial Semi-Bandits

We improve the efficiency of algorithms for stochastic combinatorial sem...
08/31/2022

Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms

In this paper, we study the combinatorial semi-bandits (CMAB) and focus ...
07/31/2021

Pure Exploration and Regret Minimization in Matching Bandits

Finding an optimal matching in a weighted graph is a standard combinator...
10/05/2020

An Efficient Algorithm for Cooperative Semi-Bandits

We consider the problem of asynchronous online combinatorial optimizatio...