Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits

by Thibaut Cuvelier, et al.

We consider combinatorial semi-bandits over a set of arms X ⊂ {0,1}^d where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound R(T) = O(d (ln m)^2 (ln T) / Δ_min), but it has computational complexity O(|X|), which is typically exponential in d, so it cannot be used in large dimensions. We propose the first algorithm that is both computationally and statistically efficient for this problem, with regret R(T) = O(d (ln m)^2 (ln T) / Δ_min) and computational complexity O(T poly(d)). Our approach involves carefully designing an approximate version of ESCB with the same regret guarantees, showing that this approximate algorithm can be implemented in time O(T poly(d)) by repeatedly maximizing a linear function over X subject to a linear budget constraint, and showing how to solve these maximization problems efficiently.
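To make the O(|X|) cost concrete, the following is a minimal sketch of one exhaustive ESCB-style decision step. It is not the algorithm proposed in the paper. It assumes that X is the set of all m-subsets of d items and that the index has the form "empirical mean plus sqrt((f(t)/2) Σ_i x_i / n_i)" with f(t) = ln t + 4 m ln ln t; the abstract states neither. The names `escb_index`, `escb_step_bruteforce`, `theta_hat`, and `counts` are hypothetical.

```python
import itertools
import math


def escb_index(x, theta_hat, counts, f_t):
    """Optimistic index of arm x (a tuple of item indices).

    Assumed form: sum of empirical item means plus a joint confidence bonus
    sqrt((f_t / 2) * sum_i 1 / counts[i]) over the items in x.
    """
    mean = sum(theta_hat[i] for i in x)
    bonus = math.sqrt(0.5 * f_t * sum(1.0 / counts[i] for i in x))
    return mean + bonus


def escb_step_bruteforce(d, m, theta_hat, counts, t):
    """Return the arm with the largest index by scanning every arm.

    Assumes t >= 3 so that ln(ln(t)) is positive, and that every item
    has been observed at least once (counts[i] >= 1).
    """
    f_t = math.log(t) + 4 * m * math.log(math.log(t))
    arms = itertools.combinations(range(d), m)  # all m-subsets: |X| = C(d, m)
    return max(arms, key=lambda x: escb_index(x, theta_hat, counts, f_t))


if __name__ == "__main__":
    # Equal observation counts: the bonus is identical for every arm,
    # so the arm with the best empirical means wins.
    print(escb_step_bruteforce(4, 2, [0.9, 0.8, 0.1, 0.2], [100] * 4, t=10))
    # Number of arms the scan has to visit for a moderately sized problem.
    print(math.comb(50, 10))
```

The scan visits C(d, m) arms per round, which already exceeds 10^10 for d = 50 and m = 10. This is the cost that the budget-constrained linear maximization described above is meant to avoid.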



Efficient Learning in Large-Scale Combinatorial Semi-Bandits

A stochastic combinatorial semi-bandit is an online learning problem whe...

Asymptotically Optimal Strategies For Combinatorial Semi-Bandits in Polynomial Time

We consider combinatorial semi-bandits with uncorrelated Gaussian reward...

Hierarchies of Relaxations for Online Prediction Problems with Evolving Constraints

We study online prediction where regret of the algorithm is measured aga...

Exploiting Structure of Uncertainty for Efficient Combinatorial Semi-Bandits

We improve the efficiency of algorithms for stochastic combinatorial sem...

Pure Exploration and Regret Minimization in Matching Bandits

Finding an optimal matching in a weighted graph is a standard combinator...

An Efficient Pessimistic-Optimistic Algorithm for Constrained Linear Bandits

This paper considers stochastic linear bandits with general constraints....

An Efficient Algorithm for Cooperative Semi-Bandits

We consider the problem of asynchronous online combinatorial optimizatio...