Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits

02/17/2020
by Thibaut Cuvelier, et al.

We consider combinatorial semi-bandits over a set of arms $X \subset \{0,1\}^d$ where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound $R(T) = O(d (\ln m)^2 (\ln T) / \Delta_{\min})$, where $m$ is the maximal number of items in an arm, but its computational complexity is $O(|X|)$, which is typically exponential in $d$, so it cannot be used in large dimensions. We propose the first algorithm that is both computationally and statistically efficient for this problem, with regret $R(T) = O(d (\ln m)^2 (\ln T) / \Delta_{\min})$ and computational complexity $O(T\,\mathrm{poly}(d))$. Our approach involves carefully designing an approximate version of ESCB with the same regret guarantees, showing that this approximate algorithm can be implemented in time $O(T\,\mathrm{poly}(d))$ by repeatedly maximizing a linear function over $X$ subject to a linear budget constraint, and showing how to solve these maximization problems efficiently.
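The gap between ESCB's $O(|X|)$ cost and a reduction to budget-constrained linear maximizations can be illustrated on a toy instance. The Python sketch below assumes the ESCB index has the form $\hat\mu^\top x + \sqrt{c \sum_i x_i / N_i}$, where $\hat\mu$ are empirical item means, $N_i$ are item observation counts, and $c$ is a constant standing in for the time-dependent exploration term. The function names and the constant `c` are hypothetical. The constraint is written as a lower bound $w^\top y \ge s$ with $w_i = 1/N_i$, a choice made here because it makes the reduction exact; the paper's own formulation may differ, and the budgeted oracle is enumerated rather than solved efficiently.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

Arm = Tuple[int, ...]


def escb_index(x: Arm, mu_hat: Sequence[float], counts: Sequence[int], c: float) -> float:
    # Optimistic index of arm x: empirical mean reward plus the exploration
    # bonus sqrt(c * sum_i x_i / N_i). The constant c stands in for ESCB's
    # time-dependent exploration term (an assumption of this sketch).
    mean = sum(mu * xi for mu, xi in zip(mu_hat, x))
    bonus_budget = sum(xi / n for xi, n in zip(x, counts))
    return mean + math.sqrt(c * bonus_budget)


def escb_brute_force(arms: List[Arm], mu_hat, counts, c) -> Arm:
    # Exact ESCB step: evaluate the index of every arm, i.e. O(|X|) work.
    return max(arms, key=lambda x: escb_index(x, mu_hat, counts, c))


def budgeted_linear_max(arms: List[Arm], mu_hat, w, s) -> Optional[Arm]:
    # Hypothetical oracle: maximize mu_hat^T y over arms with w^T y >= s.
    # Enumerated here for clarity; an efficient solver would exploit the
    # combinatorial structure of X instead.
    feasible = [y for y in arms if sum(wi * yi for wi, yi in zip(w, y)) >= s]
    if not feasible:
        return None
    return max(feasible, key=lambda y: sum(mu * yi for mu, yi in zip(mu_hat, y)))


def escb_via_budgets(arms: List[Arm], mu_hat, counts, c) -> Arm:
    # Budget weights w_i = 1 / N_i, so w^T x is the quantity under the root.
    w = [1.0 / n for n in counts]
    # Using every achievable budget value makes the reduction exact: for the
    # optimal arm x*, the oracle at s = w^T x* returns some y with
    # mu_hat^T y >= mu_hat^T x* and w^T y >= w^T x*, hence index(y) >= index(x*).
    budgets = sorted({sum(wi * xi for wi, xi in zip(w, x)) for x in arms})
    candidates = [budgeted_linear_max(arms, mu_hat, w, s) for s in budgets]
    return max((y for y in candidates if y is not None),
               key=lambda y: escb_index(y, mu_hat, counts, c))


if __name__ == "__main__":
    # Toy instance: arms are all subsets of size 2 out of d = 4 items.
    arms = [x for x in itertools.product((0, 1), repeat=4) if sum(x) == 2]
    mu_hat = [0.9, 0.8, 0.5, 0.3]
    counts = [50, 40, 3, 2]
    print(escb_brute_force(arms, mu_hat, counts, 1.0))  # (1, 0, 1, 0)
    print(escb_via_budgets(arms, mu_hat, counts, 1.0))  # (1, 0, 1, 0)
```

If the exact set of budget values is replaced by a coarser grid that contains a point $s_g \le w^\top x^*$, the same argument applied at the largest such $s_g$ shows that the returned arm's index is at least the optimal index minus $\sqrt{c\, w^\top x^*} - \sqrt{c\, s_g}$. How the paper chooses its grid and solves the budgeted problems in polynomial time is not reproduced here.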

06/28/2014

Efficient Learning in Large-Scale Combinatorial Semi-Bandits

A stochastic combinatorial semi-bandit is an online learning problem whe...
02/14/2021

Asymptotically Optimal Strategies For Combinatorial Semi-Bandits in Polynomial Time

We consider combinatorial semi-bandits with uncorrelated Gaussian reward...
03/04/2015

Hierarchies of Relaxations for Online Prediction Problems with Evolving Constraints

We study online prediction where regret of the algorithm is measured aga...
02/11/2019

Exploiting Structure of Uncertainty for Efficient Combinatorial Semi-Bandits

We improve the efficiency of algorithms for stochastic combinatorial sem...
07/31/2021

Pure Exploration and Regret Minimization in Matching Bandits

Finding an optimal matching in a weighted graph is a standard combinator...
02/10/2021

An Efficient Pessimistic-Optimistic Algorithm for Constrained Linear Bandits

This paper considers stochastic linear bandits with general constraints....
10/05/2020

An Efficient Algorithm for Cooperative Semi-Bandits

We consider the problem of asynchronous online combinatorial optimizatio...