Exploiting Structure of Uncertainty for Efficient Combinatorial Semi-Bandits

02/11/2019
by Pierre Perrault, et al.

We improve the efficiency of algorithms for stochastic combinatorial semi-bandits. In most interesting problems, state-of-the-art algorithms take advantage of structural properties of the rewards, such as independence. However, although these algorithms are minimax optimal in terms of regret, they are computationally intractable. In this paper, we first reduce their implementation to a specific submodular maximization problem. Then, in the case of matroid constraints, we design adapted approximation routines, thereby providing the first efficient algorithms that exploit the reward structure. In particular, we improve the state-of-the-art efficient gap-free regret bound by a factor of √k, where k is the maximum action size. Finally, we show how our improvement translates to more general budgeted combinatorial semi-bandits.
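To make the idea of submodular maximization under a matroid constraint concrete, here is a minimal sketch of the classical greedy routine. It is not the approximation routine designed in the paper. The function name `greedy_matroid`, the independence oracle, the toy weights `mu` and `c`, and the objective (a linear term plus the square root of a modular term, chosen only because it is a simple monotone submodular function) are all assumptions made for illustration.

```python
import math
from typing import Callable, Iterable, Set


def greedy_matroid(
    ground: Iterable[int],
    f: Callable[[Set[int]], float],
    is_independent: Callable[[Set[int]], bool],
) -> Set[int]:
    """Classical greedy: repeatedly add the feasible element with the
    largest marginal gain until no element can be added."""
    chosen: Set[int] = set()
    remaining = set(ground)
    while remaining:
        base = f(chosen)
        best, best_gain = None, 0.0
        for e in sorted(remaining):  # sorted only for deterministic ties
            candidate = chosen | {e}
            if not is_independent(candidate):
                continue
            gain = f(candidate) - base
            if best is None or gain > best_gain:
                best, best_gain = e, gain
        if best is None:  # nothing feasible is left
            break
        chosen.add(best)
        remaining.discard(best)
    return chosen


# Hypothetical toy instance (illustrative numbers, not from the paper).
mu = [0.9, 0.8, 0.5, 0.4, 0.1]      # assumed mean-reward estimates
c = [0.01, 0.04, 1.0, 0.25, 0.09]   # assumed per-arm uncertainty weights
k = 2                               # maximum action size


def objective(s: Set[int]) -> float:
    # Linear term plus square root of a modular term: monotone submodular
    # because the square root is concave and all weights are nonnegative.
    return sum(mu[i] for i in s) + math.sqrt(sum(c[i] for i in s))


def uniform_matroid(s: Set[int]) -> bool:
    # Uniform matroid of rank k: any set with at most k elements.
    return len(s) <= k


selected = greedy_matroid(range(len(mu)), objective, uniform_matroid)
print(sorted(selected), round(objective(selected), 3))
```

Passing the constraint as an independence oracle keeps the routine usable for any matroid, not only the uniform one. For a monotone submodular objective, this greedy rule is known to achieve at least half of the optimal value under a general matroid constraint.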


research
10/03/2014

Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits

A stochastic combinatorial semi-bandit is an online learning problem whe...
research
02/14/2021

Asymptotically Optimal Strategies For Combinatorial Semi-Bandits in Polynomial Time

We consider combinatorial semi-bandits with uncorrelated Gaussian reward...
research
06/03/2018

Conservative Exploration using Interleaving

In many practical problems, a learning agent may want to learn the best ...
research
02/17/2020

Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits

We consider combinatorial semi-bandits over a set of arms X⊂{0,1}^d wher...
research
03/20/2014

Matroid Bandits: Fast Combinatorial Optimization with Learning

A matroid is a notion of independence in combinatorial optimization whic...
research
10/08/2020

Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

Modifying the reward-biased maximum likelihood method originally propose...
research
01/30/2021

Recurrent Submodular Welfare and Matroid Blocking Bandits

A recent line of research focuses on the study of the stochastic multi-a...
