Efficient Learning in Large-Scale Combinatorial Semi-Bandits

06/28/2014
by   Zheng Wen, et al.

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to combinatorial constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we consider efficient learning in large-scale combinatorial semi-bandits with linear generalization, and as a solution, propose two learning algorithms called Combinatorial Linear Thompson Sampling (CombLinTS) and Combinatorial Linear UCB (CombLinUCB). Both algorithms are computationally efficient as long as the offline version of the combinatorial problem can be solved efficiently. We establish that CombLinTS and CombLinUCB are also provably statistically efficient under reasonable assumptions, by developing regret bounds that are independent of the problem scale (number of items) and sublinear in time. We also evaluate CombLinTS on a variety of problems with thousands of items. Our experimental results demonstrate that CombLinTS is scalable, robust to the choice of algorithm parameters, and significantly outperforms the best of our baselines.
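The interaction loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's CombLinTS pseudocode: it is a generic Thompson sampling loop with a linear model of item weights, written under several assumptions that do not come from the abstract. The constraint is assumed to be "choose any k items" (so the offline oracle is a top-k selection), the noise is assumed Gaussian with known standard deviation, and all names (`top_k_oracle`, `run_lin_ts`, `features`, `theta_true`) are hypothetical.

```python
import numpy as np


def top_k_oracle(weights, k):
    """Offline oracle for the assumed constraint 'choose any k items'."""
    return np.argsort(weights)[-k:]


def run_lin_ts(features, theta_true, k=3, steps=200,
               noise_sd=0.1, prior_var=1.0, seed=0):
    """Thompson sampling with a linear weight model (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n_items, d = features.shape

    # Gaussian posterior over the d-dimensional parameter, stored as a
    # precision matrix and a precision-weighted mean vector.
    precision = np.eye(d) / prior_var
    b = np.zeros(d)

    mean_weights = features @ theta_true          # unknown to the agent
    best = mean_weights[top_k_oracle(mean_weights, k)].sum()
    regret = 0.0

    for _ in range(steps):
        cov = np.linalg.inv(precision)
        cov = (cov + cov.T) / 2                   # guard against round-off asymmetry
        mean = cov @ b

        # Sample a parameter, then let the oracle solve the offline problem.
        theta_sample = rng.multivariate_normal(mean, cov)
        chosen = top_k_oracle(features @ theta_sample, k)

        # Semi-bandit feedback: a noisy weight for each chosen item only.
        observed = mean_weights[chosen] + noise_sd * rng.normal(size=k)

        # Bayesian linear regression update using the chosen items' features.
        x = features[chosen]
        precision += x.T @ x / noise_sd**2
        b += x.T @ observed / noise_sd**2

        regret += best - mean_weights[chosen].sum()

    theta_hat = np.linalg.inv(precision) @ b
    return regret, theta_hat


# Usage on synthetic data (sizes chosen arbitrarily for illustration).
data_rng = np.random.default_rng(1)
features = data_rng.normal(size=(50, 5))          # 50 items, 5 features each
theta_true = data_rng.normal(size=5)
regret, theta_hat = run_lin_ts(features, theta_true)
```

The point of the sketch is that the learner's state has the size of the feature dimension rather than the number of items, and that the combinatorial structure enters only through the oracle call. Replacing `top_k_oracle` with a solver for another constraint leaves the rest of the loop unchanged.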

research
10/03/2014

Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits

A stochastic combinatorial semi-bandit is an online learning problem whe...
research
02/17/2020

Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits

We consider combinatorial semi-bandits over a set of arms X⊂{0,1}^d wher...
research
01/31/2023

Probably Anytime-Safe Stochastic Combinatorial Semi-Bandits

Motivated by concerns about making online decisions that incur undue amo...
research
09/17/2021

Online Learning of Network Bottlenecks via Minimax Paths

In this paper, we study bottleneck identification in networks via extrac...
research
03/06/2020

Optimizing Revenue while showing Relevant Assortments at Scale

Scalable real-time assortment optimization has become essential in e-com...
research
05/21/2016

Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback

We study the stochastic online problem of learning to influence in a soc...
research
06/03/2018

Conservative Exploration using Interleaving

In many practical problems, a learning agent may want to learn the best ...
