Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits

10/03/2014
by   Branislav Kveton, et al.

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving the problem, which is known to be computationally efficient; and prove O(K L (1 / Δ) log n) and O(√(K L n log n)) upper bounds on its n-step regret, where L is the number of ground items, K is the maximum number of chosen items, and Δ is the gap between the expected returns of the optimal and best suboptimal solutions. The gap-dependent bound is tight up to a constant factor and the gap-free bound is tight up to a polylogarithmic factor.
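The abstract does not spell out the algorithm, so the following is only a minimal sketch of a UCB-like learner for this setting, not the authors' exact method. It assumes the simplest constraint (any K of the L items may be chosen, so the offline oracle is a top-K selection), Bernoulli item weights, and a confidence radius of the form sqrt(1.5 log t / s); the function name `comb_ucb_topk` and all parameters are hypothetical.

```python
import math
import random


def comb_ucb_topk(means, K, n, seed=0):
    """UCB-style semi-bandit sketch for the 'choose any K of L items' constraint.

    means: hypothetical Bernoulli means of the L ground items (unknown to the learner).
    Returns (observation_counts, cumulative_expected_regret).
    """
    rng = random.Random(seed)
    L = len(means)
    counts = [0] * L   # number of times each item's weight was observed
    sums = [0.0] * L   # sum of observed weights per item
    best = sum(sorted(means, reverse=True)[:K])  # expected return of the optimal solution
    regret = 0.0

    for t in range(1, n + 1):
        # Optimistic index: empirical mean plus an assumed confidence radius.
        # Unobserved items get +inf so that each one is tried at least once.
        ucb = [
            sums[e] / counts[e] + math.sqrt(1.5 * math.log(t) / counts[e])
            if counts[e] > 0 else math.inf
            for e in range(L)
        ]
        # Offline oracle for this constraint: the K items with the largest index.
        chosen = sorted(range(L), key=lambda e: ucb[e], reverse=True)[:K]

        # Semi-bandit feedback: the weight of every chosen item is observed.
        for e in chosen:
            w = 1.0 if rng.random() < means[e] else 0.0
            counts[e] += 1
            sums[e] += w
        regret += best - sum(means[e] for e in chosen)

    return counts, regret
```

For example, `comb_ucb_topk([0.9, 0.8, 0.2, 0.1, 0.1], K=2, n=2000)` returns the per-item observation counts and the cumulative expected regret. Other constraint families would replace the top-K line with a different offline optimization oracle.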

research
06/28/2014

Efficient Learning in Large-Scale Combinatorial Semi-Bandits

A stochastic combinatorial semi-bandit is an online learning problem whe...
research
02/11/2019

Exploiting Structure of Uncertainty for Efficient Combinatorial Semi-Bandits

We improve the efficiency of algorithms for stochastic combinatorial sem...
research
01/31/2023

Probably Anytime-Safe Stochastic Combinatorial Semi-Bandits

Motivated by concerns about making online decisions that incur undue amo...
research
10/02/2018

Thompson Sampling for Cascading Bandits

We design and analyze TS-Cascade, a Thompson sampling algorithm for the ...
research
05/30/2014

Learning to Act Greedily: Polymatroid Semi-Bandits

Many important optimization problems, such as the minimum spanning tree ...
research
03/23/2022

Minimax Regret for Cascading Bandits

Cascading bandits model the task of learning to rank K out of L items ov...
research
03/20/2014

Matroid Bandits: Fast Combinatorial Optimization with Learning

A matroid is a notion of independence in combinatorial optimization whic...
