Simple Combinatorial Algorithms for Combinatorial Bandits: Corruptions and Approximations

06/12/2021
by Haike Xu, et al.

We consider the stochastic combinatorial semi-bandit problem with adversarial corruptions. We provide a simple combinatorial algorithm that achieves a regret of Õ(C + d^2K/Δ_min), where C is the total amount of corruption, d is the maximum number of arms one can play in each round, and K is the number of arms. If only one arm is selected in each round, we achieve a regret of Õ(C + ∑_{Δ_i>0} (1/Δ_i)). Our algorithm is combinatorial and improves on the previous combinatorial algorithm of [Gupta et al., COLT2019], whose bound is Õ(KC + ∑_{Δ_i>0} (1/Δ_i)), and it almost matches the best known bounds obtained by [Zimmert et al., ICML2019] and [Zimmert and Seldin, AISTATS2019] (up to logarithmic factors). Note that the algorithms in [Zimmert et al., ICML2019] and [Zimmert and Seldin, AISTATS2019] require solving complex convex programs, whereas our algorithm is combinatorial, very easy to implement, requires weaker assumptions, and has very low oracle complexity and running time. We also study the setting where we only have access to an approximation oracle for the stochastic combinatorial semi-bandit problem. In this setting, our algorithm achieves an (approximation) regret bound of Õ(d√(KT)). It is very simple, is worse than the best known regret bound by only a factor of √d, and has much lower oracle complexity than previous work.
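
To make the problem setting concrete, the following is a minimal sketch of a stochastic semi-bandit environment with a corruption budget. It is not the algorithm from the paper: the learner here is a naive greedy baseline, and the names `simulate`, `top_d`, and `corruption_budget`, the Bernoulli rewards, the "any d arms" action set, and the adversary's strategy are all assumptions chosen for illustration.

```python
import random


def top_d(values, d):
    """Indices of the d largest entries (exact oracle for the 'any d arms' action set)."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:d]


def simulate(means, d, T, corruption_budget, seed=0):
    """Run a naive greedy learner on corrupted semi-bandit feedback.

    Returns (pseudo_regret, corruption_used). Illustrative only.
    """
    rng = random.Random(seed)
    K = len(means)
    best_value = sum(means[i] for i in top_d(means, d))
    best_arm = top_d(means, 1)[0]
    counts = [0] * K
    sums = [0.0] * K
    regret = 0.0
    used = 0.0
    for _ in range(T):
        # Empirical means; unplayed arms get +inf so that each arm is tried once.
        est = [sums[i] / counts[i] if counts[i] else float("inf") for i in range(K)]
        action = top_d(est, d)
        # Pseudo-regret is measured against the true (uncorrupted) means.
        regret += best_value - sum(means[i] for i in action)
        for i in action:
            # Semi-bandit feedback: one reward is observed per played arm.
            reward = 1.0 if rng.random() < means[i] else 0.0
            # Hypothetical adversary: hide successes of the best arm while budget remains.
            if i == best_arm and reward == 1.0 and used + 1.0 <= corruption_budget:
                reward = 0.0
                used += 1.0
            counts[i] += 1
            sums[i] += reward
    return regret, used


if __name__ == "__main__":
    for budget in (0, 50):
        r, c = simulate([0.9, 0.8, 0.2, 0.1], d=2, T=2000, corruption_budget=budget)
        print(f"budget={budget}: pseudo-regret={r:.1f}, corruption used={c:.0f}")
```

Here K is `len(means)`, d is the number of arms played per round, and `corruption_budget` plays the role of C. Comparing runs with different budgets gives a rough feel for how corruption can mislead a learner that is not designed to be robust.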


research
05/28/2019

Combinatorial Bandits with Full-Bandit Feedback: Sample Complexity and Regret Minimization

Combinatorial Bandits generalize multi-armed bandits, where k out of n a...
research
04/25/2019

Lipschitz Bandit Optimization with Improved Efficiency

We consider the Lipschitz bandit optimization problem with an emphasis o...
research
05/31/2023

Combinatorial Neural Bandits

We consider a contextual combinatorial bandit problem where in each roun...
research
02/22/2023

When Combinatorial Thompson Sampling meets Approximation Regret

We study the Combinatorial Thompson Sampling policy (CTS) for combinator...
research
02/20/2019

A Note on Bounding Regret of the C^2UCB Contextual Combinatorial Bandit

We revisit the proof by Qin et al. (2014) of bounded regret of the C^2UC...
research
06/09/2022

Individually Fair Learning with One-Sided Feedback

We consider an online learning problem with one-sided feedback, in which...
research
04/14/2022

A Unified Analysis of Dynamic Interactive Learning

In this paper we investigate the problem of learning evolving concepts o...
