Nonstochastic Bandits and Experts with Arm-Dependent Delays

11/02/2021, by Dirk van der Hoeven, et al.

We study nonstochastic bandits and experts in a delayed setting where delays depend on both time and arms. While the setting in which delays depend only on time has been extensively studied, the arm-dependent delay setting better captures real-world applications, at the cost of new technical challenges. In the full-information (experts) setting, we design an algorithm with a first-order regret bound that reveals an interesting trade-off between delays and losses. We prove a similar first-order regret bound for the bandit setting when the learner is allowed to observe how many losses are missing. These are the first bounds in the delayed setting that depend only on the losses and delays of the best arm. When no information other than the losses is observed in the bandit setting, we still prove a regret bound through a modification of the algorithm of Zimmert and Seldin (2020). Our analyses hinge on a novel bound on the drift, which measures how much better an algorithm can perform when given a look-ahead of one round.
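To make the delayed full-information setting concrete, the following is a minimal sketch of a generic exponential-weights (Hedge) learner that updates only on losses whose arm-dependent delays have expired. This is an illustration of the feedback model, not the paper's algorithm; the function name, inputs, and fixed learning rate `eta` are all hypothetical choices for the sketch.

```python
import math

def delayed_hedge(losses, delays, eta=0.5):
    """Exponential weights under arm-dependent delayed feedback (illustrative sketch).

    losses[t][i] -- loss of arm i at round t (full information)
    delays[t][i] -- number of rounds before the loss of arm i from round t is observed

    At each round the learner plays from weights built only on losses already
    observed; the loss pair (t, i) becomes visible at round t + delays[t][i].
    """
    T = len(losses)
    K = len(losses[0])
    cum = [0.0] * K          # cumulative *observed* loss per arm
    plays = []
    for t in range(T):
        # play proportional to exp(-eta * observed cumulative loss)
        w = [math.exp(-eta * cum[i]) for i in range(K)]
        z = sum(w)
        plays.append([wi / z for wi in w])
        # reveal every (s, i) whose delay expires exactly at round t
        for s in range(t + 1):
            for i in range(K):
                if s + delays[s][i] == t:
                    cum[i] += losses[s][i]
    return plays
```

With zero delays this reduces to standard Hedge; with large delays on some arm, that arm's losses never enter the update within the horizon, which is the kind of arm-dependent information asymmetry the abstract describes.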


