Delayed Bandit Online Learning with Unknown Delays

07/09/2018
by Bingcong Li, et al.

This paper studies bandit learning problems with delayed feedback, which include multi-armed bandits (MAB) and bandit convex optimization (BCO). Given only function-value information (a.k.a. bandit feedback), algorithms for both MAB and BCO typically rely on (possibly randomized) gradient estimators built from function values, which are then fed into well-studied gradient-based algorithms. Unlike existing works, however, the setting considered here is more challenging: the bandit feedback is not only delayed, but the delay itself is not revealed to the learner. Existing algorithms for delayed MAB and BCO become intractable in this setting. To tackle it, DEXP3 and DBGD are developed for MAB and BCO, respectively. Leveraging a unified analysis framework, it is established that both DEXP3 and DBGD guarantee an O(√(T+D)) regret over T time slots, where D is the overall delay accumulated over the slots. The new regret bounds match those in full-information settings.
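The recipe described in the abstract (build a loss or gradient estimate from function values alone, then feed it to a gradient-type update whenever the delayed feedback arrives) can be illustrated with a minimal Python sketch. This is not the paper's DEXP3 or DBGD: it swaps in a generic EXP3-style learner with importance-weighted loss estimates and a Flaxman-style one-point gradient descent, each applying an estimate at the slot its feedback happens to arrive. The function names (`delayed_exp3`, `sphere_sample`, `one_point_gradient`, `delayed_bgd`), the assumption that each arriving loss is tagged with the slot that produced it, the fixed step sizes `eta` and `delta`, and the Euclidean-ball feasible set are all illustrative assumptions.

```python
import math
import random


def delayed_exp3(losses, delays, eta, seed=0):
    """EXP3-style learner whose bandit feedback arrives after an unknown delay.

    losses[t][k] is the loss in [0, 1] of arm k at slot t; only the played
    arm's entry is ever used. delays[t] is consumed by the simulated
    environment to schedule delivery; the update rule never reads it.
    (Assumption: each arriving loss is tagged with its origin slot.)
    """
    rng = random.Random(seed)
    T, K = len(losses), len(losses[0])
    cum_est = [0.0] * K   # cumulative importance-weighted loss estimates
    inbox = {}            # delivery slot -> [(origin slot, arm, prob of arm)]
    played = []
    for t in range(T):
        # Exponential weights over the estimates received so far
        # (shifted by the minimum for numerical stability).
        m = min(cum_est)
        w = [math.exp(-eta * (c - m)) for c in cum_est]
        total = sum(w)
        probs = [x / total for x in w]
        a = rng.choices(range(K), weights=probs)[0]
        played.append(a)
        # Environment side: the loss of slot t shows up at slot t + delays[t].
        inbox.setdefault(t + delays[t], []).append((t, a, probs[a]))
        # Learner side: fold in whatever feedback arrives this slot.
        for s, b, p in inbox.pop(t, []):
            # Importance weighting makes this an unbiased estimate of losses[s].
            cum_est[b] += losses[s][b] / p
    return played


def sphere_sample(dim, rng):
    """Uniform random direction on the unit sphere in R^dim."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def one_point_gradient(f, x, delta, rng):
    """Flaxman-style estimate (dim/delta) * f(x + delta*u) * u from one value."""
    dim = len(x)
    u = sphere_sample(dim, rng)
    value = f([xi + delta * ui for xi, ui in zip(x, u)])
    return [dim / delta * value * ui for ui in u]


def delayed_bgd(fs, delays, dim, eta, delta, radius=1.0, seed=0):
    """Bandit gradient descent applying each estimate when it arrives."""
    rng = random.Random(seed)
    x = [0.0] * dim
    inbox = {}
    queried = []
    r = radius - delta   # keeps x + delta*u inside the ball of the given radius
    for t, f in enumerate(fs):
        queried.append(list(x))
        # The function value is what would be delayed; computing the estimate
        # now and delivering it at slot t + delays[t] is equivalent here.
        g = one_point_gradient(f, x, delta, rng)
        inbox.setdefault(t + delays[t], []).append(g)
        for g_s in inbox.pop(t, []):
            x = [xi - eta * gi for xi, gi in zip(x, g_s)]
        # Euclidean projection onto the shrunk ball of radius r.
        n = math.sqrt(sum(xi * xi for xi in x))
        if n > r:
            x = [xi * r / n for xi in x]
    return queried
```

For example, with two arms whose losses are always 0 and 1, `delayed_exp3([[0.0, 1.0]] * 2000, [t % 4 for t in range(2000)], eta=0.05)` should settle on arm 0 after the first few hundred slots, even though each loss arrives up to three slots late. Applying estimates at arrival time means the sampling distribution lags behind the true history by the outstanding delay, which is loosely why the accumulated delay D enters the regret bound.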


Related research

06/16/2021
Banker Online Mirror Descent
We propose Banker-OMD, a novel framework generalizing the classical Onli...

09/22/2016
(Bandit) Convex Optimization with Biased Noisy Gradient Oracles
Algorithms for bandit convex optimization and online learning often rely...

05/28/2017
Bayesian Unification of Gradient and Bandit-based Learning for Accelerated Global Optimisation
Bandit based optimisation has a remarkable advantage over gradient based...

05/15/2022
Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback
Motivated by applications to online learning in sparse estimation and Ba...

02/24/2022
Thompson Sampling with Unrestricted Delays
We investigate properties of Thompson Sampling in the stochastic multi-a...

06/04/2013
Online Learning under Delayed Feedback
Online learning with delayed feedback has received increasing attention ...

01/23/2019
Cooperation Speeds Surfing: Use Co-Bandit!
In this paper, we explore the benefit of cooperation in adversarial band...
