Scale-Free Adversarial Multi-Armed Bandit with Arbitrary Feedback Delays

10/26/2021
by Jiatai Huang, et al.

We consider the Scale-Free Adversarial Multi-Armed Bandit (MAB) problem with unrestricted feedback delays. In contrast to the standard assumption that all losses are [0,1]-bounded, in our setting losses can fall in a general bounded interval [-L, L] that is unknown to the agent beforehand. Furthermore, the feedback of each arm pull can experience arbitrary delays. We propose an algorithm for this novel setting that combines a recent banker online mirror descent technique with elaborately designed doubling tricks. We show that it achieves 𝒪(√(K(D+T))L)·polylog(T, L) total regret, where K is the number of arms, T is the total number of steps, and D is the total feedback delay. It also outperforms existing algorithms on non-delayed (i.e., D=0) scale-free adversarial MAB problem instances. We also present a variant for problem instances with non-negative losses (i.e., losses that range in [0, L] for some unknown L), achieving an 𝒪̃(√(K(D+T))L) total regret, which is near-optimal compared to the Ω(√(KT)+√(D log K)L) lower bound ([Cesa-Bianchi et al., 2016]).
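To make the setting concrete, the following is a minimal sketch of how delayed feedback and an unknown loss scale might be handled together. It is not the algorithm from the paper: it swaps in a plain Exp3-style exponential-weights learner in place of banker online mirror descent, and a naive "double and restart" rule for the scale guess. The function names, the learning-rate formula, and the restart rule are all illustrative assumptions.

```python
import math
import random


def doubled_scale(current_scale, observed_loss):
    """Smallest value current_scale * 2**k (k >= 0) that covers |observed_loss|."""
    scale = current_scale
    while abs(observed_loss) > scale:
        scale *= 2.0
    return scale


def run_delayed_exp3(losses, delays, seed=0):
    """Exp3-style play with delayed feedback and a doubling guess of L.

    losses[t][a] is the loss of arm a at step t; the feedback for step t
    becomes visible at the end of step t + delays[t].
    Returns (arms pulled, final scale guess, number of restarts).
    Hypothetical illustration only, not the paper's method.
    """
    rng = random.Random(seed)
    T, K = len(losses), len(losses[0])
    scale, restarts = 1.0, 0
    cum_est = [0.0] * K   # importance-weighted cumulative loss estimates
    pending = []          # (arrival_step, arm, pull_probability, loss)
    pulls = []
    for t in range(T):
        # Assumed learning rate: shrinks with time and with the scale guess.
        eta = math.sqrt(math.log(K) / (K * (t + 1))) / scale
        base = min(cum_est)  # shift so every exponent is <= 0 (no overflow)
        weights = [math.exp(-eta * (c - base)) for c in cum_est]
        total = sum(weights)
        probs = [w / total for w in weights]
        arm = rng.choices(range(K), weights=probs)[0]
        pulls.append(arm)
        pending.append((t + delays[t], arm, probs[arm], losses[t][arm]))

        # Process only the feedback whose delay has elapsed.
        arrived = [p for p in pending if p[0] <= t]
        pending = [p for p in pending if p[0] > t]
        for _, a, p, loss in arrived:
            new_scale = doubled_scale(scale, loss)
            if new_scale > scale:
                # The guess was too small: enlarge it and restart the learner.
                scale, restarts = new_scale, restarts + 1
                cum_est = [0.0] * K
            cum_est[a] += loss / p
    return pulls, scale, restarts
```

The learner never sees L directly; it only reacts when an arriving loss exceeds its current guess, so the number of restarts grows logarithmically in L rather than linearly.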

research
∙ 04/27/2022

Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback

We study the adversarial bandit problem with composite anonymous delayed...
research
∙ 06/08/2021

Scale Free Adversarial Multi Armed Bandits

We consider the Scale-Free Adversarial Multi Armed Bandit(MAB) problem, ...
research
∙ 10/01/2020

Unknown Delay for Adversarial Bandit Setting with Multiple Play

This paper addresses the problem of unknown delays in adversarial multi-...
research
∙ 02/24/2022

Thompson Sampling with Unrestricted Delays

We investigate properties of Thompson Sampling in the stochastic multi-a...
research
∙ 06/04/2021

Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions

We study the stochastic Multi-Armed Bandit (MAB) problem with random del...
research
∙ 05/30/2023

Delayed Bandits: When Do Intermediate Observations Help?

We study a K-armed bandit with delayed feedback and intermediate observa...
research
∙ 06/10/2015

An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives

We consider a contextual version of multi-armed bandit problem with glob...
