Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback

04/27/2022
by Zongqi Wan, et al.

We study the adversarial bandit problem with composite anonymous delayed feedback. In this setting, the loss of an action is split into d components that are spread over the consecutive rounds after the action is chosen, and in each round the algorithm observes only the aggregate of the loss components arriving from the latest d rounds. Previous work focuses on the oblivious adversarial setting, while we investigate the harder non-oblivious setting. We show that the non-oblivious setting incurs Ω(T) pseudo-regret even when the loss sequence has bounded memory. However, we propose a wrapper algorithm that enjoys o(T) policy regret on many adversarial bandit problems under the assumption that the loss sequence has bounded memory. In particular, for K-armed bandits and bandit convex optimization, we obtain an 𝒪(T^{2/3}) policy regret bound. We also prove a matching lower bound for K-armed bandits. Our lower bound holds even when the loss sequence is oblivious but the delay is non-oblivious. This answers the open problem posed in <cit.>, showing that non-oblivious delay alone is enough to incur Ω̃(T^{2/3}) regret.
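The feedback model described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's algorithm: the function name `observed_feedback` and the layout `components[t][a][s]` (the part of the loss of arm `a`, played at round `t`, that is charged `s` rounds later) are hypothetical choices made only for this example.

```python
from typing import List, Sequence


def observed_feedback(
    actions: Sequence[int],
    components: Sequence[Sequence[Sequence[float]]],
) -> List[float]:
    """Return what the learner sees in each round under composite anonymous feedback.

    components[t][a][s] (hypothetical layout) is the share of the loss of arm a,
    played at round t, that arrives s rounds later, for 0 <= s < d.
    The learner only sees the per-round sum, not which action produced which share.
    """
    horizon = len(actions)
    observed = [0.0] * horizon
    for t, arm in enumerate(actions):
        for s, part in enumerate(components[t][arm]):
            # Shares that would arrive after the last round are never observed.
            if t + s < horizon:
                observed[t + s] += part
    return observed


# Example with d = 2 and two arms: arm 0 has total loss 1.0 split evenly over
# two rounds, arm 1 has loss 0.
per_round = [[0.5, 0.5], [0.0, 0.0]]
components = [per_round, per_round, per_round]
print(observed_feedback([0, 1, 0], components))
```

In the example, the value observed in round 2 (index 1) comes entirely from the arm played one round earlier, although the learner played the zero-loss arm in that round. This is the attribution difficulty that the anonymous aggregation creates.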


research
12/06/2021

Nonstochastic Bandits with Composite Anonymous Feedback

We investigate a nonstochastic bandit setting in which the loss of an ac...
research
10/26/2021

Scale-Free Adversarial Multi-Armed Bandit with Arbitrary Feedback Delays

We consider the Scale-Free Adversarial Multi Armed Bandit (MAB) problem ...
research
05/30/2019

Equipping Experts/Bandits with Long-term Memory

We propose the first reduction-based approach to obtaining long-term mem...
research
05/30/2023

Delayed Bandits: When Do Intermediate Observations Help?

We study a K-armed bandit with delayed feedback and intermediate observa...
research
02/08/2019

Bandit Principal Component Analysis

We consider a partial-feedback variant of the well-studied online PCA pr...
research
05/16/2019

Adaptive Sensor Placement for Continuous Spaces

We consider the problem of adaptively placing sensors along an interval ...
research
10/01/2020

Unknown Delay for Adversarial Bandit Setting with Multiple Play

This paper addresses the problem of unknown delays in adversarial multi-...
