Bandits with Delayed Anonymous Feedback

09/20/2017
by Ciara Pike-Burke, et al.

We study the bandits with delayed anonymous feedback problem, a variant of the stochastic K-armed bandit problem, in which the reward from each play of an arm is no longer obtained instantaneously but received after some stochastic delay. Furthermore, the learner is not told which arm an observation corresponds to, nor do they observe the delay associated with a play. Instead, at each time step, the learner selects an arm to play and receives a reward which could be from any combination of past plays. This is a very natural problem; however, due to the delay and anonymity of the observations, it is considerably harder than the standard bandit problem. Despite this, we demonstrate it is still possible to achieve logarithmic regret, but with additional lower order terms. In particular, we provide an algorithm with regret O(log(T) + √(g(τ) log(T)) + g(τ)), where g(τ) is some function of the delay distribution. This is of the same order as that achieved in Joulani et al. (2013) for the simpler problem where the observations are not anonymous. We support our theoretical observation equating the two orders of regret with experiments.
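To make the feedback model concrete, here is a minimal Python sketch of an environment in which rewards arrive late and are summed anonymously. It is an illustration only, not the paper's algorithm or experimental setup: the class name `DelayedAnonymousBandit`, the Bernoulli rewards, and the uniform delay on {0, ..., max_delay} are all assumptions chosen for simplicity.

```python
import random
from collections import defaultdict


class DelayedAnonymousBandit:
    """Hypothetical simulator of delayed, anonymous bandit feedback.

    Assumptions (not from the paper): Bernoulli rewards and delays drawn
    uniformly from {0, ..., max_delay}.
    """

    def __init__(self, means, max_delay, seed=0):
        self.means = list(means)            # mean reward of each arm
        self.max_delay = max_delay          # largest possible delay
        self.rng = random.Random(seed)
        self.t = 0                          # current time step
        self.pending = defaultdict(float)   # arrival time -> summed reward

    def step(self, arm):
        """Play `arm`; return the sum of all rewards arriving at this step."""
        reward = 1.0 if self.rng.random() < self.means[arm] else 0.0
        delay = self.rng.randint(0, self.max_delay)  # hidden from the learner
        self.pending[self.t + delay] += reward
        # The learner sees only the aggregate: no arm labels, no delays.
        observed = self.pending.pop(self.t, 0.0)
        self.t += 1
        return observed


env = DelayedAnonymousBandit(means=[0.3, 0.7], max_delay=5)
observations = [env.step(arm=t % 2) for t in range(10)]
```

Each value in `observations` may mix rewards from several earlier plays of either arm, or be zero when nothing arrives, which is why the learner cannot attribute an observation to the arm that generated it.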

research
11/02/2020

Multi-Armed Bandits with Censored Consumption of Resources

We consider a resource-aware variant of the classical multi-armed bandit...
research
10/10/2018

Decentralized Cooperative Stochastic Multi-armed Bandits

We study a decentralized cooperative stochastic multi-armed bandit probl...
research
02/01/2022

Regret Minimization with Performative Feedback

In performative prediction, the deployment of a predictive model trigger...
research
02/09/2016

Compliance-Aware Bandits

Motivated by clinical trials, we study bandits with observable non-compl...
research
10/31/2019

Recovering Bandits

We study the recovering bandits problem, a variant of the stochastic mul...
research
07/21/2022

Delayed Feedback in Generalised Linear Bandits Revisited

The stochastic generalised linear bandit is a well-understood model for ...
research
02/01/2023

Delayed Feedback in Kernel Bandits

Black box optimisation of an unknown function from expensive and noisy e...