Delayed Feedback in Generalised Linear Bandits Revisited

07/21/2022
by Benjamin Howson, et al.
The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, in many real-world settings the reward is not observed immediately, and the behaviour of standard algorithms under such delays is not theoretically understood. We study delayed rewards theoretically by introducing a delay between selecting an action and receiving its reward. We then show that an algorithm based on the optimistic principle improves on existing approaches for this setting by eliminating the need for prior knowledge of the delay distribution and by relaxing assumptions on the decision set and the delays. It also improves the regret guarantee from O(√(dT) √(d + 𝔼[τ])) to O(d√T + d^{3/2} 𝔼[τ]), where 𝔼[τ] denotes the expected delay, d the dimension, and T the time horizon, with logarithmic terms suppressed. We verify our theoretical results through experiments on simulated data.
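To get a feel for how the two regret expressions in the abstract scale, the following minimal sketch evaluates them numerically. It is only indicative: constants and logarithmic factors are dropped, exactly as in the abstract, and the function names and the example values of d, T and the mean delay are assumptions chosen for illustration, not taken from the paper.

```python
import math


def earlier_bound(d, T, mean_delay):
    """Order of the earlier guarantee: sqrt(d*T) * sqrt(d + E[tau])."""
    return math.sqrt(d * T) * math.sqrt(d + mean_delay)


def improved_bound(d, T, mean_delay):
    """Order of the improved guarantee: d*sqrt(T) + d^(3/2) * E[tau]."""
    return d * math.sqrt(T) + d ** 1.5 * mean_delay


if __name__ == "__main__":
    # Hypothetical problem sizes, for illustration only.
    d, mean_delay = 10, 100
    for T in (10**2, 10**4, 10**6):
        old = earlier_bound(d, T, mean_delay)
        new = improved_bound(d, T, mean_delay)
        print(f"T={T:>8}  earlier={old:12.1f}  improved={new:12.1f}")
```

In the improved expression the delay term d^{3/2} 𝔼[τ] does not grow with T, whereas in the earlier expression the delay multiplies √T. The improved expression is therefore the smaller one once the horizon is long relative to the delay, while for short horizons the additive delay term can dominate.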

research
06/04/2021

Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions

We study the stochastic Multi-Armed Bandit (MAB) problem with random del...
research
05/15/2023

A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs

We derive a new analysis of Follow The Regularized Leader (FTRL) for onl...
research
02/20/2015

Contextual Semibandits via Supervised Learning Oracles

We study an online decision making problem where on each round a learner...
research
09/20/2017

Bandits with Delayed Anonymous Feedback

We study the bandits with delayed anonymous feedback problem, a variant ...
research
08/21/2023

An Improved Best-of-both-worlds Algorithm for Bandits with Delayed Feedback

We propose a new best-of-both-worlds algorithm for bandits with variably...
research
11/24/2021

One More Step Towards Reality: Cooperative Bandits with Imperfect Communication

The cooperative bandit problem is increasingly becoming relevant due to ...
research
12/22/2022

Sequential Decision Problems with Weak Feedback

This thesis considers sequential decision problems, where the loss/rewar...
