Delay-Adaptive Learning in Generalized Linear Contextual Bandits

03/11/2020
by Jose Blanchet, et al.

In this paper, we consider online learning in generalized linear contextual bandits where rewards are not immediately observed. Instead, rewards become available to the decision-maker only after some delay, which is unknown and stochastic. We study the performance of two well-known algorithms adapted to this delayed setting: one based on upper confidence bounds, and the other based on Thompson sampling. We describe how each algorithm should be modified to handle delays and give regret characterizations for both. Our results contribute to the broad landscape of the contextual bandits literature by establishing that both algorithms can be made robust to delays, thereby helping clarify and reaffirm the empirical success of these two algorithms, which are widely deployed in modern recommendation engines.
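To make the delayed-feedback setting concrete, here is a minimal sketch of a UCB-style linear contextual bandit that only folds a reward into its estimates once that reward's stochastic delay has elapsed. This is an illustrative toy, not the paper's exact algorithm: the reward model, the delay distribution, and all names (`delayed_linucb`, `max_delay`, `alpha`) are assumptions made for demonstration.

```python
import numpy as np

def delayed_linucb(T=500, d=5, K=4, alpha=1.0, max_delay=20, seed=0):
    """Toy delayed-feedback LinUCB sketch (illustrative assumptions only):
    rewards are linear in the context plus noise, and each reward arrives
    after a uniform random delay of up to `max_delay` rounds."""
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(size=d) / np.sqrt(d)  # unknown reward parameter
    A = np.eye(d)           # ridge-regularized Gram matrix
    b = np.zeros(d)         # running sum of context * observed reward
    pending = []            # (arrival_round, context, reward) still delayed
    total_reward = 0.0
    for t in range(T):
        # Incorporate only the feedback whose delay has elapsed by round t.
        ready = [p for p in pending if p[0] <= t]
        pending = [p for p in pending if p[0] > t]
        for _, x, r in ready:
            A += np.outer(x, x)
            b += r * x
        theta_hat = np.linalg.solve(A, b)
        A_inv = np.linalg.inv(A)
        contexts = rng.normal(size=(K, d)) / np.sqrt(d)
        # UCB score: estimated reward plus an exploration bonus that shrinks
        # as the (delay-filtered) Gram matrix grows.
        bonus = np.sqrt(np.einsum('kd,de,ke->k', contexts, A_inv, contexts))
        x = contexts[int(np.argmax(contexts @ theta_hat + alpha * bonus))]
        r = float(x @ theta_star + 0.1 * rng.normal())
        total_reward += r
        delay = int(rng.integers(0, max_delay + 1))  # stochastic delay
        pending.append((t + 1 + delay, x, r))
    return total_reward
```

The key point the sketch illustrates is that the confidence widths are computed from only the feedback that has actually arrived, so the exploration bonus automatically stays wider while rewards are still in flight.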


Related research

02/28/2017 — Provably Optimal Algorithms for Generalized Linear Contextual Bandits
Contextual bandits are widely used in Internet services from news recomm...

06/07/2021 — On Learning to Rank Long Sequences with Contextual Bandits
Motivated by problems of learning to rank long item sequences, we introd...

04/26/2023 — Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards
In this work, we study the performance of the Thompson Sampling algorith...

04/27/2015 — Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits
We study contextual bandits with budget and time constraints, referred t...

05/26/2020 — To update or not to update? Delayed Nonparametric Bandits with Randomized Allocation
Delayed rewards problem in contextual bandits has been of interest in va...

10/21/2022 — Anonymous Bandits for Multi-User Systems
In this work, we present and study a new framework for online learning i...

10/16/2012 — Leveraging Side Observations in Stochastic Bandits
This paper considers stochastic bandits with side observations, a model ...
