Randomized Allocation with Nonparametric Estimation for Contextual Multi-Armed Bandits with Delayed Rewards

02/03/2019
by Sakshi Arya, et al.

We study a multi-armed bandit problem with covariates in a setting where there is a possible delay in observing the rewards. Under some mild assumptions on the probability distributions for the delays and using an appropriate randomization to select the arms, the proposed strategy is shown to be strongly consistent.
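To make the setting concrete, here is a minimal sketch of a randomized allocation rule with a nonparametric (kernel) estimate and delayed rewards. It is an illustration under stated assumptions, not the authors' algorithm: the two mean reward functions, the Gaussian kernel and its bandwidth, the 1/sqrt(t) exploration schedule, and the bounded uniform delay are all hypothetical choices made for the example. The function returns the ratio of expected reward earned to the best achievable expected reward, which a consistent strategy would be expected to drive toward 1 as the horizon grows.

```python
import math
import random


def kernel_estimate(x, history, bandwidth=0.2):
    """Nadaraya-Watson estimate at x from (covariate, reward) pairs.

    Returns 0.0 when no rewards have been observed yet (an arbitrary default).
    """
    num = den = 0.0
    for xi, ri in history:
        w = math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)  # Gaussian kernel weight
        num += w * ri
        den += w
    return num / den if den > 0 else 0.0


def run_bandit(n_rounds=1000, max_delay=5, seed=0):
    """Simulate a two-armed contextual bandit whose rewards arrive late."""
    rng = random.Random(seed)
    # Hypothetical mean reward functions, chosen only for illustration.
    means = [lambda x: x, lambda x: 1.0 - x]
    observed = [[], []]  # rewards already revealed, per arm
    pending = []         # (reveal_round, arm, covariate, reward)
    earned = best = 0.0

    for t in range(1, n_rounds + 1):
        # Reveal rewards whose delay has elapsed; keep the rest pending.
        still_pending = []
        for reveal_round, arm, xi, ri in pending:
            if reveal_round <= t:
                observed[arm].append((xi, ri))
            else:
                still_pending.append((reveal_round, arm, xi, ri))
        pending = still_pending

        x = rng.random()                  # covariate for this round
        eps = min(1.0, 1.0 / math.sqrt(t))  # assumed decaying exploration rate
        if rng.random() < eps:
            arm = rng.randrange(2)        # randomize: pick an arm uniformly
        else:
            est = [kernel_estimate(x, observed[a]) for a in range(2)]
            arm = 0 if est[0] >= est[1] else 1  # exploit current estimates

        reward = means[arm](x) + rng.gauss(0.0, 0.1)
        delay = rng.randint(0, max_delay)  # assumed bounded random delay
        pending.append((t + delay, arm, x, reward))

        earned += means[arm](x)
        best += max(m(x) for m in means)

    return earned / best


if __name__ == "__main__":
    print(run_bandit())
```

Rewards enter the estimator only at the start of a round after their delay has elapsed, so each decision uses just the information available at that time. The random exploration step keeps every arm sampled even while many rewards are still outstanding.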

research
01/20/2023

Multi armed bandits and quantum channel oracles

Multi armed bandits are one of the theoretical pillars of reinforcement ...
research
05/26/2020

To update or not to update? Delayed Nonparametric Bandits with Randomized Allocation

Delayed rewards problem in contextual bandits has been of interest in va...
research
03/21/2022

Efficient Algorithms for Extreme Bandits

In this paper, we contribute to the Extreme Bandit problem, a variant of...
research
09/16/2022

Sales Channel Optimization via Simulations Based on Observational Data with Delayed Rewards: A Case Study at LinkedIn

Training models on data obtained from randomized experiments is ideal fo...
research
02/26/2020

Designing Truthful Contextual Multi-Armed Bandits based Sponsored Search Auctions

For sponsored search auctions, we consider contextual multi-armed bandit...
research
12/09/2022

Networked Restless Bandits with Positive Externalities

Restless multi-armed bandits are often used to model budget-constrained ...
research
01/03/2023

Computing the Performance of A New Adaptive Sampling Algorithm Based on The Gittins Index in Experiments with Exponential Rewards

Designing experiments often requires balancing between learning about th...