Multi-Armed Bandit Strategies for Non-Stationary Reward Distributions and Delayed Feedback Processes

02/22/2019
by   Larkin Liu, et al.
0

A survey is performed of various Multi-Armed Bandit (MAB) strategies in order to examine their performance in circumstances exhibiting non-stationary stochastic reward functions in conjunction with delayed feedback. We run several MAB simulations to simulate an online eCommerce platform for grocery pick up, optimizing for product availability. In this work, we evaluate several popular MAB strategies, such as ϵ-greedy, UCB1, and Thompson Sampling. We compare the respective performances of each MAB strategy in the context of regret minimization. We run the analysis in the scenario where the reward function is non-stationary. Furthermore, the process experiences delayed feedback, where the reward function is not immediately responsive to the arm played. We devise a new Bayesian technique (BAG1) tailored for non-stationary reward functions in the delayed feedback scenario. The results of the simulation show show superior performance in the context of regret minimization compared to traditional MAB strategies.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/17/2022

A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits

We study the non-stationary stochastic multi-armed bandit problem, where...
research
11/27/2022

Rectified Pessimistic-Optimistic Learning for Stochastic Continuum-armed Bandit with Constraints

This paper studies the problem of stochastic continuum-armed bandit with...
research
05/24/2023

An Evaluation on Practical Batch Bayesian Sampling Algorithms for Online Adaptive Traffic Experimentation

To speed up online testing, adaptive traffic experimentation through mul...
research
06/21/2021

Smooth Sequential Optimisation with Delayed Feedback

Stochastic delays in feedback lead to unstable sequential learning using...
research
04/22/2020

Adaptive Operator Selection Based on Dynamic Thompson Sampling for MOEA/D

In evolutionary computation, different reproduction operators have vario...
research
08/02/2023

Maximizing Success Rate of Payment Routing using Non-stationary Bandits

This paper discusses the system architecture design and deployment of no...
research
12/27/2017

Active Search for High Recall: a Non-Stationary Extension of Thompson Sampling

We consider the problem of Active Search, where a maximum of relevant ob...

Please sign up or login with your details

Forgot password? Click here to reset