On Distributed Multi-player Multiarmed Bandit Problems in Abruptly Changing Environment

12/12/2018
by Lai Wei, et al.

We study the multi-player stochastic multiarmed bandit (MAB) problem in an abruptly changing environment. We consider a collision model in which a player receives a reward at an arm only if it is the sole player to select that arm. We design two novel algorithms: Round-Robin Sliding-Window Upper Confidence Bound# (RR-SW-UCB#) and Sliding-Window Distributed Learning with Prioritization (SW-DLP). We rigorously analyze these algorithms and show that their expected cumulative group regret is upper bounded by a sublinear function of time, i.e., the time-averaged regret asymptotically converges to zero. We complement our analytic results with numerical illustrations.
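To make the sliding-window idea concrete, here is a minimal single-player sketch of a sliding-window UCB rule: reward observations older than a fixed window are discarded, so the arm-index estimates can track abrupt changes in the arm means. The class name, window size, exploration constant, and the exact form of the confidence bonus are illustrative assumptions, not the paper's RR-SW-UCB# or SW-DLP definitions (which additionally handle round-robin scheduling and multi-player collisions).

```python
import math
from collections import deque

class SlidingWindowUCB:
    """Single-player sliding-window UCB sketch (illustrative, not RR-SW-UCB#).

    Only the most recent `window` (arm, reward) pairs contribute to the
    empirical means, so estimates forget pre-change rewards quickly.
    """

    def __init__(self, n_arms, window=100, xi=2.0):
        self.n_arms = n_arms
        self.window = window                      # sliding-window length (assumed)
        self.xi = xi                              # exploration constant (assumed)
        self.history = deque(maxlen=window)       # auto-drops samples older than the window
        self.t = 0

    def select_arm(self):
        self.t += 1
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, r in self.history:
            counts[arm] += 1
            sums[arm] += r
        # Force-play any arm with no sample left in the current window.
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm
        horizon = min(self.t, self.window)
        ucb = [
            sums[a] / counts[a]
            + math.sqrt(self.xi * math.log(horizon) / counts[a])
            for a in range(self.n_arms)
        ]
        return max(range(self.n_arms), key=lambda a: ucb[a])

    def update(self, arm, reward):
        self.history.append((arm, reward))
```

In a simulation where the best arm swaps partway through the horizon, the window makes the player re-identify the new best arm within roughly one window length, which is the mechanism behind the sublinear-regret guarantees discussed above.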


