Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits

12/12/2022
by Guojun Xiong, et al.

The multi-player multi-armed bandit is an increasingly relevant decision-making problem, motivated by applications to cognitive radio systems. Most research on this problem focuses exclusively on settings in which players have full access to all arms and receive no reward when pulling the same arm. Hence, all players solve the same bandit problem with the goal of maximizing their cumulative reward. However, these settings neglect several important factors in many real-world applications, where players have access only to a dynamic local subset of arms (i.e., an arm could sometimes be "walking" and not accessible to the player). To this end, this paper proposes a multi-player multi-armed walking bandits model that addresses the aforementioned modeling issues. The goal remains to maximize the cumulative reward; however, each player can pull only arms from its local subset, and it collects the full reward only if no other player pulls the same arm. We adopt an Upper Confidence Bound (UCB) approach to handle the exploration-exploitation tradeoff and employ distributed optimization techniques to properly handle collisions. By carefully integrating these two techniques, we propose a decentralized algorithm with a near-optimal regret guarantee that can be easily implemented and achieves competitive empirical performance.
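The abstract does not give the paper's algorithm, but the two ingredients it names (UCB indices restricted to the locally available arms, and zero reward on collisions) can be illustrated with a minimal toy sketch. The class name `WalkingBanditPlayer`, the random availability model, and the Bernoulli arms below are all invented for illustration; this is not the paper's decentralized algorithm.

```python
import math
import random

def ucb_index(mean, count, t, c=2.0):
    """UCB1-style index: empirical mean plus a confidence bonus.
    Unpulled arms get an infinite index so they are tried first."""
    if count == 0:
        return float("inf")
    return mean + math.sqrt(c * math.log(t) / count)

class WalkingBanditPlayer:
    """Hypothetical single player: pulls the best-UCB arm among
    the arms currently available ("walking" arms are absent)."""
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.t = 0

    def select(self, available):
        self.t += 1
        return max(available,
                   key=lambda a: ucb_index(self.means[a], self.counts[a], self.t))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

def simulate(horizon=2000, seed=0):
    """Toy two-player run: each round, each player sees a random
    nonempty subset of three Bernoulli arms; a collision (two
    players on the same arm) yields zero reward for both."""
    rng = random.Random(seed)
    true_means = [0.2, 0.5, 0.8]           # assumed arm parameters
    players = [WalkingBanditPlayer(3), WalkingBanditPlayer(3)]
    total = 0.0
    for _ in range(horizon):
        pulls = []
        for p in players:
            # each arm is locally available with probability 0.8
            avail = [a for a in range(3) if rng.random() < 0.8] or [rng.randrange(3)]
            pulls.append((p, p.select(avail)))
        for p, arm in pulls:
            collided = sum(1 for _, a in pulls if a == arm) > 1
            reward = 0.0 if collided else float(rng.random() < true_means[arm])
            p.update(arm, reward)
            total += reward
    return total, players
```

In the paper's setting the players additionally coordinate via distributed optimization to avoid repeated collisions; the sketch above only shows what a collision costs, not how it is resolved.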


