Competing for Shareable Arms in Multi-Player Multi-Armed Bandits

05/30/2023
by Renzhe Xu, et al.

Competition among strategic agents for shareable, limited resources has long been studied. In practice, agents often have to learn the rewards of the resources and maximize them at the same time. To design an individualized competing policy, we model the competition between agents in a novel multi-player multi-armed bandit (MPMAB) setting where players are selfish and aim to maximize their own rewards. In addition, when several players pull the same arm, we assume that these players share the arm's reward equally in expectation. Under this setting, we first analyze the Nash equilibrium when the arms' rewards are known. Subsequently, we propose a novel Selfish MPMAB with Averaging Allocation (SMAA) approach based on the equilibrium. We theoretically demonstrate that SMAA can achieve a good regret guarantee for each player when all players follow the algorithm. Additionally, we establish that no single selfish player can significantly increase their rewards through deviation, nor can they detrimentally affect other players' rewards without incurring substantial losses for themselves. Finally, we validate the effectiveness of the method in extensive synthetic experiments.
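
The known-reward setting in the abstract can be illustrated with a small sketch. It is not taken from the paper: the function names, the greedy fill-in rule, and the example reward values are all assumptions made for illustration. It assumes each arm k has a known expected reward mu[k] and that m players on the same arm each receive mu[k] / m in expectation, then checks whether any single player could gain by switching arms.

```python
from typing import List


def greedy_allocation(mu: List[float], n_players: int) -> List[int]:
    """Assign players one at a time to the arm with the best per-player share.

    Hypothetical helper for illustration; not the paper's SMAA algorithm.
    Returns counts[k], the number of players placed on arm k.
    """
    counts = [0] * len(mu)
    for _ in range(n_players):
        # Share a newcomer would get on arm k: mu[k] / (counts[k] + 1).
        best = max(range(len(mu)), key=lambda k: mu[k] / (counts[k] + 1))
        counts[best] += 1
    return counts


def is_pure_nash(mu: List[float], counts: List[int], tol: float = 1e-12) -> bool:
    """Check that no single player gains by unilaterally moving to another arm."""
    for j, m_j in enumerate(counts):
        if m_j == 0:
            continue  # nobody sits on arm j, so nobody can deviate from it
        stay = mu[j] / m_j  # current expected share on arm j
        for k in range(len(mu)):
            if k != j and mu[k] / (counts[k] + 1) > stay + tol:
                return False  # moving from arm j to arm k is profitable
    return True


if __name__ == "__main__":
    mu = [0.9, 0.5, 0.2]  # assumed expected rewards, chosen for illustration
    counts = greedy_allocation(mu, n_players=3)
    print(counts, is_pure_nash(mu, counts))
```

Under these assumptions the greedy rule always yields an allocation that passes the check, because every occupied arm was, at the time of its last assignment, at least as attractive as joining any other arm. The learning problem the abstract describes is harder, since players must estimate the rewards while competing.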

research
10/26/2018

Distributed Multi-Player Bandits - a Game of Thrones Approach

We consider a multi-armed bandit game where N players compete for K arms...
research
08/03/2019

Multiplayer Bandit Learning, from Competition to Cooperation

The stochastic multi-armed bandit problem is a classic model illustratin...
research
04/28/2022

Multi-Player Multi-Armed Bandits with Finite Shareable Resources Arms: Learning Algorithms & Applications

Multi-player multi-armed bandits (MMAB) study how decentralized players ...
research
02/21/2019

Multi-Player Bandits: The Adversarial Case

We consider a setting where multiple players sequentially choose among a...
research
06/19/2020

Gradient-free Online Learning in Games with Delayed Rewards

Motivated by applications to online advertising and recommender systems,...
research
02/10/2021

Player Modeling via Multi-Armed Bandits

This paper focuses on building personalized player models solely from pl...
