Decentralized Learning Dynamics in the Gossip Model

06/14/2023
by John Lazarsfeld, et al.

We study a distributed multi-armed bandit setting among a population of n memory-constrained nodes in the gossip model: at each round, every node locally adopts one of m arms, observes a reward drawn from the arm's (adversarially chosen) distribution, and then communicates with a randomly sampled neighbor, exchanging information to determine its policy in the next round. We introduce and analyze several families of dynamics for this task that are decentralized: each node's decision is entirely local and depends only on its most recently obtained reward and that of the neighbor it sampled. We show a connection between the global evolution of these decentralized dynamics and a certain class of "zero-sum" multiplicative weight update algorithms, and we develop a general framework for analyzing the population-level regret of these natural protocols. Using this framework, we derive sublinear regret bounds under a wide range of parameter regimes (i.e., the size of the population and the number of arms) for both the stationary reward setting (where the mean of each arm's distribution is fixed over time) and the adversarial reward setting (where the means can vary over time). Further, we show that these protocols can approximately optimize convex functions over the simplex when the reward distributions are generated from a stochastic gradient oracle.
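
To make the round structure described in the abstract concrete, here is a minimal simulation sketch. It is an illustration under stated assumptions, not the paper's protocol: the function name `simulate_gossip`, the complete communication graph, the stationary Bernoulli rewards, and the adoption rule (copy the sampled neighbor's arm only when its latest reward is strictly larger than your own) are all hypothetical choices made for this example. The rule does respect the locality constraint from the abstract: each node's decision uses only its own most recent reward and that of the one neighbor it sampled.

```python
import random


def simulate_gossip(n=200, m=3, rounds=300, means=(0.2, 0.5, 0.8), seed=0):
    """Simulate a simple gossip-style bandit dynamic (illustrative assumption).

    Returns a list with one entry per round; each entry holds the fraction
    of the population playing each of the m arms after that round.
    """
    rng = random.Random(seed)
    # Every node starts on a uniformly random arm.
    arms = [rng.randrange(m) for _ in range(n)]
    history = []
    for _ in range(rounds):
        # Each node pulls its current arm and observes a Bernoulli reward.
        rewards = [1 if rng.random() < means[a] else 0 for a in arms]
        new_arms = list(arms)
        for i in range(n):
            # Sample one node uniformly at random (complete graph assumed;
            # a node may sample itself, which leaves its arm unchanged).
            j = rng.randrange(n)
            # Hypothetical adoption rule: copy the neighbor's arm only if
            # its most recent reward beat this node's own.
            if rewards[j] > rewards[i]:
                new_arms[i] = arms[j]
        # All nodes update simultaneously at the end of the round.
        arms = new_arms
        history.append([arms.count(k) / n for k in range(m)])
    return history


if __name__ == "__main__":
    shares = simulate_gossip()
    print("final arm shares:", shares[-1])
```

Under this particular rule, a short calculation shows that the expected share of arm k after one round is x_k (1 + p_k - p̄), where x_k is the current share, p_k is the arm's mean, and p̄ is the population-average mean. That multiplicative form is one way to see why a link to multiplicative weight updates is plausible, though the dynamics analyzed in the paper may differ from this sketch.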

research
12/13/2021

Top K Ranking for Multi-Armed Bandit with Noisy Evaluations

We consider a multi-armed bandit setting where, at the beginning of each...
research
01/18/2023

Complexity Analysis of a Countable-armed Bandit Problem

We consider a stochastic multi-armed bandit (MAB) problem motivated by “...
research
06/27/2017

Multi-armed Bandit Problems with Strategic Arms

We study a strategic version of the multi-armed bandit problem, where ea...
research
06/19/2019

Learning in Restless Multi-Armed Bandits via Adaptive Arm Sequencing Rules

We consider a class of restless multi-armed bandit (RMAB) problems with ...
research
11/13/2020

Rebounding Bandits for Modeling Satiation Effects

Psychological research shows that enjoyment of many goods is subject to ...
research
07/14/2023

Repeated Game Dynamics in Population Protocols

We initiate the study of repeated game dynamics in the population model,...
research
11/17/2016

Unimodal Thompson Sampling for Graph-Structured Arms

We study, to the best of our knowledge, the first Bayesian algorithm for...
